Background
I am trying to work through some problems I found from Stanford's "Design & Analysis of Algorithms" course from 2013, in particular problem 3 from problem set 1 here.
In summary it states:
You are stuck on a desert island with a radio that can transmit a distress signal at integer power levels (i.e. 1W, 2W, 3W, etc.).
If you transmit a signal with high enough power, you will receive a response and be rescued.
Unfortunately you do not know how much power, n, is needed.
The problem requests you design an algorithm that uses Θ(n)W total power.
Being a 5pt question from problem set 1, I presume this is easier than I am finding it.
My question is
...what is this algorithm? Or how can I change my thinking to find one?
Where I'm stuck
The question states that the strategy "just increase power by 1 watt each time" will result in Θ(n^2)W total power. Indeed, this is true, as the total power used to reach a threshold n is n * (n+1) / 2.
However, I can't think of any strategy that isn't:
greater than linear; or
a strategy where I cheat by "not doing anything for a few consecutive n's".
Also, if I ignore the discreteness of the radio for a minute and analyse the problem as a continuous linear function, the total power should be generalisable to a function g(n) of the form g(n) = Kn + B (where K and B are constants). This linear function would represent the integral of the function we need to use for controlling the radio.
Then, if I take the derivative of this function, dg(n)/dn, I'm left with K. I.e. if I want linear total power, I should just drive the radio at a constant power n times...but this would only result in a rescue if I happened to guess K correctly the first time.
EDIT
Yes, I had already thought of doubling etc....but the answers here pointed out the error in my thinking. I had been trying to solve the question "design an algorithm that has linear cumulative power consumption"...which I think is impossible. As pointed out by the answers, I should have thought about it as "for a given n, design an algorithm that will consume Kn"...i.e. what the question posed.
I've read the assignment...
It states that the radio is capable of transmitting at integer power levels, but that doesn't mean you should try them one by one and go through every integer up to n.
Well, I could give you the answer but I'll try just to lead you to think about it on your own:
Please notice that you need to transmit a signal equal to or greater than n, so there is no way you are "going too far". Now, in terms of complexity, if you go through all the signals one by one you get the series (1+2+3+...+n), which is Θ(n^2). Try to think of a pattern that lets you skip some of those attempts and gives a series whose sum is Θ(n).
This task is similar to searching: the naive approach does far more work than necessary, but there are algorithms that reduce it to much less than that - you should go and explore how they work :)
If you want an approach for an answer:
You can start with 1W and double the power for each subsequent transmission. That way you make about log(n) attempts, and each attempt costs as many watts as its power level. The cumulative power is therefore (1+2+4+8+16+...+p), where p is the final, successful power level; this sum equals 2p-1, which is Θ(n).
Well here is a simple algorithm and complexity analysis:
Initially try with power = 1W
If it wasn't received, try again with power = 2 * previous_power, and repeat until it is received
Complexity:
So basically we stop at the first power p >= n, where n is the desired threshold.
We know that:
p >= n and p/2 < n  =>  n <= p < 2n
In order to reach p W (i.e. the level at which the signal is finally received), you must previously have tried p/2, before that p/4, ... and initially 1, so let's sum up all the steps:
1 + 2 + 4 + ... + p/2 + p = 2p - 1, which is Θ(p) = Θ(n)
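For concreteness, here is a minimal Python sketch of this doubling strategy. The transmit callback is hypothetical and stands in for the radio: it returns True once the power reaches the unknown threshold n.

def rescue(transmit):
    # Doubling strategy: total power spent is 1 + 2 + ... + p = 2p - 1 < 4n watts.
    power = 1
    total = 0
    while True:
        total += power            # each attempt costs `power` watts
        if transmit(power):       # a response arrives once power >= n
            return power, total
        power *= 2                # double for the next attempt

# e.g. with an unknown threshold of n = 37:
#   rescue(lambda p: p >= 37)  ->  (64, 127), i.e. Θ(n) total power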
Related
I have two blocks of code: one with a single while loop, and a second with a for loop nested inside the while loop. My professor is telling me that Option 1 has an algorithmic complexity of O(n) and Option 2 has an algorithmic complexity of O(n^2), but can't explain why that is, other than pointing to the nested for loop. I am confused because both perform exactly the same number of calculations for any given size N, which doesn't seem to indicate that they have different complexities.
I'd like to know:
a) if my professor is correct, how can they perform the same calculations but have different big Os?
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
I've used inline comments denoted by '#' to note the computations. The number of packages to deliver is N. self.trucks is a list. self.workDayCompleted is a boolean determined by whether all packages have been delivered.
Option 1:
# initializes index for fake for loop
truck_index = 0
while not self.workDayCompleted:
    # checks if truck index has reached end of self.trucks list
    if truck_index != len(self.trucks):
        # does X amount of calculations required for delivery of truck's packages
        while not self.trucks[truck_index].isEmpty():
            self.trucks[truck_index].travel()
            self.trucks[truck_index].deliverPackage()
        if hub.packagesExist():
            self.trucks[truck_index].travelToHub()
            self.trucks[truck_index].loadPackages()
        # increments index
        truck_index += 1
    else:
        # resets index to 0 for next pass through the truck list
        truck_index = 0
        # does X amount of calculations required for while loop condition
        self.workDayCompleted = isWorkDayCompleted()
Option 2:
while not self.workDayCompleted:
    # initializes index (i)
    # each iteration checks if the index has reached the end of self.trucks list
    # increments index
    for i in range(len(self.trucks)):
        # does X amount of calculations required for delivery of truck's packages
        while not self.trucks[i].isEmpty():
            self.trucks[i].travel()
            self.trucks[i].deliverPackage()
        if hub.packagesExist():
            self.trucks[i].travelToHub()
            self.trucks[i].loadPackages()
    # does X amount of calculations required for while loop condition
    self.workDayCompleted = isWorkDayCompleted()
Any help is greatly appreciated, thank you!
It certainly seems like these two pieces of code are effectively implementing the same algorithm (i.e. deliver a package with each truck, then check to see if the work day is completed, repeat until the work day is completed). From this perspective you're right to be skeptical.
The question becomes: are they O(n) or O(n^2)? As you've described it, this is impossible to determine because we don't know what the conditions are for the work day being completed. Is it related to the amount of work that has been done by the trucks? Without that information we have no ability to reason about when the outer loop exits. For all we know the condition is that each truck must deliver 2^n packages and the complexity is actually O(n * 2^n).
So if your professor is right, my only guess is that there's a difference between the implementations of isWorkDayCompleted() between the two options. Barring something like that, though, the two options should have the same complexity.
Regardless, when it comes to problems like this it is always important to make sure that you're both talking about the same things:
What n means (presumably the number of trucks)
What you're counting (presumably the number of deliveries and maybe also the checks for the work day being done)
What the end state is (this is the red flag for me -- the work day being completed needs to be better defined)
Subsequent edits lead me to believe both of these options are O(n), since they ultimately perform one or two "travel" operations per package, depending on the number of trucks and their capacity. Given this, I think the answer to your core question (do those different control structures result in different complexity analysis) is no, they don't.
It also seems unlikely that the internals are affecting the code complexity in some important way, so my advice would be to get back together with your professor and see if they can expand on their thoughts. It very well might be that this was an oversight on their part, or that they were trying to make a more subtle point about how some of the components you're using were implemented.
If you get their explanation and there is something more complex going on that you still have trouble understanding, that should probably be a separate question (perhaps linked to this one).
a) if my professor is correct, how can they perform the same calculations but have different big Os?
Two algorithms that do the same number of "basic operations" have the same time complexity, regardless how the code is structured.
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
First you have to define: what is "n"? Is n the number of trucks? Next, is the number of "basic operations" per truck the same, or does it vary in some way?
For example: If the number of operations per truck is constant C, the total number of operations is C*n. That's in the complexity class O(n).
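To illustrate that point with a toy example (purely hypothetical, not the delivery code from the question): both loop shapes below perform exactly C*n basic operations, so they fall in the same complexity class O(n) when C is a constant.

def flat(n, C):
    # a single loop that performs C operations for each of the n trucks
    ops = 0
    for _ in range(n * C):
        ops += 1
    return ops

def nested(n, C):
    # a nested loop, but the total is still exactly C operations per truck
    ops = 0
    for _ in range(n):
        for _ in range(C):
            ops += 1
    return ops

print(flat(5, 3), nested(5, 3))  # 15 15 -- same count, same complexity class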
In the question "What's the numerically best way to calculate the average" it was suggested that calculating a rolling mean, i.e.
mean = a[n]/n + (n-1)/n * mean
might be numerically more stable than calculating the sum and then dividing by the total number of elements. This was questioned by a commenter. I cannot tell which one is true - can someone else? The advantage of the rolling mean is that you keep the mean small (i.e. at roughly the same size as the vector entries). Intuitively this should keep the error small. But the commenter claims:
Part of the issue is that 1/n introduces errors in the least significant bits, so n/n != 1, at least when it is performed as a three step operation (divide-store-multiply). This is minimized if the division is only performed once, but you'd be doing it over GB of data.
So I have multiple questions:
Is the rolling mean more precise than summing and then dividing?
Does that depend on the question whether 1/n is calculated first and then multiplied?
If so, do computers implement a one step division? (I thought so, but I am unsure now)
If yes, is it more precise than Kahan summation and then dividing?
If comparable - which one is faster? In both cases we have additional calculations.
If more precise, could you use this for precise summation?
In many circumstances, yes. Consider a sequence of all positive terms, all of the same order of magnitude. Adding them all generates a large intermediate sum, to which we keep adding small terms; the result can round right back to the intermediate sum, so the small terms are effectively lost. With the rolling mean, the quantities involved stay on the same order of magnitude, and in addition the running value is much harder to overflow. However, this is not open and shut: adding the terms and then dividing allows us to use AVX instructions, which are significantly faster than the subtract/divide/add instructions of the rolling loop. In addition, there are distributions which cause one or the other to be more accurate. This has been examined in:
Robert F. Ling, "Comparison of Several Algorithms for Computing Sample Means and Variances", Journal of the American Statistical Association, 69(348):859–866, 1974.
Kahan summation is an orthogonal issue. You can apply Kahan summation to the sequence of increments (a[n] - mean)/n used by the rolling mean; this is very accurate.
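As a minimal sketch (function names are mine, and the input is assumed to be a plain sequence of floats), the rolling mean and a Kahan-compensated variant of it look like this:

def rolling_mean(values):
    # mean = a[n]/n + (n-1)/n * mean, rewritten as mean += (a[n] - mean) / n
    mean = 0.0
    for n, a in enumerate(values, start=1):
        mean += (a - mean) / n
    return mean

def rolling_mean_kahan(values):
    # same recurrence, with Kahan compensation applied to the increments (a[n] - mean) / n
    mean = 0.0
    comp = 0.0                      # running compensation for lost low-order bits
    for n, a in enumerate(values, start=1):
        increment = (a - mean) / n - comp
        new_mean = mean + increment
        comp = (new_mean - mean) - increment
        mean = new_mean
    return mean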
I've been trying to figure out how to efficiently calculate the covariance in a moving window, i.e. moving from a set of values (x[0], y[0])..(x[n-1], y[n-1]) to a new set of values (x[1], y[1])..(x[n], y[n]). In other words, the value (x[0], y[0]) gets replaced by the value (x[n], y[n]). For performance reasons I need to calculate the covariance incrementally, in the sense that I'd like to express the new covariance Cov(x[1]..x[n], y[1]..y[n]) in terms of the previous covariance Cov(x[0]..x[n-1], y[0]..y[n-1]).
Starting off with the naive formula for covariance as described here:
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance
All I can come up with is:
Cov(x[1]..x[n], y[1]..y[n]) =
Cov(x[0]..x[n-1], y[0]..y[n-1]) +
(x[n]*y[n] - x[0]*y[0]) / n -
AVG(x[1]..x[n]) * AVG(y[1]..y[n]) +
AVG(x[0]..x[n-1]) * AVG(y[0]..y[n-1])
I'm sorry about the notation, I hope it's more or less clear what I'm trying to express.
However, I'm not sure if this is sufficiently numerically stable. Dealing with large values, I might run into arithmetic overflow or other issues (for example, cancellation).
Is there a better way to do this?
Thanks for any help.
It looks like you are trying some form of "add the new value and subtract the old one". You are correct to worry: this method is not numerically stable. Keeping sums this way is subject to drift, but the real killer is the fact that at each step you are subtracting a large number from another large number to get what is likely a very small number.
One improvement would be to maintain your sums (of x_i, y_i, and x_i*y_i) independently, and recompute the naive formula from them at each step. Your running sums would still drift, and the naive formula is still numerically unstable, but at least you would only have one step of numerical instability.
A stable way to solve this problem would be to implement a formula for (stably) merging statistical sets, and evaluate your overall covariance using a merge tree. Moving your window would update one of your leaves, requiring an update of each node from that leaf to the root. For a window of size n, this method would take O(log n) time per update instead of the O(1) naive computation, but the result would be stable and accurate. Also, if you don't need the statistics for each incremental step, you can update the tree once per each output sample instead of once per input sample. If you have k input samples per output sample, this reduces the cost per input sample to O(1 + (log n)/k).
From the comments: the wikipedia page you reference includes a section on Knuth's online algorithm, which is relatively stable, though still prone to drift. You should be able to do something comparable for covariance; and resetting your computation every K*n samples should limit the drift at minimal cost.
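A minimal sketch of the "maintain the sums independently and recompute the naive formula" suggestion, for a window of fixed size (the class and method names here are mine, not from the question):

from collections import deque

class WindowedCovariance:
    # Keeps running sums of x, y and x*y over a fixed-size window and
    # recomputes the naive covariance formula from them on demand.

    def __init__(self, window_size):
        self.size = window_size
        self.pairs = deque()
        self.sum_x = self.sum_y = self.sum_xy = 0.0

    def push(self, x, y):
        self.pairs.append((x, y))
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        if len(self.pairs) > self.size:
            old_x, old_y = self.pairs.popleft()
            self.sum_x -= old_x
            self.sum_y -= old_y
            self.sum_xy -= old_x * old_y

    def covariance(self):
        m = len(self.pairs)
        # naive formula E[xy] - E[x]E[y]; still prone to cancellation,
        # but the running sums are now the only source of drift
        return self.sum_xy / m - (self.sum_x / m) * (self.sum_y / m)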
Not sure why no one has mentioned this, but you can use the Welford online algorithm which relies on the running mean:
The equations should look like (with n the number of points seen so far and C[n] the running co-moment):

C[n] = C[n-1] + (x[n] - avgx[n]) * (y[n] - avgy[n-1])
Cov = C[n] / n   (or C[n] / (n-1) for the sample covariance)

with the online mean given by:

avgx[n] = avgx[n-1] + (x[n] - avgx[n-1]) / n
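Sketching what such an update could look like for the sliding window in the question: the add step below is the standard Welford co-moment update, while the removal step is its algebraic inverse, which I've added here myself, so treat the whole class as an illustrative assumption rather than a reference implementation.

class SlidingCovariance:
    # Welford-style co-moment over a window that always holds exactly n points:
    # each slide removes the oldest point and adds the newest one.

    def __init__(self, xs, ys):
        self.n = len(xs)
        self.mean_x = sum(xs) / self.n
        self.mean_y = sum(ys) / self.n
        # co-moment C = sum((x - mean_x) * (y - mean_y)) over the window
        self.C = sum((x - self.mean_x) * (y - self.mean_y) for x, y in zip(xs, ys))

    def slide(self, old_x, old_y, new_x, new_y):
        n = self.n
        # remove the old point (algebraic inverse of the add step below)
        mean_y_without = (n * self.mean_y - old_y) / (n - 1)
        self.C -= (old_x - self.mean_x) * (old_y - mean_y_without)
        self.mean_x = (n * self.mean_x - old_x) / (n - 1)
        self.mean_y = mean_y_without
        # add the new point (standard Welford co-moment update)
        self.mean_x += (new_x - self.mean_x) / n
        self.C += (new_x - self.mean_x) * (new_y - self.mean_y)
        self.mean_y += (new_y - self.mean_y) / n

    def covariance(self):
        return self.C / self.n      # population covariance; divide by n-1 for the sample version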
Box Factory is a problem in Google Code Jam 2012 Round 1C. It is similar to the Longest Common Subsequence problem, and they have given an O(n^4) solution for it. However, at the end of the analysis it says that another improvement can reduce this again to O(n^3). I am wondering what optimization can be done to the solution.
O(n^4) Algorithm
The dynamic programming approach solves for f[x][y] = the maximum number of toys that could be placed in boxes using the first x runs of boxes and the first y runs of toys.
It solves this by considering the boxes of the last type for runs between a+1 and x, and toys of the last type for runs between b+1 and y.
The O(n^4) algorithm loops over all choices for a and b, but we can simplify by only considering critical values of a and b.
O(n^3) Algorithm
The key point is that if we have a,b such that we have more boxes than toys, then there is no point changing a to get even more boxes (as this will never help us make any more products). Similarly, if we have more toys than boxes, then we can skip considering all the cases of b which would give us even more toys.
This suggests an O(n) algorithm for the inner loop, in which we trace out the boundary of a,b between having more toys and having more boxes. This is quite simple: we can just start with a=x-1 and b=y-1 and then decrease either a or b according to whether we currently have more toys or more boxes. (If they are equal, you can decrease both.)
Each step of the algorithm decreases either a or b by 1, so this iteration will require x+y steps instead of the x*y steps of the original method.
It needs to be repeated for all values of x,y so overall the complexity is O(n^3).
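A hedged Python reconstruction of this O(n^3) approach, based only on the description above (inputs are run-length encoded as (count, type) pairs; this is my own sketch, not the official reference solution):

def box_factory(box_runs, toy_runs):
    # f[x][y] = max products using the first x box runs and the first y toy runs
    X, Y = len(box_runs), len(toy_runs)
    f = [[0] * (Y + 1) for _ in range(X + 1)]
    for x in range(1, X + 1):
        for y in range(1, Y + 1):
            best = max(f[x - 1][y], f[x][y - 1])
            if box_runs[x - 1][1] == toy_runs[y - 1][1]:
                t = box_runs[x - 1][1]
                # walk the a,b boundary between "more boxes" and "more toys"
                a, b = x - 1, y - 1
                boxes = box_runs[x - 1][0]
                toys = toy_runs[y - 1][0]
                while True:
                    best = max(best, f[a][b] + min(boxes, toys))
                    if a == 0 and b == 0:
                        break
                    if a > 0 and (b == 0 or boxes <= toys):
                        a -= 1                        # take in one more box run
                        if box_runs[a][1] == t:
                            boxes += box_runs[a][0]
                    else:
                        b -= 1                        # take in one more toy run
                        if toy_runs[b][1] == t:
                            toys += toy_runs[b][0]
            f[x][y] = best
    return f[X][Y]

# e.g. boxes: 10 of type 1 then 20 of type 2; toys: 25 of type 2 then 10 of type 1
print(box_factory([(10, 1), (20, 2)], [(25, 2), (10, 1)]))  # -> 20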
Additional Improvements
A further improvement would be to store the index of the previous run of each type as this would allow several steps of the algorithm to be collapsed into a single move (because we know that our score can only improve once we work back to a run of the correct type). However, this would still be O(n^3) in the worst case (all boxes/toys of the same type).
Another practical improvement is to coalesce any runs in which the type was the same at consecutive positions, as this may significantly simplify test cases designed to expose the worst case behaviour in the previous improvement.
Background:
For my Data Structures and Algorithms course I am studying Big O notation. So far I understand how to work out the time complexity and the best and worst case scenarios. However, the average case is just baffling my head. The teacher is just throwing equations at us that I don't understand, and he is not willing to explain them in detail.
Question:
So please guys, what is the best way to calculate this? Is there one equation that calculates this or does it vary from algorithm to algorithm?
What are the steps you take to calculate this?
Let's take the insertion sort algorithm as an example.
Research:
I looked on YouTube and Stack Overflow for answers, but they all use different equations.
Any help would be great
thanks
As mentioned in the comment, you have to look at the average input to the algorithm (which in this case means a random one). A good way to think about it is to trace what the algorithm would do if the input were average.
For the example of insertion sort:
In the best case (when the input is already sorted) the algorithm will look through the input but never exchange anything, clearly resulting in a running time of O(n).
In the worst case (when the input is in exactly the opposite of the desired order) the algorithm will move every element all the way from its current position to the start of the list; that is, the object at index 0 will not be moved, the object at index 1 will be moved once, the object at index 2 will be moved twice and so on, resulting in a running time of 0+1+2+3+...+(n-1) ≈ 0.5n² = O(n²).
The same way of thinking can be used to find the average case, but instead of each object moving all the way to the start, we can expect that it will on average move halfway towards the start; that is, the object at index 0 will not be moved, the object at index 1 will be moved half a time (of course this only makes sense on average), the object at index 2 will be moved once, the object at index 3 will be moved 1.5 times and so on, resulting in a running time of 0 + 0.5 + 1 + 1.5 + 2 + ... + (n-1)/2 ≈ 0.25n² (at each index we have half of what we had in the worst case) = O(n²).
Of course not all algorithms are as simple as this, but looking at what the algorithm would do at each step if the input were random usually helps. If you have any kind of information available about the input to the algorithm (for instance, insertion sort is often used as the last step after another algorithm has done most of the sorting, as it is very efficient when the input is almost sorted, and in such a case we might know that no object is going to be moved more than x times), then this can be taken into account when computing the average running time.
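A small experiment along these lines, counting only element shifts in a plain insertion sort (a sketch; the exact count for the random input will vary from run to run):

import random

def insertion_sort_moves(values):
    # returns how many single-position shifts insertion sort performs
    a = list(values)
    moves = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]     # shift one position to the right
            moves += 1
            j -= 1
        a[j + 1] = key
    return moves

n = 1000
print(insertion_sort_moves(range(n)))                    # best case: 0 moves
print(insertion_sort_moves(range(n, 0, -1)))             # worst case: n(n-1)/2 ≈ 0.5n²
print(insertion_sort_moves(random.sample(range(n), n)))  # average case: ≈ 0.25n²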