Is a while loop with a nested for loop O(n) or O(n^2)? - algorithm

I have two blocks of code: one with a single while loop, and a second with a for loop inside the while loop. My professor tells me that Option 1 has an algorithmic complexity of O(n) and Option 2 has an algorithmic complexity of O(n^2), but can't explain why beyond pointing to the nested loops. I am confused because both perform exactly the same number of calculations for any given size N, which doesn't seem consistent with them having different complexities.
I'd like to know:
a) if my professor is correct, how the two can perform the same calculations but have different big Os.
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
I've used inline comments (denoted by '#') to note the computations. The number of packages to deliver is N. self.trucks is a list; self.workDayCompleted is a boolean determined by whether all packages have been delivered.
Option 1:
# initializes index for fake for loop
truck_index = 0
while not self.workDayCompleted:
    # checks if truck index has reached end of self.trucks list
    if truck_index != len(self.trucks):
        # does X amount of calculations required for delivery of truck's packages
        while not self.trucks[truck_index].isEmpty():
            self.trucks[truck_index].travel()
            self.trucks[truck_index].deliverPackage()
        if hub.packagesExist():
            self.trucks[truck_index].travelToHub()
            self.trucks[truck_index].loadPackages()
        # increments index
        truck_index += 1
    else:
        # resets index to 0 for next iteration through truck list
        truck_index = 0
        # does X amount of calculations required for while loop condition
        self.workDayCompleted = self.isWorkDayCompleted()
Option 2:
while not self.workDayCompleted:
    # initializes index (i)
    # each iteration checks if index has reached end of self.trucks list
    # increments index
    for i in range(len(self.trucks)):
        # does X amount of calculations required for delivery of truck's packages
        while not self.trucks[i].isEmpty():
            self.trucks[i].travel()
            self.trucks[i].deliverPackage()
        if hub.packagesExist():
            self.trucks[i].travelToHub()
            self.trucks[i].loadPackages()
    # does X amount of calculations required for while loop condition
    self.workDayCompleted = self.isWorkDayCompleted()
Any help is greatly appreciated, thank you!

It certainly seems like these two pieces of code are effectively implementing the same algorithm (i.e. deliver a package with each truck, then check to see if the work day is completed, repeat until the work day is completed). From this perspective you're right to be skeptical.
The question becomes: are they O(n) or O(n^2)? As you've described it, this is impossible to determine, because we don't know the conditions for the work day being completed. Is it related to the amount of work the trucks have done? Without that information we have no way to reason about when the outer loop exits. For all we know, the condition is that each truck must deliver 2^n packages and the complexity is actually O(n·2^n).
So if your professor is right, my only guess is that there's a difference between the implementations of isWorkDayCompleted() between the two options. Barring something like that, though, the two options should have the same complexity.
Regardless, when it comes to problems like this it is always important to make sure that you're both talking about the same things:
What n means (presumably the number of trucks)
What you're counting (presumably the number of deliveries and maybe also the checks for the work day being done)
What the end state is (this is the red flag for me -- the work day being completed needs to be better defined)
Subsequent edits lead me to believe both of these options are O(n), since they ultimately perform one or two "travel" operations per package, depending on the number of trucks and their capacity. Given this, I think the answer to your core question (do those different control structures result in different complexity analysis) is no, they don't.
It also seems unlikely that the internals are affecting the complexity in some important way, so my advice would be to get back together with your professor and see if they can expand on their thoughts. It very well might be that this was an oversight on their part, or that they were trying to make a more subtle point about how some of the components you're using are implemented.
If you get their explanation and there is something more complex going on that you still have trouble understanding, that should probably be a separate question (perhaps linked to this one).

a) if my professor is correct, how the two can perform the same calculations but have different big Os.
Two algorithms that do the same number of "basic operations" have the same time complexity, regardless how the code is structured.
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
First you have to define: what is "n"? Is n the number of trucks? Next, is the number of "basic operations" per truck constant, or does it vary in some way?
For example: If the number of operations per truck is constant C, the total number of operations is C*n. That's in the complexity class O(n).
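One way to convince yourself of this is to strip both control structures down to counters and check that they perform the same number of delivery operations. Below is a toy sketch (not the original code): the Truck/hub machinery is replaced by a list of remaining package counts, and `any(remaining)` stands in for the work-day check.

```python
def deliveries_option1(packages_per_truck):
    """Option 1 style: a while loop with a manually managed index."""
    remaining = list(packages_per_truck)
    count = 0
    truck_index = 0
    while any(remaining):                    # "work day not completed"
        if truck_index != len(remaining):
            while remaining[truck_index] > 0:
                remaining[truck_index] -= 1
                count += 1                   # one "delivery" operation
            truck_index += 1
        else:
            truck_index = 0                  # reset for next pass
    return count

def deliveries_option2(packages_per_truck):
    """Option 2 style: a for loop nested inside the while loop."""
    remaining = list(packages_per_truck)
    count = 0
    while any(remaining):
        for i in range(len(remaining)):
            while remaining[i] > 0:
                remaining[i] -= 1
                count += 1
    return count

loads = [3, 5, 2]
print(deliveries_option1(loads), deliveries_option2(loads))  # 10 10
```

Both versions count exactly one operation per package, regardless of how the loops are nested, which is why the control structure alone doesn't change the complexity.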

Related

Data Structures & Algorithms Optimal Solution Explanation

I'm currently doing a DS&A Udemy course as I am prepping for the heavy recruiting this upcoming fall. I stumbled upon a problem with a prompt along the lines of:
"Given two lists, figure out which integer present in the first list is missing from the second list."
There were two solutions given in the course one which was considered a brute force solution and the other one the more optimal.
Here are the solutions:
import collections

def finderBasic(list1, list2):
    list1.sort()
    list2.sort()
    for i in range(len(list1)):
        if list1[i] != list2[i]:
            return list1[i]

def finderOptimal(list1, list2):
    d = collections.defaultdict(int)
    for num in list2:
        d[num] = 1
    for num in list1:
        if d[num] == 0:
            return num
        else:
            d[num] -= 1
The course explains that finderOptimal is a more optimal way of solving the problem, as it solves it in O(n), i.e. linearly. Can someone please explain to me why that is? I just felt that finderBasic was much simpler and only went through one loop. Any help would be much appreciated, thank you!
You would be correct, if it were only about going through the loop; the first solution would be better.
-- as you said, going through one for loop (in full) takes O(n) time, and it doesn't matter whether you go through it once, twice or c times (as long as c is small enough).
However, the heavy operation here is the sorting, as it takes approximately n*log(n) time, which is larger than O(n). That means that even though you run through a for loop twice in the 2nd solution, it is still much better than sorting once.
Please note, that accessing dictionary key takes approximately O(1) time, so the time is still O(n) time with the loop.
Refer to: https://wiki.python.org/moin/TimeComplexity
The basic solution may be better for a reader, as it's very simple and straightforward, but it has higher time complexity.
Disclaimer: I am not familiar with python.
There are two loops you are not accounting for in the first example. Each of those sort() calls would have at least two nested loops to implement the sorting. On top of that, usually the best performance you can get in the general case is O(n log(n)) when doing sorting.
The second case avoids all sorting and simply uses a "playcard" to mark what is present. Additionally, it uses dictionary which is a hash table. I am sure you have already learned that hash tables offer constant time - O(1) - operations.
Simpler does not always mean most efficient. Conversely, efficient is often hard to comprehend.
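As a side note, the course's `d[num] = 1` assignment implicitly assumes list2 has no duplicates. Counting occurrences instead makes the same O(n) hash-based approach robust to duplicates. This is a small variant of the course's code, not the course's exact solution:

```python
import collections

def finder_optimal(list1, list2):
    """Hash-based finder: O(n) expected time, no sorting needed."""
    counts = collections.defaultdict(int)
    for num in list2:
        counts[num] += 1        # count occurrences instead of setting a flag
    for num in list1:
        if counts[num] == 0:
            return num          # num appears more times in list1 than list2
        counts[num] -= 1

print(finder_optimal([5, 5, 7, 7], [5, 7, 7]))  # 5
print(finder_optimal([1, 2, 3], [1, 3]))        # 2
```

Each element of either list is touched a constant number of times, and each dictionary lookup is O(1) on average, so the whole thing stays linear.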

Most effective Algorithm to find maximum of double-precision values

What is the most effective way of finding a maximum value in a set of variables?
I have seen solutions, such as
private double findMax(double... vals) {
    double max = Double.NEGATIVE_INFINITY;
    for (double d : vals) {
        if (d > max) max = d;
    }
    return max;
}
But, what would be the most effective algorithm for doing this?
You can't reduce the complexity below O(n) if the list is unsorted... but you can improve the constant factor by a lot. Use SIMD. For example, with SSE you would use the MAXPD instruction to perform compare+select on two doubles at once (MAXPS handles four single-precision floats). Unroll the loop a bit to reduce the cost of the loop control logic. Then, outside the loop, find the max of the values left in your SSE register.
This gives a benefit for any size list... also using multithreading makes sense for really large lists.
Assuming the list does not have elements in any particular order, the algorithm you mentioned in your question is optimal. It must look at every element once, thus it takes time directly proportional to the size of the list, O(n).
There is no algorithm for finding the maximum that has a lower upper bound than O(n).
Proof: Suppose for a contradiction that there is an algorithm that finds the maximum of a list in less than O(n) time. Then there must be at least one element that it does not examine. If the algorithm selects this element as the maximum, an adversary may choose a value for the element such that it is smaller than one of the examined elements. If the algorithm selects any other element as the maximum, an adversary may choose a value for the element such that it is larger than the other elements. In either case, the algorithm will fail to find the maximum.
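The same linear scan, rendered in Python and instrumented to show that it inspects every element exactly once, which is exactly why the Ω(n) bound is unavoidable:

```python
def find_max(vals):
    """Linear scan for the maximum, counting how many elements are inspected."""
    max_val = float('-inf')
    inspected = 0
    for v in vals:
        inspected += 1          # every element is examined exactly once
        if v > max_val:
            max_val = v
    return max_val, inspected

print(find_max([3.5, -1.0, 7.25, 0.0]))  # (7.25, 4)
```

Any algorithm that inspects fewer than all n elements leaves at least one unexamined, and the adversary argument above shows that element could have been the maximum.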
EDIT: This was my attempted answer, but please look at the comments, where @BenVoigt proposes a better way to optimize the expression.
You need to traverse the whole list at least once
so it'd be a matter of finding a more efficient expression for if (d>max) max=d, if any.
Assuming we need the general case where the list is unsorted (if we kept it sorted we'd just pick the last item, as @IgnacioVazquez points out in the comments), and researching a little about branch prediction (Why is it faster to process a sorted array than an unsorted array?, see the 4th answer), it looks like
if (d>max) max=d;
can be more efficiently rewritten as
max=d>max?d:max;
The reason is that the first statement is normally translated into a branch (though it's totally compiler- and language-dependent; this happens at least in C and C++, and even in a VM-based language like Java), while the second one is translated into a conditional move.
Modern processors pay a big penalty on branches when the prediction goes wrong (the execution pipeline has to be flushed), while a conditional move is a single instruction that doesn't disturb the pipeline.
The random nature of the elements in the list (one can be greater or lesser than the current maximum with equal probability) will cause many branch predictions to go wrong.
Please refer to the linked question for a nice discussion of all this, together with benchmarks.

Sorting algorithm for expensive comparison

Given is an array of n distinct objects (not integers), where n is between 5 and 15. I have a comparison function cmp(a, b) which is true if a < b and false otherwise, but it's very expensive to call. I'm looking for a sorting algorithm with the following properties:
It calls cmp(a, b) as few times as possible (subject to constraints below). Calls to cmp(a, b) can't be parallelized or replaced. The cost is unavoidable, i.e. think of each call to cmp(a, b) as costing money.
Aborting the algorithm early should still give good-enough results (a best-effort sort of the array). Ideally the algorithm should attempt to produce a coarse order of the whole array, as opposed to fully sorting one subset at a time. This may imply that the overall number of calls is not as small as theoretically possible to sort the entire array.
cmp(a, b) implies not cmp(b, a) => No items in the array are equal => Stability is not required. This is always true, unless...
In rare cases cmp(a, b) violates transitivity. For now I'll ignore this, but ultimately I would like this to be handled as well. Transitivity could be violated in short chains, i.e. x < y < z < x, but not in longer chains. In this case the final order of x y z doesn't matter.
Only the number of calls to cmp() needs to be optimized; algorithm complexity, space, speed and other factors are irrelevant.
Back story
Someone asked where this odd problem arose. Well, despite my shallow attempt at formalism, the problem is actually not formal at all. A while back a friend of mine found a web page on the internets that allowed him to put some stuff in a list, and make comparisons on that list in order to get it sorted. He has since lost that web page, and asked me to help him out. Sure, I said, and smashed my keyboard, arriving at this implementation. You are welcome to peruse the source code to see how I pretended to solve the problem above. Since I was quite inebriated when all this happened, I decided to outsource the real thinking to Stack Overflow.
Your best bet to start with would be Ch. 5 of Knuth's TAOCP Vol. III: it is about optimal sorting (i.e. sorting with a minimal number of comparisons). OTOH, since the number of objects you are sorting is very small, I doubt there will be any noticeable difference between an optimal algorithm and, say, bubble sort. So perhaps you should focus on making the comparisons cheaper. Strange problem though... would you mind giving details? Where does it arise?
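Whatever algorithm you try, you can measure it on the only metric that matters here by wrapping the expensive comparison in a counter. A sketch (using Python's built-in sort as the candidate; for n = 5..15 its merge/binary-insertion strategy is already close to the information-theoretic lower bound):

```python
from functools import cmp_to_key

def sort_counting_comparisons(items):
    """Sort items via a wrapped comparison, returning (sorted list, cmp calls)."""
    calls = 0
    def cmp(a, b):
        nonlocal calls
        calls += 1              # each call stands in for one expensive cmp()
        return -1 if a < b else 1   # items are distinct, so never equal
    ordered = sorted(items, key=cmp_to_key(cmp))
    return ordered, calls

ordered, calls = sort_counting_comparisons([9, 2, 7, 4, 1, 8, 3])
print(ordered, calls)
```

Swapping in a different algorithm and comparing the call counts on representative inputs is a practical way to decide whether a fancier method is worth it at this size.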

Improving Box Factory solution

Box Factory is a problem in Google Code Jam 2012 Round 1C. It is similar to the Longest Common Subsequence problem, and they have given an O(n^4) solution for it. However, at the end of the analysis it says that another improvement can reduce this again to O(n^3). I am wondering what optimization can be done to the solution.
O(n^4) Algorithm
The dynamic programming approach solves for f[x][y] = the maximum number of toys that could be placed in boxes using the first x runs of boxes and the first y runs of toys.
It solves this by considering the boxes of the last type for runs between a+1 and x, and toys of the last type for runs between b+1 and y.
The O(n^4) algorithm loops over all choices for a and b, but we can simplify by only considering critical values of a and b.
O(n^3) Algorithm
The key point is that if we have a, b such that we have more boxes than toys, then there is no point changing a to get even more boxes (as this will never help us make more products). Similarly, if we have more toys than boxes, then we can skip considering all the cases of b which would give us even more toys.
This suggests an O(n) algorithm for the inner loop, in which we trace out the boundary of a, b between having more toys and having more boxes. This is quite simple, as we can just start with a=x-1 and b=y-1 and then decrease either a or b according to whether we currently have more toys or more boxes. (If equal, you can decrease both.)
Each step of the algorithm decreases either a or b by 1, so this iteration will require x+y steps instead of the x*y steps of the original method.
It needs to be repeated for all values of x,y so overall the complexity is O(n^3).
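An illustrative sketch of that inner boundary walk (not the contest solution itself; `box_counts` and `toy_counts` are assumed to be prefix sums over the run sizes of the single relevant type, so runs a+1..x contain `box_counts[x] - box_counts[a]` boxes):

```python
def critical_pairs(box_counts, toy_counts, x, y):
    """Yield the critical (a, b) pairs on the boundary between
    'more boxes' and 'more toys', in O(x + y) steps instead of x*y."""
    a, b = x - 1, y - 1
    while a >= 0 and b >= 0:
        boxes = box_counts[x] - box_counts[a]   # boxes in runs a+1..x
        toys = toy_counts[y] - toy_counts[b]    # toys in runs b+1..y
        yield a, b
        if boxes > toys:
            b -= 1          # more boxes already: widen the toy window
        elif toys > boxes:
            a -= 1          # more toys already: widen the box window
        else:
            a -= 1          # equal: safe to widen both
            b -= 1

# prefix sums for box runs of sizes [2, 3, 1] and toy runs of sizes [1, 4]
bc = [0, 2, 5, 6]
tc = [0, 1, 5]
print(list(critical_pairs(bc, tc, 3, 2)))  # [(2, 1), (1, 1), (0, 0)]
```

Each iteration decreases a or b, so the walk takes at most x + y steps, matching the x+y bound claimed above.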
Additional Improvements
A further improvement would be to store the index of the previous run of each type as this would allow several steps of the algorithm to be collapsed into a single move (because we know that our score can only improve once we work back to a run of the correct type). However, this would still be O(n^3) in the worst case (all boxes/toys of the same type).
Another practical improvement is to coalesce any runs in which the type was the same at consecutive positions, as this may significantly simplify test cases designed to expose the worst case behaviour in the previous improvement.
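The coalescing step is a one-liner over (type, size) run pairs (a sketch; this run representation is an assumption, not taken from the problem statement):

```python
from itertools import groupby

def coalesce(runs):
    """Merge consecutive runs of the same type into a single run each."""
    return [(t, sum(size for _, size in group))
            for t, group in groupby(runs, key=lambda run: run[0])]

print(coalesce([("A", 2), ("A", 3), ("B", 1), ("A", 4)]))
# [('A', 5), ('B', 1), ('A', 4)]
```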

What are the ways of calculating Average Case's

Background:
For my Data Structures and Algorithms I am studying the Big O Notation. So far I understand how to workout the time complexity, best and worst case scenario. However, the average case is just baffling my head. The teacher is just throwing at us equations that I don't understand. And he is not willing to explain them in detail.
Question:
So please guys, what is the best way to calculate this? Is there one equation that calculates this or does it vary from algorithm to algorithm?
What are the steps you take to calculate this?
Let's take an example of Insertion sort algorithm?
Research:
I looked on youtube and stackoverflow for answers. But they all use different equations.
Any help would be great
thanks
As mentioned in the comments, you have to look at the average input to the algorithm (which in this case means random). A good way to think about it is to trace what the algorithm would do if the input were average.
For the example of insertion sort:
In the best case (when the input is already sorted) the algorithm will look through the input without exchanging anything, clearly resulting in a running time of O(n).
In the worst case (when the input is in exactly the opposite of the desired order) the algorithm will move every element all the way from its current position to the start of the list; that is, the object at index 0 will not be moved, the object at index 1 will be moved once, the object at index 2 will be moved twice and so on, resulting in a running time of 0+1+2+3+...+(n-1) ≈ 0.5n² = O(n²).
The same way of thinking can be used to find the average case, but instead of each object moving all the way to the start, we can expect it to move, on average, halfway to the start; that is, the object at index 0 will not be moved, the object at index 1 will be moved half a time (of course this only makes sense on average), the object at index 2 will be moved once, the object at index 3 will be moved 1.5 times and so on, resulting in a running time of 0 + 0.5 + 1 + 1.5 + ... + (n-1)/2 ≈ 0.25n² (at each index, we have half of what we had in the worst case) = O(n²).
Of course not all algorithms are as simple as this, but looking at what the algorithm would do at each step if the input were random usually helps. If you have any kind of information about the input to the algorithm (for instance, insertion sort is often used as the last step after another algorithm has done most of the sorting, as it is very efficient when the input is almost sorted, and in such a case we might know that no object will be moved more than x times), then this can be taken into account when computing the average running time.
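That halfway-on-average intuition is easy to check empirically by counting shifts in a plain insertion sort over random permutations (a toy experiment; the 0.25n² figure is the prediction being tested):

```python
import random

def insertion_sort_shifts(a):
    """Insertion-sort a copy of a, returning the number of element shifts."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            shifts += 1          # one move toward the front
            j -= 1
    return shifts

n = 200
random.seed(1)
avg = sum(insertion_sort_shifts(random.sample(range(n), n))
          for _ in range(50)) / 50
print(avg, 0.25 * n * n)  # the measured average lands close to n^2/4
```

The shift count equals the number of inversions in the input, whose expectation for a random permutation is n(n-1)/4, in line with the 0.25n² estimate above.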
