C++: Time complexity for vector push_back() - trying to understand the C++ Primer 5th Ed - Stanley Lippman

I am reading C++ Primer on vector push_back() and trying to understand the below statement.
Every implementation is required to follow a strategy that ensures that it is efficient
to use push_back to add elements to a vector. Technically speaking, the execution
time of creating an n-element vector by calling push_back n times on an initially
empty vector must never be more than a constant multiple of n
I have read quite a bit about data structures, but I just do not understand the statement quoted above. I am not sure what the author is trying to say, especially with the word "multiple".
As an example, if n=5, does the author mean a multiple of 5 (e.g. 5, 10, 15, etc.)?
Appreciate any help.

The author is saying that there must exist a constant value c such that inserting n elements into the vector never takes longer than c*n.
For example, for a given element type, c could be one microsecond per item. In that case inserting 100 items into the vector will not take longer than 100 microseconds.
Technically that isn't exactly the guarantee you have. There is usually no obvious way to precisely define such a time bound on real systems. And if the time required by the constructors of the element type is not constant, that may also affect this kind of time guarantee.
Instead the standard makes such a guarantee only about the number of times that an operation will be performed on the elements. So here for example if the vector's element type is a class type, c could be 3 per element, which would mean that inserting 100 elements will at most call constructors of the element type 300 times.
This is meant as an asymptotic bound, not as a way to determine the actual execution time for small values of n.
This requirement is also known as amortized constant time complexity for individual push_back.
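To make the "constant multiple of n" concrete, here is a minimal sketch that simulates a growable array and counts how many times an element is constructed or relocated. It assumes the capacity doubles whenever the array is full; the growth factor is an assumption for illustration (real implementations choose their own), and count_element_writes is a hypothetical helper name, not part of any library.

```python
def count_element_writes(n, growth=2):
    """Simulate n push_back calls on an initially empty growable array.

    Returns the number of element constructions plus relocations.
    The doubling growth policy is an assumption made for this sketch.
    """
    capacity = 0
    size = 0
    writes = 0
    for _ in range(n):
        if size == capacity:
            # Reallocate: every existing element is moved to the new buffer.
            writes += size
            capacity = max(1, capacity * growth)
        writes += 1  # construct the newly pushed element
        size += 1
    return writes


if __name__ == "__main__":
    for n in (1, 5, 100, 1000):
        total = count_element_writes(n)
        print(n, total, total / n)
```

With doubling, the relocations add up to less than 2n, so the total stays below 3n: a single push_back may be expensive, but the cost per element averaged over all n calls is bounded by a constant.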


Is a while loop with a nested for loop O(n) or O(n^2)?

I have 2 blocks of code. One has a single while loop, and the second has a for loop inside the while loop. My professor is telling me that Option 1 has an algorithm complexity of O(n) and Option 2 has an algorithm complexity of O(n^2), but can't explain why that is, other than pointing to the nested for loops. I am confused because both perform the exact same number of calculations for any given size N, which doesn't seem to indicate that they have different algorithm complexities.
I'd like to know:
a) if my professor is correct, and how they can boast the same calculations but have different big Os.
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
I've used inline comments denoted by '#' to note the computations. The number of packages to deliver should be N. self.trucks is a list. self.workDayCompleted is a boolean determined by whether all packages have been delivered.
Option 1:
# initializes index for fake for loop
truck_index = 0
while(not self.workDayCompleted):
    # checks if truck index has reached end of self.trucks list
    if(truck_index != len(self.trucks)):
        # does X amount of calculations required for delivery of truck's packages
        while(not self.trucks[truck_index].isEmpty()):
            self.trucks[truck_index].travel()
            self.trucks[truck_index].deliverPackage()
        if(hub.packagesExist()):
            self.trucks[truck_index].travelToHub()
            self.trucks[truck_index].loadPackages()
        # increments index
        truck_index += 1
    else:
        # resets index to 0 for next iteration set through truck list
        truck_index = 0
    # does X amount of calculations required for while loop condition
    self.workDayCompleted = isWorkDayCompleted()
Option 2:
while(not self.workDayCompleted):
    # initializes index (i)
    # each iteration checks if truck index has reached end of self.trucks list
    # increments index
    for i in range(len(self.trucks)):
        # does X amount of calculations required for delivery of truck's packages
        while(not self.trucks[i].isEmpty()):
            self.trucks[i].travel()
            self.trucks[i].deliverPackage()
        if(hub.packagesExist()):
            self.trucks[i].travelToHub()
            self.trucks[i].loadPackages()
    # does X amount of calculations required for while loop condition
    self.workDayCompleted = isWorkDayCompleted()
Any help is greatly appreciated, thank you!
It certainly seems like these two pieces of code are effectively implementing the same algorithm (i.e. deliver a package with each truck, then check to see if the work day is completed, repeat until the work day is completed). From this perspective you're right to be skeptical.
The question becomes: are they O(n) or O(n^2)? As you've described it, this is impossible to determine because we don't know what the conditions are for the work day being completed. Is it related to the amount of work that has been done by the trucks? Without that information we have no ability to reason about when the outer loop exits. For all we know the condition is that each truck must deliver 2^n packages and the complexity is actually O(n * 2^n).
So if your professor is right, my only guess is that there's a difference between the implementations of isWorkDayCompleted() between the two options. Barring something like that, though, the two options should have the same complexity.
Regardless, when it comes to problems like this it is always important to make sure that you're both talking about the same things:
What n means (presumably the number of trucks)
What you're counting (presumably the number of deliveries and maybe also the checks for the work day being done)
What the end state is (this is the red flag for me: the work day being completed needs to be better defined)
Subsequent edits lead me to believe both of these options are O(n), since they ultimately perform one or two "travel" operations per package, depending on the number of trucks and their capacity. Given this, I think the answer to your core question (do those different control structures result in different complexity analysis) is no, they don't.
It also seems unlikely that the internals are affecting the code complexity in some important way, so my advice would be to get back together with your professor and see if they can expand on their thoughts. It very well might be that this was an oversight on their part or that they were trying to make a more subtle point about how some of the components you're using were implemented.
If you get their explanation and there is something more complex going on that you still have trouble understanding, that should probably be a separate question (perhaps linked to this one).
a) if my professor is correct, and how they can boast the same calculations but have different big Os.
Two algorithms that do the same number of "basic operations" have the same time complexity, regardless how the code is structured.
b) if my professor is incorrect and they are the same complexity, is it O(n) or O(n^2)? Why?
First you have to define: what is "n"? Is n the number of trucks? Next, is the number of "basic operations" per truck the same, or does it vary in some way?
For example: If the number of operations per truck is constant C, the total number of operations is C*n. That's in the complexity class O(n).
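The point that loop structure alone does not change the operation count can be checked with a deliberately simplified model. This is a sketch, not the asker's code: each truck is reduced to a number of packages, one "basic operation" is one delivery, and deliver_flat / deliver_nested are hypothetical names that mirror the control flow of Option 1 and Option 2.

```python
def deliver_flat(loads):
    """Option 1 style: one while loop with a manually managed index."""
    remaining = list(loads)
    operations = 0
    index = 0
    while any(remaining):  # stand-in for "work day not completed"
        if index != len(remaining):
            if remaining[index] > 0:
                remaining[index] -= 1  # one delivery
                operations += 1
            index += 1
        else:
            index = 0  # wrap around to the first truck
    return operations


def deliver_nested(loads):
    """Option 2 style: a for loop nested inside the while loop."""
    remaining = list(loads)
    operations = 0
    while any(remaining):
        for index in range(len(remaining)):
            if remaining[index] > 0:
                remaining[index] -= 1  # one delivery
                operations += 1
    return operations


if __name__ == "__main__":
    loads = [2, 1, 4]
    print(deliver_flat(loads), deliver_nested(loads))
```

Under this model both functions perform exactly one operation per package, so they fall in the same complexity class even though only one of them has a visibly nested loop. The sketch counts deliveries only; the cost of the completion check is left out on purpose.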

Algorithm to find all values repeating more than floor(n/k) times in O(n log k) time [duplicate]

This problem is 4-11 of Skiena. The solution for finding a majority element (one repeated more than half the time) is the majority algorithm. Can we use this to find all numbers repeated more than n/4 times?
Misra and Gries describe a couple approaches. I don't entirely understand their paper, but a key idea is to use a bag.
Boyer and Moore's original majority algorithm paper has a lot of incomprehensible proofs and discussion of formal verification of FORTRAN code, but it has a very good start of an explanation of how the majority algorithm works. The key concept starts with the idea that if the majority of the elements are A and you remove, one at a time, a copy of A and a copy of something else, then in the end you will have only copies of A. Next, it should be clear that removing two different items, neither of which is A, can only increase the majority that A holds. Therefore it's safe to remove any pair of items, as long as they're different. This idea can then be made concrete. Take the first item out of the list and stick it in a box. Take the next item out and stick it in the box. If they're the same, let them both sit there. If the new one is different, throw it away, along with an item from the box. Repeat until all items are either in the box or in the trash. Since the box is only allowed to have one kind of item at a time, it can be represented very efficiently as a pair (item type, count).
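The box-and-trash description above can be sketched as follows, with the box stored as the pair (item type, count). The function names are hypothetical, and the second counting pass is an addition for the case where no majority exists at all, since the pairing step alone only yields a candidate.

```python
def majority_candidate(items):
    """Return the only possible majority element, using the (item, count) box."""
    candidate, count = None, 0
    for item in items:
        if count == 0:
            # The box is empty: the arriving item goes into the box.
            candidate, count = item, 1
        elif item == candidate:
            # Same kind as the box contents: let it sit there.
            count += 1
        else:
            # Different kind: throw it away together with one item from the box.
            count -= 1
    return candidate


def majority(items):
    """Return the element occurring more than n/2 times, or None if there is none."""
    items = list(items)
    candidate = majority_candidate(items)
    if items and items.count(candidate) * 2 > len(items):
        return candidate
    return None


if __name__ == "__main__":
    print(majority([1, 2, 1, 1, 3]))
    print(majority([1, 2, 3]))
```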
The generalization to find all items that may occur more than n/k times is simple, but explaining why it works is a little harder. The basic idea is that we can find and destroy groups of k distinct elements without changing anything. Why? If w > n/k then w-1 > (n-k)/k. That is, if we take away one of the popular elements, and we also take away k-1 other elements, then the popular element remains popular!
Implementation: instead of only allowing one kind of item in the box, allow k-1 of them. Whenever you see a group of k different items show up (that is, there are k-1 types in the box, and the one arriving doesn't match any of them), you throw one of each type in the trash, including the one that just arrived. What data structure should we use for this "box"? Well, a bag, of course! As Misra and Gries explain, if the elements can be ordered, a tree-based bag with O(log k) basic operations will give the whole algorithm a complexity of O(n log k). One point to note is that the operation of removing one of each element is a bit expensive (O(k) for a typical implementation), but that cost is amortized over the arrivals of those elements, so it's no big deal. Of course, if your elements are hashable rather than orderable, you can use a hash-based bag instead, which under certain common assumptions will give even better asymptotic performance (but it's not guaranteed). If your elements are drawn from a small finite set, you can guarantee that. If they can only be compared for equality, then your bag gets much more expensive and I'm pretty sure you end up with something like O(nk) instead.
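A minimal sketch of this generalization, assuming hashable elements and using a plain dict as the hash-based bag (so the O(n log k) tree-based bound from the text does not apply to it directly). The names are hypothetical, k is assumed to be at least 2, and the verification pass is added because the box only yields candidates.

```python
from collections import Counter


def frequent_candidates(items, k):
    """Return at most k-1 candidates for elements occurring more than n/k times."""
    box = {}  # item type -> count, never more than k-1 types at once
    for item in items:
        if item in box:
            box[item] += 1
        elif len(box) < k - 1:
            box[item] = 1
        else:
            # k distinct items are present: discard one of each type,
            # including the one that just arrived.
            for key in list(box):
                box[key] -= 1
                if box[key] == 0:
                    del box[key]
    return set(box)


def more_than_n_over_k(items, k):
    """Return the set of elements that occur more than len(items)/k times."""
    items = list(items)
    candidates = frequent_candidates(items, k)
    counts = Counter(item for item in items if item in candidates)
    return {item for item, count in counts.items() if count * k > len(items)}


if __name__ == "__main__":
    print(more_than_n_over_k([3, 1, 2, 2, 1, 2, 3, 3], 4))
```

The discard step walks the whole box, which is the O(k) operation mentioned above; each discarded copy was paid for by an earlier arrival, which is why the cost amortizes.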
Find the majority element that appears n/2 times by Moore-Voting Algorithm
See method 3 of the given link for Moore's Voting Algo (http://www.geeksforgeeks.org/majority-element/).
Time:O(n)
Now after finding majority element, scan the array again and remove the majority element or make it -1.
Time:O(n)
Now apply Moore Voting Algorithm on the remaining elements of array (but ignore -1 now as it has already been included earlier). The new majority element appears n/4 times.
Time:O(n)
Total Time:O(n)
Extra Space:O(1)
You can do it for element appearing more than n/8,n/16,.... times
EDIT:
There may exist a case when there is no majority element in the array:
For example, if the input array is {3, 1, 2, 2, 1, 2, 3, 3} then the output should be [2, 3].
Given an array of size n and a number k, find all elements that appear more than n/k times
See this link for the answer:
https://stackoverflow.com/a/24642388/3714537
References:
http://www.cs.utexas.edu/~moore/best-ideas/mjrty/
See this paper for a solution that uses constant memory and runs in linear time, which will find 3 candidates for elements that occur more than n/4 times. Note that if you assume that your data is given as a stream that you can only go through once, this is the best you can do -- you have to go through the stream one more time to test each of the 3 candidates to see if it occurs more than n/4 times in the stream. However, if you assume a priori that there are 3 elements that occur more than n/4 times then you only need to go through the stream once so you get a linear time online algorithm (only goes through the stream once) that only requires constant storage.
As you didn't mention space complexity, one possible solution is to use a hash table that maps each element to its count; then you can just increment the count each time the element is found.

Redis Sorted Set Member Size and Performance

Redis Sorted Sets primarily sort based on a Score; however, in cases where multiple members share the same Score, lexicographical (Alpha) sorting is used. The Redis zadd documentation indicates that the function complexity is:
"O(log(N)) where N is the number of elements in the sorted set"
I have to assume this remains true regardless of the member size/length; however, I have a case where there are only 4 scores resulting in members being sorted lexicographically after Score.
I want to prepend a time-based key to each member so that the secondary sort is time based, and also to add some uniqueness to the members. Something like:
"time-based-key:member-string"
My member-string can be larger JavaScript object literals like so:
JSON.stringify( {/* object literal */} )
Will the sorted set zadd and other functionality's performance remain constant?
If not, by what magnitude will performance be affected?
The complexity comes from the number of elements that need to be tested (compared against the new element) to find the correct insertion point (presumably using a binary search algorithm).
It says nothing about how long it will take to perform each test, because that's considered a constant factor (in the sense that it doesn't vary when you add more items).
The amount of data which needs to be compared before determining that a new element should go before or after an existing one will affect the total clock time, but it will do so for each comparison equally.
So your overall clock time for an insert will be quickest when comparing scores only, and progressively slower the deeper into a pair of strings it has to look to determine their lexical order. This won't be any particular magnitude, though, just the concrete number of microseconds to be multiplied by the log(n) complexity factor.
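The distinction between the number of comparisons and the cost of each comparison can be illustrated with a small model. This is only a sketch of the answer's reasoning: it uses a sorted Python list and the standard bisect module, which is an assumption for illustration and not how Redis stores sorted sets. Entry and count_insert_comparisons are hypothetical names.

```python
import bisect


class Entry:
    """A (score, member) pair that counts how often it is compared."""

    comparisons = 0

    def __init__(self, score, member):
        self.key = (score, member)

    def __lt__(self, other):
        Entry.comparisons += 1
        # Equal scores fall through to a lexicographic member comparison.
        return self.key < other.key


def count_insert_comparisons(n, member_len):
    """Insert one entry into n sorted entries; return the comparisons used."""
    prefix = "x" * member_len  # long shared prefix makes each comparison slower
    entries = [Entry(1.0, prefix + f"{i:08d}") for i in range(n)]
    Entry.comparisons = 0
    bisect.insort(entries, Entry(1.0, prefix + "99999999"))
    return Entry.comparisons


if __name__ == "__main__":
    print(count_insert_comparisons(1024, 10))
    print(count_insert_comparisons(1024, 10000))
```

In this model the comparison count depends only on n, while the member length changes how much work each comparison does. That matches the description above: the logarithmic factor stays the same, and longer members with long shared prefixes raise the constant.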

Need information on Big O Notation

Bit of a random question for you. If you have a method that has to check every single individual place inside an array, would it be okay to say that this method has notation of O(n)?
The reason I'm not sure if my answer is correct is that, as far as I'm aware, O(n) relates to the number of items held in the array, while my assumption is based on the actual size of the array?
If your algorithm has to look at every item in the array, that algorithm is O(n). It doesn't really matter if the array is full or not, since you can be flexible in how you define n. It can be either the size of the array or the number of non-null elements in the array. If your algorithm has to look in empty array slots to see if they're empty or not, use the size. (If that's a real performance issue, probably a different data structure is called for.)
For a really contrived example, if it takes one hour to process each non-null array element, but one nanosecond to check for null, then you should define n to be the number of elements that actually exist, because that's what's going to dictate how the algorithm scales.

Most effective Algorithm to find maximum of double-precision values

What is the most effective way of finding a maximum value in a set of variables?
I have seen solutions, such as
private double findMax(double... vals) {
    double max = Double.NEGATIVE_INFINITY;
    for (double d : vals) {
        if (d > max) max = d;
    }
    return max;
}
But, what would be the most effective algorithm for doing this?
You can't reduce the complexity below O(n) if the list is unsorted... but you can improve the constant factor by a lot. Use SIMD. For example, in SSE you would use a packed maximum instruction (MAXPS for four floats, or MAXPD for two doubles per 128-bit register; MAXSS is the scalar form) to perform several compare+select operations in a single instruction. Unroll the loop a bit to reduce the cost of loop control logic. And then outside the loop, find the max out of the values held in your SSE register.
This gives a benefit for any size list... also using multithreading makes sense for really large lists.
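The structure of that approach (several independent running maxima, then one final reduction) can be sketched in plain Python. This shows the shape of the algorithm only; it does not use SIMD instructions and will not be faster in Python. lane_max and the default of four lanes are assumptions for illustration.

```python
def lane_max(values, lanes=4):
    """Find the maximum using `lanes` independent running maxima.

    Mirrors the structure of a SIMD loop: an element-wise max over chunks,
    a final reduction across lanes, then a scalar loop for the leftover tail.
    Returns negative infinity for empty input, like the Java version.
    """
    values = list(values)
    accumulators = [float("-inf")] * lanes
    full = len(values) - len(values) % lanes
    for start in range(0, full, lanes):
        chunk = values[start:start + lanes]
        # One "packed max": each lane keeps the larger of its two inputs.
        accumulators = [a if a > b else b for a, b in zip(accumulators, chunk)]
    result = max(accumulators)  # reduction across the lanes
    for value in values[full:]:  # elements that did not fill a whole chunk
        if value > result:
            result = value
    return result


if __name__ == "__main__":
    print(lane_max([3.0, -1.0, 7.5, 2.0, 9.25, 0.0]))
```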
Assuming the list does not have elements in any particular order, the algorithm you mentioned in your question is optimal. It must look at every element once, thus it takes time directly proportional to the size of the list, O(n).
There is no algorithm for finding the maximum that has a lower upper bound than O(n).
Proof: Suppose for a contradiction that there is an algorithm that finds the maximum of a list in less than O(n) time. Then there must be at least one element that it does not examine. If the algorithm selects this element as the maximum, an adversary may choose a value for the element such that it is smaller than one of the examined elements. If the algorithm selects any other element as the maximum, an adversary may choose a value for the element such that it is larger than the other elements. In either case, the algorithm will fail to find the maximum.
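The adversary argument can be made concrete with a tiny sketch. Here max_without_index stands in for any algorithm that never examines one position, and adversarial_input builds the input that defeats it; both names are hypothetical and the values are chosen only for illustration.

```python
def max_without_index(values, skipped):
    """A stand-in for an algorithm that never looks at position `skipped`."""
    return max(v for i, v in enumerate(values) if i != skipped)


def adversarial_input(n, skipped):
    """Build a list whose true maximum sits exactly at the unexamined position."""
    values = [0.0] * n
    values[skipped] = 1.0
    return values


if __name__ == "__main__":
    data = adversarial_input(5, 2)
    print(max_without_index(data, 2), max(data))
```

Whatever position the algorithm skips, the adversary can place the largest value there, so the skipping algorithm reports a wrong answer on that input.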
EDIT: This was my attempted answer, but please look at the comments, where @BenVoigt proposes a better way to optimize the expression.
You need to traverse the whole list at least once,
so it'd be a matter of finding a more efficient expression for if (d>max) max=d, if any.
Assuming we need the general case where the list is unsorted (if we keep it sorted we'd just pick the last item, as @IgnacioVazquez points out in the comments), and after researching a little about branch prediction (Why is it faster to process a sorted array than an unsorted array?, see 4th answer), it looks like
if (d>max) max=d;
can be more efficiently rewritten as
max=d>max?d:max;
The reason is that the first statement is normally translated into a branch (this is entirely compiler and language dependent, but it happens at least in C and C++, and even in a VM-based language like Java), while the second one is translated into a conditional move.
Modern processors pay a big penalty on branches if the prediction goes wrong (the execution pipelines have to be reset), while a conditional move involves no branch, so there is nothing to mispredict and the pipelines are not disturbed.
The random nature of the elements in the list (one can be greater or lesser than the current maximum with equal probability) will cause many branch predictions to go wrong.
Please refer to the linked question for a nice discussion of all this, together with benchmarks.
