Time Complexity - Inserting an item in the middle of dynamic Array - data-structures

I came across this article regarding time complexity of dynamic arrays which clarified a lot. However I have a case based question. Say I have a dynamic array that has a total of 6 elements and suppose the 4th element is removed. In this case the deletion complexity would be O(n-index) which will be O(6-4) = 2 since the last two elements will only need to move up. Is this correct ? I have articles that give the worst case complexity saying that if the top most element is removed then time complexity would be O(n). I could not find anything that talked about removing/inserting an item from/in the center.

Your analysis of the number of items that need to be moved is correct. However, in big-O notation, that's still O(n). If you have n items in the list and remove something from the middle you'll have to move *0.5 * n* items. But when dealing with big-O we drop any constant multipliers so that's just O(n).

Related

write an Algorithm for Scenario using Selection sort?

This Scenario was in my final exam but can't write up an algorithm for it.
Everfresh Cattle farm has its annual Big Cattle Contest. Because the Bilal Haleem son Ali, is majoring in computer science, the county hires him to computerize the Big Cattle judging. Each Cattle’s name (string) and weight (integer) are to be read in from the keyboard. The county expects 500 entries this year. The output needed is a listing of the ten heaviest Cattles, sorted from biggest to smallest. Because Ali has just learned some sorting methods in school, he feels up to the task of writing this “pork-gram”. He writes a program to read in all the entries into an array of records, then uses a selection sort to put the entire array in order based on the Cattle Weight member. He then prints the ten largest values from the array. Can you think of a more efficient way to write this program? If so, write the algorithm.
The solution is to not sort the 500 entries, you only have to keep track of the top 10 encountered so far, sort it every time a new value is larger than best10[9].
First of all, I would not say that this is a C++ question. This actually sounds quite similar to some interview questions. I will assume you care about worst-case running times so will use Big-Oh and we will worry about the general case of n is the number of cattle and k is the number of entries we want.
The first observation you can make is looking at the running time of the algorithm they have used which is selection sort. It says that the entire array is sorted so the running time will be O(n^2 + k). Since k < n in this case, we get O(n^2).
The first observation to make is that we don't need to sort the entire array only get the k largest. In fact, if we're running selection sort to choose maximums, we have this after only k iterations of the algorithm giving a running time of O(kn). This may have been enough for the exam question, but we can simplify further.
The next observation extends from the observation that we only care about the k largest elements we have encountered so far. Let us say we have this collection of elements when we encounter a new element, we need to determine if it should be within that collection. It actually suffices to show that if it is greater than the smallest element of the collection, then we know that it must be in the collection.
Now, the question now becomes which data structure should we use to store this? Since we care about fast insertions, deletions of the smallest and getting of the smallest, the natural choice is a Min-Heap. Performing at most n operations on this heap each costing log k, the maximum size of the heap, gives us O(n log k) time.
There are some further minor improvements we can make which actually reduce this to O((n - k) log k), though I will leave it as an exercise to you to work out why this is. You should also note that the question asks for the output in sorted order, so you would have to deconstruct the heap at the end taking an additional O(k log k) steps.
In terms of the code, this will also be left to you, so best of luck.
p.s. homework / exam questions aren't the most welcome here and there are plenty of sources.

Heapsort and building heaps using linked list

I know that linked list is not a appropriate data structure for building heaps.
One of the answers here (https://stackoverflow.com/a/14584517/5841727) says that heap sort can be done in O(nlogn) using linked list which is same as with arrays.
I think that heapify operation would cost O(n) time in linked list and we would need (n/2) heapify operations leading to time complexity of O(n^2).
Can someone please tell how to achieve O(nlogn) complexity (for heap sort ) using linked list ?
Stackoverflow URL you mentioned is merely someone's claim (at least when I'm writing this) so based on assumption here is my answer. Mostly when people mention "Time complexity", they mean asymptomatic analysis and finding out the proportion to which time taken by algorithm would increase with increasing size of input ignoring all the constants.
To prove the time complexity with linkedlist lets assume there is a function which returns value for given index (i know linked list don't return by index). For efficiency of this function you'd also need to pass in level but we can ignore that for now since it doesn't have any impact on time complexity.
So now it comes down to analyzing time proportion impact on this function with increasing input size. Now you could imagine that for fixing (heapifying) one node you may have to traverse 3 times max (1. find out which one to swap with requires one traverse to compare one of two possible children, 2. going back to parent for swaping 3. coming back down to the one you just swapped). Now even though it may seem that you are doing max n/2 traversal for 3 times; for asymptomatic analysis this is just equals to 'n'. Now you'll have to do this for log n exactly same way how you do for an array. Therefore time-complexity is O(n log n). On wikipedia time-complexity table for heaps URL https://en.wikipedia.org/wiki/Binary_heap#Summary_of_running_times

Can my algorithm be done any better?

I have been presented with a challenge to make the most effective algorithm that I can for a task. Right now I came to the complexity of n * logn. And I was wondering if it is even possible to do it better. So basically the task is there are kids having a counting out game. You are given the number n which is the number of kids and m which how many times you skip someone before you execute. You need to return a list which gives the execution order. I tried to do it like this you use skip list.
Current = m
while table.size>0:
executed.add(table[current%table.size])
table.remove(current%table.size)
Current += m
My questions are is this correct? Is it n*logn and can you do it better?
Is this correct?
No.
When you remove an element from the table, the table.size decreases, and current % table.size expression generally ends up pointing at another irrelevant element.
For example, 44 % 11 is 0 but 44 % 10 is 4, an element in a totally different place.
Is it n*logn?
No.
If table is just a random-access array, it can take n operations to remove an element.
For example, if m = 1, the program, after fixing the point above, would always remove the first element of the array.
When an array implementation is naive enough, it takes table.size operations to relocate the array each time, leading to a total to about n^2 / 2 operations in total.
Now, it would be n log n if table was backed up, for example, by a balanced binary search tree with implicit indexes instead of keys, as well as split and merge primitives. That's a treap for example, here is what results from a quick search for an English source.
Such a data structure could be used as an array with O(log n) costs for access, merge and split.
But nothing so far suggests this is the case, and there is no such data structure in most languages' standard libraries.
Can you do it better?
Correction: partially, yes; fully, maybe.
If we solve the problem backwards, we have the following sub-problem.
Let there be a circle of k kids, and the pointer is currently at kid t.
We know that, just a moment ago, there was a circle of k + 1 kids, but we don't know where, at which kid x, the pointer was.
Then we counted to m, removed the kid, and the pointer ended up at t.
Whom did we just remove, and what is x?
Turns out the "what is x" part can be solved in O(1) (drawing can be helpful here), so the finding the last kid standing is doable in O(n).
As pointed out in the comments, the whole thing is called Josephus Problem, and its variants are studied extensively, e.g., in Concrete Mathematics by Knuth et al.
However, in O(1) per step, this only finds the number of the last standing kid.
It does not automatically give the whole order of counting the kids out.
There certainly are ways to make it O(log(n)) per step, O(n log(n)) in total.
But as for O(1), I don't know at the moment.
Complexity of your algorithm depends on the complexity of the operations
executed.add(..) and table.remove(..).
If both of them have complexity of O(1), your algorithm has complexity of O(n) because the loop terminates after n steps.
While executed.add(..) can easily be implemented in O(1), table.remove(..) needs a bit more thinking.
You can make it in O(n):
Store your persons in a LinkedList and connect the last element with the first. Removing an element costs O(1).
Goging to the next person to choose would cost O(m) but that is a constant = O(1).
This way the algorithm has the complexity of O(n*m) = O(n) (for constant m).

How can the worst case for an algorithm have different bounds?

I've been trying to figure this out all day. Some other threads address this, but I really don't understand the answers. There are also many answers that contradict one another.
I understand that an algorithm will never take longer than the upper bound and never be faster than the lower bound. However, I didn't know an upper bound existed for best case time and a lower bound existed for worst case time. This question really threw me in a loop. I can't wrap my head around this... a given run time can have a different upper and lower bound?
For example, if someone asked: "Show that the worst-case running time of some algorithm on a heap of size n is Big Omega(lg(n))". How do you possibly get a lower bound, any bound for that matter, when given a run time?
So, in summation, an algorithm's worst case upper bound can be different than its worst case lower bound? How can this be? Once given the case, don't bounds become irrelevant? Trying to independent study algorithms and I really need to wrap my head around this first.
The meat of my accepted answer to that question is a function whose running time oscillates between n^2 and n^3 depending on whether n is odd. The point that I was trying to make is that sometimes bounds of the form O(n^k) and Omega(n^k) aren't sufficiently descriptive, even though the worst case running time is a perfectly well defined function (which, like all functions, is its own best lower and upper bound). This happens with more natural functions like n log n, which is Omega(n^k) but not O(n^k) for k ≤ 1, and O(n^k) but not Omega(n^k) for k > 1 (and hence not Theta(n^k) regardless of how we choose a constant k).
Suppose you write a program like this to find the smallest prime factor of an integer:
function lpf(n):
for i = 2 to n
if n%i == 0 then return i
If you run the function on the number 10^11 + 3, it will take 10^11 + 2 steps. If you run it on the number 10^11 + 4 it will take just one step. So the function's best-case time is O(1) steps and its worst-case time is O(n) steps.
Big O notation, describes efficiency in runtime iterations, generally based on size of an input data set.
The notation is written in its simplest form, ignoring multiples or additives, but keeping exponential form. If you have an operation of O(1) it is executed in constant time, no matter the input data.
However if you have something such as O(N) or O(log(N)), they will execute at different rates depending on input data.
The high and low bounds describe the largest and least iterations, respectively, that an algorithm can take.
Example: O(N), high bound is largest input data and low bound is smallest.
Extra sources:
Big O Cheat Sheet and MIT Lecture Notes
UPDATE:
Looking at the Stack Overflow question mentioned above, that algorithm is broken into three parts, where it has 3 possible types of runtime, depending on data. Now really, this is three different algorithms designed to handle for different data values. An algorithm is generally classified with just one notation of efficiency and that is of the notation taking the least time for ALL possible values of N.
In the case of O(N^2), larger data will take exponentially longer, and having a smaller number will proceed quickly. The algorithm determines how quickly a data set will be run, yet bounds are given depending on the range of data the algorithm is designed to handle.
I will try to explain it in the quicksort algorithm.
In quicksort you have an array and choose an element as pivot. The next step is to partition the input array into two arrays. The first one will contain elements < pivot and the second one elements > pivot.
Now assume you will apply quicksort on an already sorted list and the pivot element will always be the last element of the array. The result of partition will be an array of size n-1 and an array oft size 1 (the pivot element). This will result in a runtime of O(n*n). Now assume that the pivot element will always split the array in two equal sized array. In every step the array size will be cut in halves. This will result in O(n log n). I hope this example will make this a bit clearer for you.
Another well known sort algorithm is mergesort. Mergesort has always runtime of O(n log n). In mergesort you will cut the array down until only one element is left und will climb up the call stack to merge the one sized arrays and after that merge the array of size two and so on.
Let's say you implement a set using an array. To insert a element you simply put in the next available bucket. If there is no available bucket you increase the capacity of the array by a value m.
For the insert algorithm "there is no enough space" is the worse case.
insert (S, e)
if size(S) >= capacity(S)
reserve(S, size(S) + m)
put(S,e)
Assume we never delete elements. By keeping track of the last available position, put, size and capacity are Θ(1) in space and memory.
What about reserve? If it is implemented like [realloc in C][1], in the best case you just allocate new memory at the end of the existing memory (best case for reserve), or you have to move all existing elements as well (worse case for reserve).
The worst case lower bound for insert is the best case of
reserve(), which is linear in m if we dont nitpick. insert in
worst case is Ω(m) in space and time.
The worst case upper bound for insert is the worse case of
reserve(), which is linear in m+n. insert in worst case is
O(m+n) in space and time.

what does O(N) mean [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
What is Big O notation? Do you use it?
Hi all,
fairly basic scalability notation question.
I recently recieved a comment on a post that my python ordered-list implimentation
"but beware that your 'ordered set' implementation is O(N) for insertions"
Which is great to know, but I'm not sure what this means.
I've seen notation such as n(o) o(N), N(o-1) or N(o*o)
what does the above notation refer to?
The comment was referring to the Big-O Notation.
Briefly:
O(1) means in constant time -
independent of the number of items.
O(N) means in proportion to the
number of items.
O(log N) means a time proportional to
log(N)
Basically any 'O' notation means an operation will take time up to a maximum of k*f(N)
where:
k is a constant multiplier
f() is a function that depends on N
O(n) is Big O Notation and refers to the complexity of a given algorithm. n refers to the size of the input, in your case it's the number of items in your list.
O(n) means that your algorithm will take on the order of n operations to insert an item. e.g. looping through the list once (or a constant number of times such as twice or only looping through half).
O(1) means it takes a constant time, that it is not dependent on how many items are in the list.
O(n^2) means that for every insert, it takes n*n operations. i.e. 1 operation for 1 item, 4 operations for 2 items, 9 operations for 3 items. As you can see, O(n^2) algorithms become inefficient for handling large number of items.
For lists O(n) is not bad for insertion, but not the quickest. Also note that O(n/2) is considered as being the same as O(n) because they both grow at the same rate with n.
It's called Big O Notation: http://en.wikipedia.org/wiki/Big_O_notation
So saying that insertion is O(n) means that you have to walk through the whole list (or half of it -- big O notation ignores constant factors) to perform the insertion.
This looks like a nice introduction: http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
Specifically O(n) means that if there's 2x as many items in the list, it'll takes No more than twice as long, if there's 50 times as many it'll take No more than 50 times as long. See the wikipedia article dreeves pointed out for more details
Edit (in bold above): It was pointed out that Big-O does represent the upper bound, so if there's twice as many elements in the list, insertion will take at most twice as long, and if there's 50 times as many elements, it would take at most 50 times as long.
If it was additionally Ω(n) (Big Omega of n) then it would take at least twice as long for a list that is twice as big. If your implementation is both O(n) and Ω(n), meaning that it'll take both at least and at most twice as long for a list twice as big, then it can be said to be Θ(n) (Big Theta of n), meaning that it'll take exactly twice as long if there are twice as many elements.
According to Wikipedia (and personal experience, being guilty of it myself) Big-O is often used where Big-Theta is what is meant. It would be technically correct to call your function O(n^n^n^n) because all Big-O says is that your function is no slower than that, but no one would actually say that other than to prove a point because it's not very useful and misleading information, despite it being technically accurate.
It refers to how complex your program is, i.e., how many operations it takes to actually solve a problem. O(n) means that each operation takes the same number of steps as the items in your list, which for insertion, is very slow. Likewise, if you have O(n^2) means that any operation takes "n" squared number of steps to accomplish, and so on... The "O" is for Order of Magnitude, and the the expression in the parentheses is always related to the number of items being manipulated in the procedure.
Short answer: It means that the processing time is in linear relation to the size of input. E.g if the size of input (length of list) triples, the processing time (roughly) triples. And if it increases thousandfold, the processing time also increases in the same magnitude.
Long answer: See the links provided by Ian P and dreeves
This may help:
http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions
O(n): Finding an item in an unsorted
list or a malformed tree (worst case);
adding two n-digit numbers
Good luck!
Wikipedia explains it far better than I can, however it means that if your list size is N, it takes at max N loops/iterations to insert an item. (In effect, you have to iterate over the whole list)
If you want a better understanding, there is a free book from Berkeley that goes more in-depth about the notation.

Resources