I'm having trouble solving an exercise. The context is asymptotic analysis of running time. We are given algorithms such as Insertion Sort, and the result should be the Theta notation (asymptotically tight bound) for the input {N, N-1, ..., N/2, 1, 1, 2, 3, ..., N/2}. The problem is: how can I calculate the running time? It's no problem to calculate the worst-case or best-case scenario; my problem is how to handle a specific input and take it into account in the calculation.
Thanks for your help!
Greetings
GR
See comments:
Have you tried listing the steps the program actually will take for some simple input like (4, 3, 2, 1, 1, 2) or (6, 5, 4, 3, 1, 1, 2, 3)? Can you "list" the steps for the general case N? – David K Oct 23 '14 at 16:32
First of all, thanks for your answer. :-) I simply count the additions and comparisons performed. For Insertion Sort there are n(n-1)/2 operations in the worst case, so the bound is Theta(n*n). My problem now is: how can I map this to a real input? – GR_ Oct 23 '14 at 18:31
If you actually have counted operations for the worst-case complexity of insertion sort, then you can tell what two numbers are compared by the 10th operation for sorting the numbers 1 through 100. That is, counting operations is mapping the operations to real input. It is actually a harder problem because you must also determine what input is the worst case, whereas here the input is already described for you. – David K Oct 23 '14 at 19:01
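To make the comments above concrete, here is a sketch in Python (the function names and the even-N construction of the input are my assumptions) that instruments insertion sort with a comparison counter and runs it on the question's input {N, N-1, ..., N/2, 1, 1, 2, ..., N/2}. The counts grow roughly quadratically, consistent with Theta(n^2):

```python
def insertion_sort_count(a):
    """Insertion sort that counts key comparisons."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1        # one comparison of a[j] against key
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return a, comparisons

def special_input(n):
    """The input {N, N-1, ..., N/2, 1, 1, 2, ..., N/2}, assuming even N."""
    return list(range(n, n // 2 - 1, -1)) + [1] + list(range(1, n // 2 + 1))

for n in (8, 16, 32, 64):
    _, c = insertion_sort_count(special_input(n))
    print(n, c)   # the ratio between successive counts approaches 4
```

Doubling N roughly quadruples the count: the descending first half costs about (N/2)^2/2 shifts, and each of the final ascending elements must be shifted past the N/2+1 large elements already in place, so the total is Theta(N^2).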
Related
I came across this problem on a website and I can't quite understand the output. Please help me understand it:
Bogosort is a dumb algorithm that shuffles the sequence randomly until it is sorted. But here we have tweaked it a little: if after the last shuffle the first several elements end up in the right places, we fix them and don't shuffle those elements any further. We do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle we get (1, 2, 5, 4, 3, 6), we will keep 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm. Calculate the expected number of shuffles for the improved algorithm to sort the sequence of the first n natural numbers, given that no elements are in the right places initially.
Input:
2
6
10
Output:
2
1826/189
877318/35343
For each test case, output the expected number of shuffles needed for the improved algorithm to sort the sequence of the first n natural numbers, in the form of an irreducible fraction. I just can't understand the output.
I assume you found the problem on CodeChef. There is an explanation of the answer to the Bogosort problem here.
OK, I think I found the answer. There is a similar problem here: https://math.stackexchange.com/questions/20658/expected-number-of-shuffles-to-sort-the-cards/21273 , and this problem can be thought of as an extension of it.
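One way to sanity-check the quoted outputs is a Monte Carlo simulation of the tweaked algorithm (a sketch; the function name is mine). Since the initial sequence is a derangement, the first shuffle always covers the whole array; for n = 6 the sample mean should approach 1826/189 ≈ 9.66, and for n = 2 it should approach 2:

```python
import random

def improved_bogosort_shuffles(n, rng):
    """One run of the tweaked Bogosort: shuffle the unfixed window,
    then fix any prefix/suffix elements that landed in place."""
    vals = list(range(1, n + 1))
    lo, hi = 0, n              # active window is vals[lo:hi]
    shuffles = 0
    while hi - lo > 0:         # the initial sequence is a derangement,
        mid = vals[lo:hi]      # so the first shuffle covers everything
        rng.shuffle(mid)
        vals[lo:hi] = mid
        shuffles += 1
        while lo < hi and vals[lo] == lo + 1:    # fix sorted prefix
            lo += 1
        while hi > lo and vals[hi - 1] == hi:    # fix sorted suffix
            hi -= 1
    return shuffles

rng = random.Random(1)
trials = 20000
avg = sum(improved_bogosort_shuffles(6, rng) for _ in range(trials)) / trials
print(avg)  # sample mean; should be close to 1826/189 ≈ 9.66
```

This only estimates the expectation, of course; the judge wants the exact fraction, which requires the derangement/recurrence analysis from the linked explanation.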
This is not my school home work. This is my own home work and I am self-learning algorithms.
In The Algorithm Design Manual, there is such an exercise:
4-25 Assume that the array A[1..n] only has numbers from {1, . . . , n^2} but that at most log log n of these numbers ever appear. Devise an algorithm that sorts A in substantially less than O(n log n).
I have two approaches:
The first approach:
Basically I want to do counting sort for this problem. I can first scan the whole array (O(n)) and put all distinct numbers into an array K (int[] K) of size log log n.
Then apply counting sort. However, when setting up the counting array (int[] C), I don't need to give it size n^2; instead, I give it size log log n too.
But in this way, when counting the frequency of each distinct number, I have to scan array K to find that element's index (O(n log log n)) and then update array C.
The second approach:
Again, I scan the whole array to get a distinct-number array K of size log log n.
Then I do a quicksort-like sort, but the partitioning is based on the median of the K array (i.e., each pivot is an element of K), recursively.
I think this approach will be the best, with O(n log log log n).
Am I right? Or are there better solutions?
Similar exercises exist in The Algorithm Design Manual, such as
4-22 Show that n positive integers in the range 1 to k can be sorted in O(n log k) time. The interesting case is when k << n.
4-23 We seek to sort a sequence S of n integers with many duplications, such that the number of distinct integers in S is O(log n). Give an O(n log log n) worst-case time algorithm to sort such sequences.
But basically for all these exercises, my intuition was always to think of counting sort, since we know the range of the elements and the range is short enough compared to the length of the whole array. But after thinking more deeply, I guess what the exercises are looking for is the second approach, right?
Thanks
We can just create a hash map storing each element as key and its frequency as value.
Sort the k = log log n distinct keys of this map in O(k log k) = O(log log n * log log log n) time using any comparison sort.
Now scan the hash map and append each element to the new array frequency-many times.
Total time = 2n + k log k = O(n).
Counting sort is one of possible ways:
I will demonstrate this solution on the example 2, 8, 1, 5, 7, 1, 6, where all numbers are <= N^2 = 9 for N = 3. I use a few extra elements to make the idea clearer.
First, for each number A[i], compute A[i] / N (integer division). Let's call this number first_part_of_number.
Sort this array using counting sort by first_part_of_number.
The results have the form (first_part_of_number, A[i]); for N = 3:
(0, 2)
(0, 1)
(0, 1)
(2, 8)
(2, 6)
(2, 7)
(2, 6)
Divide them into groups by first_part_of_number.
In this example you will have groups
(0, 2)
(0, 1)
(0, 1)
and
(2, 8)
(2, 6)
(2, 7)
(2, 6)
For each number A[i], compute A[i] modulo N. Let's call it second_part_of_number. Append this number to each element:
(0, 2, 2)
(0, 1, 1)
(0, 1, 1)
and
(2, 8, 2)
(2, 6, 0)
(2, 7, 1)
(2, 6, 0)
Sort each group using counting sort by second_part_of_number
(0, 1, 1)
(0, 1, 1)
(0, 2, 2)
and
(2, 6, 0)
(2, 6, 0)
(2, 7, 1)
(2, 8, 2)
Now concatenate all groups and you have the result: 1, 1, 2, 6, 6, 7, 8.
Complexity:
Counting sort was used only on values <= N.
Each element took part in exactly two "sorts", so the overall complexity is O(N).
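The two passes above can be sketched in Python (function names mine). Using base n+1 for the high digit covers values up to n^2 exactly:

```python
def counting_sort_by(items, key, base):
    """Stable counting (bucket) sort of items by key(x) in [0, base)."""
    buckets = [[] for _ in range(base)]
    for x in items:
        buckets[key(x)].append(x)
    return [x for b in buckets for x in b]

def radix_sort_base_n(a):
    """Sort values from {1, ..., n^2} with two counting-sort passes:
    first by the high digit x // n, then, within each group sharing a
    high digit, by the low digit x % n."""
    n = max(len(a), 2)
    by_high = counting_sort_by(a, lambda x: x // n, n + 1)
    out, i = [], 0
    while i < len(by_high):
        j = i
        while j < len(by_high) and by_high[j] // n == by_high[i] // n:
            j += 1               # [i, j) is one group of equal high digits
        out.extend(counting_sort_by(by_high[i:j], lambda x: x % n, n))
        i = j
    return out

print(radix_sort_base_n([2, 8, 1, 5, 7, 1, 6]))  # [1, 1, 2, 5, 6, 7, 8]
```

In effect this is a two-digit radix sort in base n, which is why each element is touched by exactly two counting sorts and the whole thing is O(n).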
I'm going to betray my limited knowledge of algorithmic complexity here, but:
Wouldn't it make sense to scan the array once and build something like a self-balancing tree? Since we know the number of nodes in the tree will only grow to log log n, each lookup is relatively cheap. If a repeated number is found (likely), a counter in that node is incremented.
Then to construct the sorted array, read the tree in order.
Maybe someone can comment on the complexity of this and any flaws.
Update: After I wrote the answer below, @Nabb showed me why it was incorrect. For more information, see Wikipedia's brief entry on Õ, and the links therefrom. Because it is still needed to lend context to @Nabb's and @Blueshift's comments, and because the whole discussion remains interesting, my original answer is retained, as follows.
ORIGINAL ANSWER (INCORRECT)
Let me offer an unconventional answer: though there is indeed a difference between O(n*n) and O(n), there is no difference between O(n) and O(n*log(n)).
Now, of course, we all know that what I just said is wrong, don't we? After all, various authors concur that O(n) and O(n*log(n)) differ.
Except that they don't differ.
So radical-seeming a position naturally demands justification, so consider the following, then make up your own mind.
Mathematically, essentially, the order m of a function f(z) is such that f(z)/(z^(m+epsilon)) converges while f(z)/(z^(m-epsilon)) diverges for z of large magnitude and real, positive epsilon of arbitrarily small magnitude. The z can be real or complex, though as we said epsilon must be real. With this understanding, apply L'Hospital's rule to a function of O(n*log(n)) to see that it does not differ in order from a function of O(n).
I would contend that the accepted computer-science literature at the present time is slightly mistaken on this point. This literature will eventually refine its position in the matter, but it hasn't done so yet.
Now, I do not expect you to agree with me today. This, after all, is merely an answer on Stack Overflow -- and what is that compared to an edited, formally peer-reviewed, published computer-science book -- not to mention a shelf full of such books? You should not agree with me today, only take what I have written under advisement, mull it over in your mind these coming weeks, consult one or two of the aforementioned computer-science books that take the other position, and make up your own mind.
Incidentally, a counterintuitive implication of this answer's position is that one can access a balanced binary tree in O(1) time. Again, we all know that that's false, right? It's supposed to be O(log(n)). But remember: the O() notation was never meant to give a precise measure of computational demands. Unless n is very large, other factors can be more important than a function's order. But, even for n = 1 million, log(n) is only 20, compared, say, to sqrt(n), which is 1000. And I could go on in this vein.
Anyway, give it some thought. Even if, eventually, you decide that you disagree with me, you may find the position interesting nonetheless. For my part, I am not sure how useful the O() notation really is when it comes to O(log something).
@Blueshift asks some interesting questions and raises some valid points in the comments below. I recommend that you read his words. I don't really have a lot to add to what he has to say, except to observe that, because few programmers have (or need) a solid grounding in the mathematical theory of the complex variable, the O(log(n)) notation has misled probably hundreds of thousands of programmers to believe that they were achieving mostly illusory gains in computational efficiency. Seldom in practice does reducing O(n*log(n)) to O(n) really buy you what you might think it buys you, unless you have a clear mental image of how incredibly slow a function the logarithm truly is -- whereas reducing O(n) even to O(sqrt(n)) can buy you a lot. A mathematician would have told the computer scientist this decades ago, but the computer scientist wasn't listening, was in a hurry, or didn't understand the point. And that's all right. I don't mind. There are lots and lots of points on other subjects I don't understand, even when the points are carefully explained to me. But this is a point I believe that I do happen to understand. Fundamentally, it is a mathematical point not a computer point, and it is a point on which I happen to side with Lebedev and the mathematicians rather than with Knuth and the computer scientists. This is all.
I'm auditing this algorithms class for work, and I'm trying to do some practice problems given in class. This problem has me stumped and I just can't wrap my head around it. None of my solutions come out in O(log n) time. Can anyone help me with this problem?
Question:
Suppose that we are given a sequence of n values x1, x2, ... , xn in an arbitrary order and
seek to quickly answer repeated queries of the form: given an arbitrary pair i and j with
1 ≤ i < j ≤ n, find the smallest value in x1, ... , xj . Design a data structure that uses O(n) space and answers each query in O(log n) time.
For the input a1, a2, a3, ..., an, construct a root node that contains the minimum of (a1, ..., ak) and the minimum of (ak+1, ..., an), where k = n/2.
Recursively construct the rest of the tree.
Now, if you want to find the minimum between ai and aj:
Identify the lowest common ancestor of i and j; call it k.
Start at i and keep moving up until you hit k. At every step, check whether the node you came from was a left child. If yes, compare against the right subtree's min and update the current min accordingly.
Similarly, for the path from j, check whether you came from a right child...
At node k, compare the values returned by the two subtrees and return the min.
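The tree described above can be sketched as an iterative segment tree (the class name is mine; this is the compact array-based formulation rather than a pointer-based tree, but the idea is identical). Build is O(n), and each query walks at most two root-leaf paths, i.e. O(log n):

```python
class MinSegmentTree:
    """Range-minimum queries in O(log n) with O(n) extra space."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = values              # leaves sit at n..2n-1
        for i in range(self.n - 1, 0, -1):       # each parent = min of children
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, i, j):
        """Minimum of values[i..j], 0-based inclusive bounds."""
        lo, hi = i + self.n, j + self.n + 1
        best = float('inf')
        while lo < hi:                           # climb both boundary paths
            if lo & 1:                           # lo is a right child: take it
                best = min(best, self.tree[lo])
                lo += 1
            if hi & 1:                           # hi just passed a right child
                hi -= 1
                best = min(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best
```

For example, with the list 47, 13, 55, 29, 56, 9, 17, 48, 69, 15 used later in this thread, `query(0, 9)` returns 9 and `query(2, 4)` returns 29.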
People are overthinking this. Suppose that you start with the list:
47, 13, 55, 29, 56, 9, 17, 48, 69, 15
Make the following list of lists:
47, 13, 55, 29, 56, 9, 17, 48, 69, 15
13, 29, 9, 17, 15
13, 9, 15
9, 15
9
I leave the construction of these lists, correct usage, and proof that they provide an answer to the original question as exercises for the reader. (It might not be homework for you, but it could easily be for someone, and I don't like giving complete answers to homework questions.)
I think the crucial step is that you'll need to sort the data beforehand. Then you can store the data in an array/list and run a binary search in O(log n), picking out the first value that satisfies the condition (I'm assuming you meant between xi and xj, not x1 and xj).
Edit: on second thought, ensuring that the value satisfies the condition may not be as trivial as I thought.
The question was asked before in a slightly different way: What data structure using O(n) storage with O(log n) query time should I use for Range Minimum Queries?
Nevertheless, to answer quickly: the problem you're facing is a well-studied one - Range Minimum Query. A Segment Tree is a data structure that solves it within O(N) space and O(log N) query time. You can see more details here, where there's an explanation of the structure and the complexities involved.
Trying to explain the suggested data structure:
For every pair of numbers, calculate and keep the value of the smaller one.
For every four consecutive numbers, calculate and keep the value of the smallest of the four. This is done quickly by picking the smaller of the two pair values.
For every eight consecutive numbers, calculate and keep the value of the smallest of the eight.
And so on.
Let's say we want the smallest value of x19 to x65.
We look at the following stored values:
Smallest of x32 to x63.
Smallest of x24 to x31.
Smallest of x20 to x23.
x19.
Smallest of x64 to x65.
Then we pick the smallest of these.
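This scheme can be sketched in Python (0-based indexing; function names mine): level L stores the minimum of each aligned block of 2^L consecutive elements, and a query greedily covers [i, j] left to right with the largest aligned block that starts at the current position, just as in the x19-x65 example:

```python
def build_levels(xs):
    """levels[L][b] = min of the aligned block xs[b*2**L : (b+1)*2**L]."""
    levels = [list(xs)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([min(prev[k:k + 2]) for k in range(0, len(prev), 2)])
    return levels

def range_min(levels, i, j):
    """Minimum of xs[i..j] (0-based, inclusive): cover [i, j] left to
    right, each time with the largest aligned block starting at i."""
    best = float('inf')
    while i <= j:
        L = 0
        # grow the block while its start stays aligned and it fits in [i, j]
        while i % (2 ** (L + 1)) == 0 and i + 2 ** (L + 1) - 1 <= j:
            L += 1
        best = min(best, levels[L][i >> L])
        i += 2 ** L
    return best
```

Total storage is n + n/2 + n/4 + ... = O(n), and each query touches O(log n) blocks, matching the required bounds.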
I'm coding a question on an online judge for practice. The question is about optimizing Bogosort and involves not shuffling the entire number range every time: if after the last shuffle the first several elements end up in the right places, we fix them and don't shuffle those elements any further. We do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle Johnny gets (1, 2, 5, 4, 3, 6), he will fix 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm.
For each test case, output the expected number of shuffles needed for the improved algorithm to sort the sequence of the first n natural numbers, in the form of an irreducible fraction.
A sample input/output says that for n=6, the answer is 1826/189.
I don't quite understand how the answer was arrived at.
This looks similar to 2011 Google Code Jam, Preliminary Round, Problem 4, however the answer is n, I don't know how you get 1826/189.
Let's say I have an array of size 40, and the element I'm looking for is at position 38.
With a simple loop, it will take 38 steps, right?
But suppose we have 2 loops running in parallel, and a variable "found",
set to false and changed to true when the element is found.
The first loop starts from index 0;
the second loop starts from index 39 (the end of the array).
So basically it will take only 4 steps to find the element, right? And the worst case will be when the element is in the middle of the array, right?
It depends how much work it takes to synchronize the state between the two threads.
If it takes 0 work, then this will be, on average, 50% faster than a straight through algorithm.
On the other hand, if it takes more work than X, it will start to get slower (which is very likely the case).
From an algorithm standpoint, I don't think this is how you want to go. Even with 2 threads this is still O(n) runtime. You would want to sort the data (O(n log n)) and then do a binary search to get the item, especially since you can sort it once and use the result for many searches...
If you're talking about algorithmic complexity, this is still a linear search. Just because you're searching two elements per iteration doesn't change the fact that the algorithm is O(n).
In terms of actual performance you would see, this algorithm is likely to be slower than a linear search with a single processor. Since very little work is done per-element in a search, the algorithm would be memory bound, so there would be no benefit to using multiple processors. Also, since you're searching in two locations, this algorithm would not be as cache efficient. And then, as bwawok points out, there would be a lot of time lost in synchronization.
When you run in parallel you divide your CPU power between the two loops, plus create some overhead. If you mean running the search with your proposed algorithm on, say, a multicore machine, then the worst case is 20 steps. You are not changing the complexity class at all. So where do those 4 steps you mentioned come from?
On average there is no difference in runtime.
Take, for example, searching for an item out of 10.
The original algorithm will process in the following search order:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
The worst case is the last item (taking 10 steps).
While the second algorithm will process the items in the following search order (alternating between the two ends):
1, 10, 2, 9, 3, 8, 4, 7, 5, 6
The worst case in this scenario is item 6 (taking 10 steps).
There are some cases where algorithm 1 is faster.
There are some cases where algorithm 2 is faster.
Both take the same time on average - O(n).
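A single-threaded simulation of the alternating two-ended scan (a sketch; the function name is mine) makes these step counts concrete, counting one step per element comparison: finding 10 takes only 2 steps, while the worst case, item 6, takes all 10:

```python
def two_ended_search(a, target):
    """Simulate the alternating two-ended scan; return (index, steps),
    counting one step per element comparison."""
    steps = 0
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        steps += 1
        if a[lo] == target:
            return lo, steps
        if lo != hi:                 # avoid re-checking the middle element
            steps += 1
            if a[hi] == target:
                return hi, steps
        lo += 1
        hi -= 1
    return -1, steps

print(two_ended_search(list(range(1, 11)), 10))  # (9, 2)
print(two_ended_search(list(range(1, 11)), 6))   # (5, 10): the worst case
```

Summed over all targets, the total comparison count is the same as for the plain left-to-right scan, which is why the average is unchanged and the algorithm stays O(n).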
On a side note, it is interesting to compare this to a binary search order (on a sorted array).
4, 3, 2, 3, 1, 4, 3, 2, 4, 3
Taking at most 4 steps to complete.