The Incompressibility Method is said to simplify the analysis of algorithms for the average case. From what I understand, this is because there is no need to compute all of the possible combinations of input for that algorithm and then derive an average complexity. Instead, a single incompressible string is taken as the input. As an incompressible string is typical, we can assume that this input can act as an accurate approximation of the average case.
I am lost in regard to actually applying the Incompressibility Method to an algorithm. As an aside, I am not a mathematician, but think that this theory has practical applications in everyday programming.
Ultimately, I would like to learn how I can deduce the average case of any given algorithm, be it trivial or complex. Could somebody please demonstrate to me how the method can be applied to a simple algorithm? For instance, given an input string S, store all of the unique characters in S, then print each one individually:
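void uniqueChars(String s) {
    char[] chars = new char[s.length()];
    int freeIdx = 0;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // Linear search over the characters stored so far.
        boolean seen = false;
        for (int j = 0; j < freeIdx; j++) {
            if (chars[j] == c) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            chars[freeIdx] = c;
            freeIdx++;
        }
    }
    // Print only the slots that were actually filled.
    for (int i = 0; i < freeIdx; i++) {
        System.out.println(chars[i]);
    }
}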
Only for the sake of argument. I think pseudo-code is sufficient. Assume a linear search for checking whether the array contains an element.
Better algorithms by which the theory can be demonstrated are acceptable, of course.
This question may be nonsensical and impractical, but I would rather ask than hold misconceptions.
Reproducing my answer to the CS.SE question, for cross-reference purposes.
1. Kolmogorov complexity (or algorithmic complexity) deals with optimal descriptions of "strings" (in the general sense of strings as sequences of symbols).
2. A string is (sufficiently) incompressible, or (sufficiently) algorithmically random, if its Kolmogorov complexity K (the length of its shortest algorithmic description) is not less than its literal size. In other words, the optimal description of the string is the string itself.
3. A major result of the theory is that most strings are algorithmically random (or typical). This is also related to other areas, such as Gödel's theorems, through Chaitin's work.
4. Kolmogorov complexity is related to probabilistic (Shannon) entropy; in fact, entropy bounds the expected Kolmogorov complexity up to an additive constant. This relates analysis based on descriptive complexity to probability-based analysis, and the two can be used interchangeably.
5. Sometimes probabilistic analysis is easier to use, at other times descriptive complexity is; they are two views of the same thing, so to speak.
So, in light of the above, by assuming an algorithmically random input to an algorithm, one assumes the following:
- The input is typical, so the analysis describes the average-case scenario (point 3 above).
- The input size is related in a certain way to its probability (point 2 above).
- One can pass from the algorithmic view to the probabilistic view (point 4 above).
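The claim that most strings are incompressible rests on a simple counting argument. The sketch below assumes binary strings and a fixed universal description method; the symbols n, c and K(x) are the standard textbook ones and are not taken from the answer above. There are 2^n strings of length n, but fewer than 2^(n-c) descriptions shorter than n - c bits, so:

```latex
\[
\#\{\,x \in \{0,1\}^n : K(x) < n - c\,\} \;\le\; \sum_{i=0}^{n-c-1} 2^i \;=\; 2^{n-c} - 1,
\qquad\text{hence}\qquad
\Pr_{x \in \{0,1\}^n}\bigl[\,K(x) \ge n - c\,\bigr] \;>\; 1 - 2^{-c}.
\]
```

For c = 10, more than 99.9% of all strings of a given length cannot be compressed by more than 10 bits. That is why a running-time bound proved for one incompressible input carries over to almost all inputs, and so to the average case.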
I'm trying to solve question 11.1 in Elements of Programming Interviews (EPI) in Java: Search a Sorted Array for First Occurrence of K.
The problem description from the book:
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array.
The solution they provide in the book is a modified binary search algorithm that runs in O(logn) time. I wrote my own algorithm also based on a modified binary search algorithm with a slight difference - it uses recursion. The problem is I don't know how to determine the time complexity of my algorithm - my best guess is that it will run in O(logn) time because each time the function is called it reduces the size of the candidate values by half. I've tested my algorithm against the 314 EPI test cases that are provided by the EPI Judge so I know it works, I just don't know the time complexity - here is the code:
public static int searchFirstOfKUtility(List<Integer> A, int k, int Lower, int Upper, Integer Index)
{
    while(Lower<=Upper){
        int M = Lower + (Upper-Lower)/2;
        if(A.get(M)<k)
            Lower = M+1;
        else if(A.get(M) == k){
            Index = M;
            if(Lower!=Upper)
                Index = searchFirstOfKUtility(A, k, Lower, M-1, Index);
            return Index;
        }
        else
            Upper=M-1;
    }
    return Index;
}
Here is the code that the tests cases call to exercise my function:
public static int searchFirstOfK(List<Integer> A, int k) {
    Integer foundKey = -1;
    return searchFirstOfKUtility(A, k, 0, A.size()-1, foundKey);
}
So, can anyone tell me what the time complexity of my algorithm would be?
Assuming that passing arguments is O(1) instead of O(n), performance is O(log(n)).
The usual theoretical approach to analyzing recursion is to invoke the Master Theorem. It says that if the running time of a recursive algorithm satisfies a recurrence of the form:
T(n) = a T(n/b) + f(n)
then there are 3 cases. In plain English they correspond to:
1. Performance is dominated by all the calls at the bottom of the recursion, so it is proportional to how many of those there are.
2. Performance is equal across the levels of recursion, so it is proportional to how many levels of recursion there are, times the cost of any one level.
3. Performance is dominated by the work done in the very first call, so it is proportional to f(n).
You are in case 2. Each recursive call costs the same, and so performance is dominated by the fact that there are O(log(n)) levels of recursion times the cost of each level. Assuming that passing a fixed number of arguments is O(1), that will indeed be O(log(n)).
Note that this assumption holds for Java because the array is not copied when it is passed. But it is important to be aware that it does not hold in all languages. For example, I recently did a bunch of work in PL/pgSQL, where arrays are passed by value, meaning that your algorithm would have been O(n log(n)).
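To make case 2 concrete, here is the recurrence written out for this function. It is a sketch under the assumption that each call does a constant amount of work outside the recursive call. Strictly, the while loop may halve the range several times before recursing, but every iteration still halves it, so the worst-case count of halving steps is the same.

```latex
\[
T(n) = T\!\left(\tfrac{n}{2}\right) + \Theta(1),
\qquad a = 1,\quad b = 2,\quad f(n) = \Theta(1),
\]
\[
n^{\log_b a} = n^{\log_2 1} = n^{0} = 1 = \Theta\bigl(f(n)\bigr)
\;\Longrightarrow\;
T(n) = \Theta\!\left(n^{\log_b a}\,\log n\right) = \Theta(\log n).
\]
```

If each call copied the array instead, f(n) would be Θ(n) and the recurrence would change accordingly, which is why the argument-passing cost matters.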
I have one exercise from my algorithm text book and I am not really sure about the solution. I need to explain why this solution:
function array_merge_sorted(array $foo, array $bar)
{
    $baz = array_merge($foo, $bar);
    $baz = array_unique($baz);
    sort($baz);
    return $baz;
}
which merges two arrays and sorts them, is not the most efficient one, and I need to provide a solution that is optimal and prove that no better solution exists.
My idea was to use a mergesort algorithm, which is O(n log n), to merge and order the two arrays passed as parameters. But how can I prove that this is the best possible solution?
Algorithm
As you have said that both inputs are already sorted, you can use a simple zipper-like approach.
You have one pointer for each input array, pointing to its beginning. Then you compare both elements, add the smaller one to the result, and advance the pointer of the array that held the smaller element. You repeat this step until both pointers have reached the end and all elements were added to the result.
You can find a collection of such algorithms in the Wikipedia article on merge algorithms, where the approach presented here is listed as "Merging two lists".
Here is some pseudocode:
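function Array<Element> mergeSorted(Array<Element> first, Array<Element> second) {
    Array<Element> result = new Array<Element>(first.length + second.length);
    int firstPointer = 0;
    int secondPointer = 0;
    // Repeatedly take the smaller of the two front elements.
    while (firstPointer < first.length && secondPointer < second.length) {
        Element elementOfFirst = first.get(firstPointer);
        Element elementOfSecond = second.get(secondPointer);
        if (elementOfFirst < elementOfSecond) {
            result.add(elementOfFirst);
            firstPointer = firstPointer + 1;
        } else {
            result.add(elementOfSecond);
            secondPointer = secondPointer + 1;
        }
    }
    // One input is exhausted; append whatever remains of the other.
    while (firstPointer < first.length) {
        result.add(first.get(firstPointer));
        firstPointer = firstPointer + 1;
    }
    while (secondPointer < second.length) {
        result.add(second.get(secondPointer));
        secondPointer = secondPointer + 1;
    }
    return result;
}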
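For readers who want something they can run, here is a minimal Java sketch of the same zipper idea. The class and method names are made up for illustration, and dropping duplicates is an assumption added to mirror the array_unique call in the question; it is not part of the pseudocode above.

```java
import java.util.Arrays;

class MergeSortedDemo {
    // Merges two ascending arrays into one ascending array without duplicates.
    // Dropping duplicates is an assumption made to match array_unique in the question.
    static int[] mergeSortedUnique(int[] first, int[] second) {
        int[] result = new int[first.length + second.length];
        int i = 0, j = 0, size = 0;
        while (i < first.length || j < second.length) {
            int next;
            // Take from first if second is exhausted or first holds the smaller front element.
            if (j >= second.length || (i < first.length && first[i] <= second[j])) {
                next = first[i++];
            } else {
                next = second[j++];
            }
            // The output is ascending, so a duplicate can only equal the last value written.
            if (size == 0 || result[size - 1] != next) {
                result[size++] = next;
            }
        }
        return Arrays.copyOf(result, size);
    }

    public static void main(String[] args) {
        int[] merged = mergeSortedUnique(new int[] {1, 3, 5, 7}, new int[] {2, 3, 6, 7, 9});
        System.out.println(Arrays.toString(merged)); // [1, 2, 3, 5, 6, 7, 9]
    }
}
```

Each element of either input is read exactly once, so the running time is linear in the combined length.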
Proof
The algorithm clearly runs in O(n), where n is the size of the resulting list. More precisely, it is O(max(n, n')), with n being the size of the first list and n' that of the second (or O(n + n'), which is the same set).
This is also clearly optimal, since at some point you must traverse every element at least once in order to build the result and know the final ordering. This yields a lower bound of Omega(n) for this problem, so the algorithm is optimal.
A more formal proof assumes an arbitrary better algorithm A which solves the problem without looking at each element at least once (or, more precisely, in o(n) time).
We call the element that the algorithm does not look at e. We can now construct an input I such that e has a value which respects the order of its own array but will be placed wrongly by the algorithm in the resulting array.
We are able to do so for every algorithm A, and since A always needs to work correctly on all possible inputs, we are able to find a counter-example I on which it fails.
Thus A cannot exist, and Omega(n) is a lower bound for the problem.
Why the given algorithm is worse
Your given algorithm first merges the two arrays, which works in O(n) and is good. But after that it sorts the array.
Sorting (more precisely: comparison-based sorting) has a lower bound of Omega(n log n). This means no such algorithm can be better than that.
Thus the given algorithm has a total time complexity of O(n log n) (because of the sorting part), which is worse than O(n), the complexity of the other algorithm and also of the optimal solution.
However, to be fully correct, we would also need to argue whether the sort method truly exhibits that complexity here, since it does not get arbitrary inputs but always the result of the merge. It could be that a specific sorting method works especially well on such inputs, yielding O(n) in the end.
But I doubt that this is the focus of your task.
This is likely ground that has been covered but I have yet to find an explanation that I am able to understand. It is likely that I will soon feel embarrassed.
For instance, I am trying to find the order of magnitude using Big-O notation of the following:
count = 0;
for (i = 1; i <= N; i++)
    count++;
Where do I begin to find what defines the magnitude? I'm relatively bad at mathematics and, even though I've tried a few resources, have yet to find something that can explain the way a piece of code is translated to an algebraic equation. Frankly, I can't even surmise a guess as to what the Big-O efficiency is regarding this loop.
These notations (big O, big omega, theta) simply say how "difficult" (or complex) the algorithm will be asymptotically, as things get bigger and bigger.
For big O, take two functions f(x) and g(x) with f(x) = O(g(x)). This means you can find a constant c and a point x0 from which c*g(x) is always at least as large as f(x). That is why the definition says "asymptotically": the two functions may behave in any way at the beginning (for example f(x) > c*g(x) for the first few x), but from that single point on, c*g(x) >= f(x) always holds. So you are interested in the behavior in the long run (not for small numbers only). Big-O notation is sometimes called an upper bound, and it is commonly used to describe the worst possible scenario (the algorithm will never be asymptotically more difficult than this function).
That is the "mathematical" part. When it comes to practice you usually ask: How many times will the algorithm have to process something? How many operations will be done?
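Written out formally, the definition reads as follows. The constant c and the threshold x_0 are the standard textbook symbols, and the functions 3x + 5 and x are picked purely for illustration.

```latex
\[
f(x) = O\bigl(g(x)\bigr)
\iff
\exists\, c > 0,\ \exists\, x_0 \ \text{such that}\ \forall x \ge x_0:\ f(x) \le c \cdot g(x).
\]
\[
\text{Example: } f(x) = 3x + 5,\quad g(x) = x,\quad c = 4,\quad x_0 = 5:
\qquad 3x + 5 \le 4x \iff x \ge 5,
\quad\text{so } 3x + 5 = O(x).
\]
```

Note that for x < 5 the inequality fails (3*1 + 5 = 8 > 4), which is exactly the "any behavior at the beginning" that the definition permits.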
For your simple loop, it is easy because as your N will grow, the complexity of algorithm will grow linearly (as simple linear function), so the complexity is O(N). For N=10 you will have to do 10 operations, for N=100 => 100 operations, for N=1000 => 1000 operations... So the growth is truly linear.
I'll provide a few more examples:
for (int i = 0; i < N; i++) {
    if (i == randomNumber()) {
        // do something...
    }
}
Here it might seem that the complexity is lower, because the condition added to the loop means the "do something" operation may run fewer times. But we don't know how many times the condition will pass; it may pass every time, so using big-O (the worst case) we again have to say that the complexity is O(N).
Another example:
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        // do something
    }
}
Here, as N gets bigger and bigger, the number of operations grows more rapidly. N=10 means that you will have to do 10x10 operations, N=100 => 100x100 operations, N=1000 => 1000x1000 operations. You can see that the growth is no longer linear; it is N x N, so we have O(N x N), that is O(N^2).
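If you want to see these growth rates rather than take them on faith, a small counting harness helps. This is only an illustrative sketch; the class and method names are made up, and "one operation" is taken to mean one execution of the loop body.

```java
class LoopCountDemo {
    // Counts how many times the body of a single loop over n runs.
    static long singleLoop(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    // Counts how many times the body of two nested loops over n runs.
    static long nestedLoops(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println("N=" + n + " single=" + singleLoop(n) + " nested=" + nestedLoops(n));
        }
    }
}
```

Multiplying N by 10 multiplies the single-loop count by 10 and the nested-loop count by 100, which is the difference between O(N) and O(N^2).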
For the last example I will use the idea of a full binary tree (one in which every level is completely filled). I hope you know what a binary tree is. If you have a reference to the root and you want to traverse down to the left-most leaf (from top to bottom), how many operations will you have to do if the tree has N nodes? The algorithm would be something like:
Node actual = root;
while (actual.left != null) {
    actual = actual.left;
}
// actual now holds the left-most leaf
How many operations will you have to do (how long will the loop execute)? Well, that depends on the depth of the tree, right? And what is the depth of a full binary tree? It is about log(N), with the base of the logarithm being 2. So here the complexity is O(log(N)). Generally we don't care about the base of the logarithm; what we care about is the kind of function (linear, quadratic, logarithmic...).
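The logarithmic depth can be checked with a short sketch. It assumes a tree in which every level is completely filled, and the Node class and helper names are hypothetical, chosen only for this illustration.

```java
class LeftmostDescentDemo {
    static final class Node {
        Node left, right;
    }

    // Builds a tree whose levels are all completely filled (2^levels - 1 nodes).
    static Node buildPerfect(int levels) {
        if (levels == 0) {
            return null;
        }
        Node node = new Node();
        node.left = buildPerfect(levels - 1);
        node.right = buildPerfect(levels - 1);
        return node;
    }

    // Walks from the root to the left-most leaf and returns the number of steps taken.
    static int stepsToLeftmostLeaf(Node root) {
        int steps = 0;
        Node actual = root;
        while (actual.left != null) {
            actual = actual.left;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int levels : new int[] {4, 10, 16}) {
            long nodes = (1L << levels) - 1;
            int steps = stepsToLeftmostLeaf(buildPerfect(levels));
            System.out.println("N=" + nodes + " steps=" + steps);
        }
    }
}
```

With N = 2^levels - 1 nodes the loop runs levels - 1 = log2(N + 1) - 1 times, so going from 15 nodes to 65535 nodes only raises the step count from 3 to 15.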
Your example is of order
O(N)
where N is the number of elements, and a comparable computation is performed on each, thus:
for (int i=0; i < N; i++) {
    // some process performed N times
}
Big-O notation is probably easier than you think; in everyday code you will find examples of O(N) in loops, list iterations, searches, and any other process that does work once per member of a set. It is the abstraction that is unfamiliar at first: O(N) means "some unit of work" repeated N times. This "something" can be an incrementing counter, as in your example, or it can be a lengthy and resource-intensive computation. Most of the time in algorithm design the big-O, or complexity, is more important than the unit of work, and this is especially relevant as N becomes large. The description "limiting" or "asymptotic" is mathematically significant: it means that an algorithm of lesser complexity will always beat one of greater complexity, no matter how significant the unit of work, given that N is large enough, or "as N grows".
Another example, to understand the general idea
for (int i=0; i < N; i++) {
    for (int j=0; j < N; j++) {
        // process here NxN times
    }
}
Here the complexity is
O(N^2)
For example, if N=10, then the second "algorithm" will take 10 times longer than the first, because 10x10 = 100 (ten times larger than 10). If you consider what happens when N equals, say, a million or a billion, you should be able to work out that it will take a million or a billion times longer. So if you can find a way to do something in O(N) that a supercomputer does in O(N^2), you should be able to beat it with your old 386, pocket watch, or other old tool.
A lecturer gave this question in class:
[question]
A sequence of n integers is stored in
an array A[1..n]. An integer a in A is
called the majority if it appears more
than n/2 times in A.
An O(n) algorithm can be devised to
find the majority based on the
following observation: if two
different elements in the original
sequence are removed, then the
majority in the original sequence
remains the majority in the new
sequence. Using this observation, or
otherwise, write programming code to
find the majority, if one exists, in
O(n) time.
for which this solution was accepted
[solution]
int findCandidate(int[] a)
{
    int maj_index = 0;
    int count = 1;
    for (int i = 1; i < a.length; i++)
    {
        if (a[maj_index] == a[i])
            count++;
        else
            count--;
        if (count == 0)
        {
            // the prefix seen so far has no majority; start over with a[i]
            maj_index = i;
            count++;
        }
    }
    return a[maj_index];
}
int findMajority(int[] a)
{
    int c = findCandidate(a);
    int count = 0;
    for (int i = 0; i < a.length; i++)
        if (a[i] == c) count++;
    if (count > a.length / 2) return c;
    return -1; // just a marker - no majority found
}
I can't see how the solution provided is a dynamic solution. And I can't see how based on the wording, he pulled that code out.
The origin of the term dynamic programming is trying to describe a really awesome way of optimizing certain kinds of solutions (dynamic was used since it sounded punchier). In other words, when you see "dynamic programming", you need to translate it into "awesome optimization".
'Dynamic programming' has nothing to do with dynamic allocation of memory or the like; it's just an old term. In fact, it has little to do with the modern meaning of "programming" either.
It is a method for solving a specific class of problems: those where an optimal solution of a subproblem is guaranteed to be part of an optimal solution of the bigger problem. For instance, if you want to pay $567 with the smallest number of bills, the solution will consist of an optimal solution for one of the amounts $1..$566 plus one more bill.
The code is just an application of the algorithm.
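The bill-paying example can be turned into a short bottom-up sketch. The denominations used here are an assumption chosen for illustration (the answer does not name any), and the class and method names are made up.

```java
import java.util.Arrays;

class MinBillsDemo {
    // Returns the smallest number of bills that sums to amount, or -1 if it cannot be paid.
    // best[x] is built from best[x - bill], the optimal answer to a smaller subproblem.
    static int minBills(int amount, int[] denominations) {
        final int unreachable = Integer.MAX_VALUE;
        int[] best = new int[amount + 1];
        Arrays.fill(best, unreachable);
        best[0] = 0;
        for (int x = 1; x <= amount; x++) {
            for (int bill : denominations) {
                if (bill <= x && best[x - bill] != unreachable) {
                    best[x] = Math.min(best[x], best[x - bill] + 1);
                }
            }
        }
        return best[amount] == unreachable ? -1 : best[amount];
    }

    public static void main(String[] args) {
        int[] bills = {1, 5, 10, 20, 50, 100}; // assumed denominations
        System.out.println(minBills(567, bills)); // 10
    }
}
```

The table is filled from $1 upward, so each amount is solved once and reused, which is the "optimal subproblem inside the optimal solution" property described above.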
This is dynamic programming because the findCandidate function breaks the provided array down into smaller, more manageable parts. In this case, he starts with the first element as a candidate for the majority. By increasing the count when the candidate is encountered and decreasing it when it is not, he determines whether this is true. When the count reaches zero, we know that the first i elements do not have a majority. By continually calculating the local majority, we don't need to iterate through the array more than once in the candidate identification phase. We then check whether that candidate is actually the majority by going through the array a second time, giving us O(n). It actually takes 2n steps, since we iterate twice, but the constant doesn't matter.
In the book "The Algorithm Design Manual" by Skiena, computing the mode (most frequent element) of a set is said to have an Ω(n log n) lower bound (this puzzles me), but also (correctly, I guess) that no faster worst-case algorithm exists for computing the mode. I'm only puzzled by the lower bound being Ω(n log n).
See the page of the book on Google Books
But surely this could in some cases be computed in linear time (best case), e.g. by Java code like that below (which finds the most frequent character in a string), the "trick" being to count occurrences using a hash table. This seems obvious.
So, what am I missing in my understanding of the problem?
EDIT: (Mystery solved) As StriplingWarrior points out, the lower bound holds if only comparisons are used, i.e. no indexing of memory, see also: http://en.wikipedia.org/wiki/Element_distinctness_problem
// Linear time
char computeMode(String input) {
    // initialize currentMode to first char
    char[] chars = input.toCharArray();
    char currentMode = chars[0];
    int currentModeCount = 0;
    HashMap<Character, Integer> counts = new HashMap<Character, Integer>();
    for (char character : chars) {
        int count = putget(counts, character); // occurrences so far
        // test whether character should be the new currentMode
        if (count > currentModeCount) {
            currentMode = character;
            currentModeCount = count; // also save the count
        }
    }
    return currentMode;
}
// Constant time
int putget(HashMap<Character, Integer> map, char character) {
    if (!map.containsKey(character)) {
        // if character not seen before, initialize to zero
        map.put(character, 0);
    }
    // increment
    int newValue = map.get(character) + 1;
    map.put(character, newValue);
    return newValue;
}
The author seems to be basing his logic on the assumption that comparison is the only operation available to you. Using a Hash-based data structure sort of gets around this by reducing the likelihood of needing to do comparisons in most cases to the point where you can basically do this in constant time.
However, if the numbers were hand-picked to always produce hash collisions, you would end up effectively turning your hash set into a list, which would make your algorithm into O(n²). As the author points out, simply sorting the values into a list first provides the best guaranteed algorithm, even though in most cases a hash set would be preferable.
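For comparison, here is a minimal sketch of the sort-first approach mentioned above, using only comparisons. The names are made up for illustration, the input is assumed to be non-empty, and the O(n log n) worst case relies on the library sort having that bound, which is an assumption about the runtime rather than something the code enforces.

```java
import java.util.Arrays;

class SortedModeDemo {
    // Computes the mode of a non-empty array by sorting a copy and scanning runs of equal values.
    // Assumes Arrays.sort is O(n log n) in the worst case; the scan itself is O(n).
    static int modeBySorting(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int mode = sorted[0];
        int bestRun = 0;
        int runStart = 0;
        for (int i = 1; i <= sorted.length; i++) {
            // A run ends at the end of the array or where the value changes.
            if (i == sorted.length || sorted[i] != sorted[runStart]) {
                int runLength = i - runStart;
                if (runLength > bestRun) {
                    bestRun = runLength;
                    mode = sorted[runStart];
                }
                runStart = i;
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(modeBySorting(new int[] {3, 1, 3, 2, 1, 3})); // 3
    }
}
```

On a tie this version returns the smallest of the most frequent values, simply because the scan runs in ascending order and only replaces the mode on a strictly longer run.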
So, what am I missing in my understanding of the problem?
In many particular cases, an array or hash table suffices. In "the general case" it does not, because hash table access is not always constant time.
In order to guarantee constant time access, you must be able to guarantee that the number of keys that can possibly end up in each bin is bounded by some constant. For characters this is fairly easy, but if the set elements were, say, doubles or strings, it would not be (except in the purely academic sense that there are, e.g., a finite number of double values).
Hash table lookups are amortized constant time, i.e., in general, the overall cost of looking up n random keys is O(n). In the worst case, they can be linear. Therefore, while in general they could reduce the order of mode calculation to O(n), in the worst case it would increase the order of mode calculation to O(n^2).