I am confused about the space complexity of an algorithm. In theory, it corresponds to the extra space that an algorithm uses, i.e. space other than the input. However, I have trouble pinning down what exactly is meant by that.
If, for instance, I have the following brute force algorithm that checks whether an array contains no duplicates, does that mean it uses O(1) extra storage space, because it only uses int j and int k?
public static boolean distinctBruteForce(int[] myArray) {
    for (int j = 0; j < myArray.length; j++) {
        for (int k = j + 1; k < myArray.length; k++) {
            if (k != j && myArray[k] == myArray[j]) {
                return false;   // duplicate found
            }
        }
    }
    return true;                // no duplicates found
}
Yes, according to your definition (which is correct), your algorithm uses constant, or O(1), auxiliary space: the loop indices, possibly a constant amount of space needed to set up the function call itself, etc.
It is true that one could argue the loop indices need a number of bits logarithmic in the size of the input, but they are usually approximated as constant.
According to the Wikipedia entry:
In computational complexity theory, DSPACE or SPACE is the computational resource describing the resource of memory space for a deterministic Turing machine. It represents the total amount of memory space that a "normal" physical computer would need to solve a given computational problem with a given algorithm
So, on a "normal" computer, each index would be considered to be 64 bits, i.e. O(1) space.
would that mean that it uses O(1) extra storage spaces, because it uses int j and int k?
Yes.
Extra storage space means space used for something other than the input itself. And, just as with time complexity, if that extra space does not depend on the size of the input (i.e. does not grow when the input grows), then the space complexity is O(1).
Yes, your algorithm indeed uses O(1) storage space (1), since the auxiliary space you use has a strict upper bound that is independent of the input.
(1) Assuming the integers used for iteration are in a restricted range, usually up to 2^32 - 1.
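For contrast, here is a minimal sketch (not part of the original question; the name distinctWithSet is just illustrative) of a version that trades extra space for speed. It keeps every element it has seen in a HashSet, so its auxiliary space grows with the input and is O(n), whereas the nested-loop version above stays at O(1):

import java.util.HashSet;
import java.util.Set;

public static boolean distinctWithSet(int[] myArray) {
    Set<Integer> seen = new HashSet<>();   // extra storage that grows with the input: O(n) space
    for (int value : myArray) {
        if (!seen.add(value)) {            // add() returns false if the value was already present
            return false;                  // duplicate found
        }
    }
    return true;                           // all elements are distinct
}

The time drops from O(n^2) to expected O(n), at the cost of the O(n) auxiliary set.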
I'm having trouble understanding something. I'm not even sure if it's correct.
In Cracking the Code Interview, there is a section that asks you to determine the Big O for a number of functions. For the most part, they're predictable.
However, one of them throws me for a loop.
Apparently, this evaluates to O(ab):
void printUnorderedPairs(int[] arrayA, int[] arrayB) {
    for (int i = 0; i < arrayA.length; i++) {
        for (int j = 0; j < arrayB.length; j++) {
            for (int k = 0; k < 100000; k++) {
                System.out.println(arrayA[i] + "," + arrayB[j]);
            }
        }
    }
}
With the rationale that:
"100,000 units of work is still constant, so the runtime is O(ab)."
I'm trying to see why this could make sense, but I just can't yet; naturally, I expected O(abc).
Yes, 100,000 is a constant and arrayA and arrayB are arrays, but we're taking the length of the arrays. At the time these for loops run, won't array[x].length be a constant (assuming the sizes of the arrays don't change during execution)?
So, is the book right? If so, I would really appreciate insight and intuition so I don't fall into the same trap in the future.
Thanks!
Time complexity is generally expressed as the number of required elementary operations on an input of size n, where elementary operations are assumed to take a constant amount of time on a given computer and change only by a constant factor when run on a different computer.
O(ab) is the complexity in the above case because arrayA and arrayB are of variable length and depend entirely on what the caller passes in, while 100,000 is a constant that no external factor can change.
Complexity is a measure of the unknowns.
The arrays A and B have unspecified lengths, so all you can do is express the complexity as a function of those two lengths. Nothing else in the given code is variable.
What the authors mean by constant is a value that is fixed regardless of the input, unlike the lengths of the input arrays, which can change. For instance, printUnorderedPairs might be called with different arrays as parameters, and those arrays might have different sizes.
The point of Big-O is to examine how the calculation grows as the inputs grow. It's clear that it would double if A doubled, and likewise if B doubled. So linear in those two.
What might be confusing you is that you could easily replace the 100,000 with a third parameter c, making it yet another variable input; but as written the function does not take the 100,000 as a parameter, so it is a constant (see the sketch below).
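As an illustration (a sketch added here, not from the book; the parameter c and the method name are hypothetical), if the inner bound were a parameter instead of the literal 100,000, the runtime really would be O(abc), because all three loop bounds would then depend on the inputs:

void printUnorderedPairsRepeated(int[] arrayA, int[] arrayB, int c) {
    for (int i = 0; i < arrayA.length; i++) {          // a iterations
        for (int j = 0; j < arrayB.length; j++) {      // b iterations
            for (int k = 0; k < c; k++) {              // c iterations -- now an input, not a constant
                System.out.println(arrayA[i] + "," + arrayB[j]);
            }
        }
    }
}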
A similar thing in Big-O problems is where you step through an array a fixed number of times. That doesn't change the Big-O. For example if you step through an array to find the max, that's O(n). Stepping through it twice to find the min and the max is... also O(n). And in fact it's the same as stepping through it once to find the min and max in a single sweep.
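For instance, a minimal sketch of that last point (assuming a non-empty int array): two separate passes and one combined pass are both O(n); only the constant factor differs.

// Two passes over the array: still O(n)
static int[] minMaxTwoPasses(int[] a) {
    int min = a[0];
    for (int x : a) min = Math.min(min, x);   // first pass finds the minimum
    int max = a[0];
    for (int x : a) max = Math.max(max, x);   // second pass finds the maximum
    return new int[] { min, max };
}

// One pass over the array: also O(n)
static int[] minMaxOnePass(int[] a) {
    int min = a[0], max = a[0];
    for (int x : a) {                         // single sweep updates both
        if (x < min) min = x;
        if (x > max) max = x;
    }
    return new int[] { min, max };
}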
When estimating the time complexity of an algorithm, say the following in pseudocode:
for (int i = 0; i < n; i++)     ---> O(n)
    // comparison?              ---> ?
    // substitution             ---> ?
for (int i = 0; i < n; i++)     ---> O(n)
    // some function which is not recursive
In this case the time complexity of these instructions is O(n), because we iterate over the input of size n. But what about the comparison and substitution operations: are they constant time, since they don't depend on n?
Thanks
Both of the other answers assume you are comparing some sort of fixed-size data type, such as 32-bit integers, doubles, or characters. If you are using operators like < in a language such as Java where they can only be used on fixed-size data types, and cannot be overloaded, then that is correct. But your question is not language-specific, and you also did not say you are comparing using such operators.
In general, the time complexity of a comparison operation depends on the data type you are comparing. It takes O(1) time to compare 64-bit integers, doubles, or characters, for example. But as a counter-example, comparing strings in lexicographic order takes O(min(k, k')) time in the worst case, where k and k' are the lengths of the strings.
For example, here is the Java source code for the String.compareTo method in OpenJDK 7, which clearly does not take constant time:
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}
Therefore, when analysing comparison-based sorting algorithms, we often express their complexity in terms of the number of comparisons and substitutions rather than the number of basic operations; for example, selection sort does O(n) substitutions and O(n²) comparisons, whereas merge sort does O(n log n) substitutions and O(n log n) comparisons.
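As a concrete illustration (a sketch added here, not part of the answer above), selection sort instrumented with counters performs at most n - 1 swaps (the substitutions) but on the order of n²/2 comparisons on an array of length n:

static void selectionSort(int[] a) {
    long comparisons = 0, swaps = 0;
    for (int i = 0; i < a.length - 1; i++) {
        int minIndex = i;
        for (int j = i + 1; j < a.length; j++) {
            comparisons++;                  // one comparison per inner iteration: ~n^2/2 in total
            if (a[j] < a[minIndex]) {
                minIndex = j;
            }
        }
        if (minIndex != i) {
            int tmp = a[i];                 // at most one swap per outer iteration: O(n) in total
            a[i] = a[minIndex];
            a[minIndex] = tmp;
            swaps++;
        }
    }
    System.out.println(comparisons + " comparisons, " + swaps + " swaps");
}

Running it on a few array sizes shows the comparison count growing quadratically while the swap count grows at most linearly.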
First, read this book. It gives a good explanation of this topic.
Comparison. For instance, suppose we have two variables a and b. When we evaluate a == b, we just take a and b from memory and compare them. Let's define "c" as the cost of memory and "t" as the cost of time. In this case we use 2c (because we use two memory cells) and 1t (because there is only one operation, with constant cost); the 1t is a constant, so the time complexity is constant.
Substitution (assignment). It's much the same as the previous operation: we use two variables and one operation. For a fixed-size type the cost is the same regardless of the value, so the time cost of the substitution is constant, and the complexity is constant too.
but what about the comparison and substitution operations: are they
constant time, since they don't depend on n?
Yes. Comparison and substitution operations contribute only a constant factor because their execution time doesn't depend on the size of the input. They do take time, but, again, that time is independent of the input size.
However, the execution time of your for loop grows proportionally to the number of items n and so its time complexity is O(n).
UPDATE
As #kaya3 correctly pointed out, this assumes we are dealing with fixed-size data types. If they're not, see #kaya3's answer.
Example 1: Given an input array A with n elements.
See the algorithm below:
// sets the first j = 100 elements of A to 0
void algo(int[] A, int n) {
    int i, j = 100;
    for (i = 0; i < j; i++)
        A[i] = 0;
}
Space complexity = Extra space required by variable i + variable 'j'
In this case my space complexity is: O(1) => constant
Example 2: An array of size n is given as input.
// copies A into a newly created array B of the same size
void algo(int[] A, int n) {
    int i;
    int[] B = new int[n];    // create a new array B with the same number of elements
    for (i = 0; i < n; i++)
        B[i] = A[i];
}
Space complexity in this case: extra space taken by i + the new array B
=> 1 + n => O(n)
Even if I used 5 variables here, the space complexity would still be O(n).
If, as far as space complexity is concerned, the first example is always constant and the second always O(n) even if I used 10 variables in the above algorithms, why is it always advised to write programs using fewer variables?
I do understand that in practical scenarios it makes the code more readable and easier to debug etc.
But looking for an answer in terms of space complexity only here.
Big-O complexity is not the be-all and end-all consideration in performance analysis. The difference lies in the constants that you drop when you look at asymptotic (big-O) complexity. Two algorithms can have the same big-O complexity and yet one can be thousands of times more expensive than the other.
E.g. if one approach to solving some problem always takes 10s flat, and another approach takes 3000s flat, regardless of input size, they both have O(1) time complexity. Of course, that doesn't really mean both are equally good; using the latter approach if there is no real benefit is simply a massive waste of time.
This is not to say performance is the only, or even the primary consideration when someone advises you to be economical with your use of local variables. Other considerations like readability, or avoidance of subtle bugs are also factors.
For this code snippet
void algo(int[] A, int n) {
    int i, j = 100;
    for (i = 0; i < j; i++)
        A[i] = 0;
}
Space complexity is O(1): the array A is the input and does not count as extra space, and the two variables i and j take only constant space.
It is always advised to use fewer variables because each variable occupies a constant amount of space. If you have k variables, they use k * (constant) space; if, say, each variable is a 4-byte int, that is 4k bytes, so with k = 10 that is 40 bytes.
That is similar to declaring int A[10], which also takes 40 bytes; in both cases the usage is constant with respect to the input, so the space complexity is still O(1).
I hope this makes it clear.
I'm reading Cracking the Coding Interview, and in the Big O section of the book we are posed with this piece of code:
int pairSumSequence(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += pairSum(i, i + 1);
    }
    return sum;
}

int pairSum(int a, int b) {
    return a + b;
}
Now, I understand that the time complexity is O(n), because clearly the time to execute increases as n increases. But the book goes on to state that:
"...However, those calls [referring to calls of pairSum] do not exist simultaneously on the call
stack, so you only need O(1) space."
This is the part I do not understand. Why is the space complexity of this algorithm O(1)? What exactly does the author mean by this? In my initial analysis, I assumed that because pairSum() is called N times, those calls would be added to the call stack back to back and would thus occupy N slots on the call stack. Thanks very much.
It means that the amount of space used by this algorithm is constant with respect to the input size.
Yes, pairSum is called N times. However, it occupies O(1) space because, as the book says, no two of those calls exist on the stack simultaneously.
Roughly speaking, at each iteration of the loop:
pairSum is called. It uses a constant amount of stack space.
It returns. It no longer occupies any stack space after that.
Thus, this algorithm uses only a fixed amount of space at any point (it doesn't depend on n).
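For contrast, here is a hedged sketch (not from the book; the method name is just illustrative) of a recursive variant in which the calls do exist on the stack simultaneously, so its space complexity is O(n):

int pairSumSequenceRecursive(int n) {
    if (n == 0) {
        return 0;                 // base case: nothing on the stack beyond this frame
    }
    // this frame stays on the stack while the recursive call below runs,
    // so up to n + 1 frames are live at once: O(n) space
    return pairSumSequenceRecursive(n - 1) + pairSum(n - 1, n);
}

It computes the same sum as the iterative version, but the depth of simultaneous calls, and hence the stack space, now grows linearly with n.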
Consider the following C function:
double foo(int n) {
    int i;
    double sum;
    if (n == 0)
        return 1.0;
    else {
        sum = 0.0;
        for (i = 0; i < n; i++)
            sum += foo(i);
        return sum;
    }
}
The space complexity of the above function is:
(a) O(1) (b) O(n) (c) O(n!) (d) O(n^n)
What I've done is calculate a recurrence relation for the above code, but I'm still not able to solve it. I know this is not a homework-help site, but any help would be appreciated.
This is my recurrence:
T(n) = T(n-1) + T(n-2) + T(n-3) + ... + T(1) + S
where S is some constant.
That depends on whether you're talking about stack or heap space complexity.
For the heap, it's O(1) (or even O(0)), since you're using no heap memory aside from the basic system/program overhead.
For the stack, it's O(n), because the recursion goes up to n levels deep.
The deepest point is:
foo(n)
  foo(n - 1)
    foo(n - 2)
      ...
        foo(0)
Space complexity describes how much space your program needs. Since foo does not declare arrays, each level requires O(1) space. Now all you need to do is figure out how many nested levels can be active at most at any given time.
Edit: ...so much for letting you figure out the solution for yourself :)
You don't explain how you derived your recurrence relation. I would do it like this:
If n == 0, then foo uses constant space (there is no recursion).
If n > 0, then foo recurses once for each i from 0 to n-1 (inclusive). Each recursion uses constant space (for the call itself) plus T(i) space for the recursive call. But these calls occur one after the other; the space used by each call is released before the next call. Therefore their space usage should not be added, but the maximum taken, which is T(n-1) since T is non-decreasing. That gives T(n) = T(n-1) + O(1), i.e. O(n) space overall.
The space complexity would be O(n). As you have mentioned, it might seem like O(n*n), but one should remember that once the call for, say, i = 1 in the loop is done, the stack space it used is released. So you have to consider the worst case, when i = n-1: that is when the maximum number of recursive calls, n of them, is on the stack simultaneously.
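To make the O(n) stack bound tangible, here is a minimal sketch (added here, with hypothetical names, and written in Java for consistency with the other examples) that instruments the same recursion with a depth counter; the maximum observed depth grows linearly with n even though the total number of calls grows exponentially:

static int depth = 0, maxDepth = 0;          // instrumentation only, not part of the original function

static double fooInstrumented(int n) {
    depth++;
    maxDepth = Math.max(maxDepth, depth);    // record the deepest simultaneous nesting
    double result;
    if (n == 0) {
        result = 1.0;
    } else {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += fooInstrumented(i);       // each call returns (and is popped) before the next begins
        result = sum;
    }
    depth--;
    return result;
}

Calling fooInstrumented(n) and then printing maxDepth reports n + 1, confirming that only one chain of calls foo(n) -> foo(n-1) -> ... -> foo(0) is ever on the stack at once.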