Calculating space complexity - complexity-theory

I was given this sample of code and I'm not sure what the space complexity will be here.
void f1(int n){
    int b = 2;
    while (n >= 1){
        b *= b;
        n /= 2;
    }
    free(malloc(b));
}
As far as the loop goes, it runs log(n) times, but the variable b grows exponentially.
That's why I'm not sure what's going on.
Thanks for any help regarding this :)

First of all, if you ran that code as-is you would get an overflow, since most numeric data types have a fixed width (e.g. 32 bits). Therefore I will assume that you run the algorithm on a Turing machine, which is an idealized computer model. In that case n will be much smaller than the last value of b. In addition, I assume that your goal is to compute the last value of b, so you need to write the value of b to memory at the end (say, in binary). Keeping these in mind, you only need enough space to record the values of b on the memory (tape), and you wouldn't need much more memory than it takes to record the last value b reaches. Since b is squared on each of the roughly $\log_2(n)$ iterations, that last value is about $2^{2^{\log_2(n)}}$, so your space complexity is $O(2^{2^{\log_2(n)}})$, which is simply $O(2^n)$.
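To make the growth concrete, here is a small trace of the same loop in Python (my own sketch, not part of the question); Python's arbitrary-precision integers stand in for the idealized machine, so b never overflows:
def trace_f1(n):
    # Run the loop from f1 and report how many iterations it took
    # and how many bits the final value of b occupies.
    b = 2
    iterations = 0
    while n >= 1:
        b *= b    # b is squared each pass: after k passes, b == 2**(2**k)
        n //= 2   # n halves, so the loop runs floor(log2(n)) + 1 times
        iterations += 1
    return iterations, b.bit_length()

print(trace_f1(16))    # (5, 33): 5 passes, b == 2**32
print(trace_f1(1024))  # (11, 2049): b == 2**2048, doubly exponential in the pass count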

Related

Does O(1) mean an algorithm takes one step to execute a required task?

I thought it meant it takes a constant amount of time to run. Is that different than one step?
O(1) is a class of functions. Namely, it includes functions bounded by a constant.
We say that an algorithm has complexity O(1) iff the number of steps it takes, as a function of the size of the input, is bounded by an (arbitrary) constant. This function can itself be a constant, or it can grow, behave chaotically, or undulate like a sine wave. As long as it never exceeds some predefined constant, it's O(1).
For more information, see Big O notation.
It means that even if you increase the size of whatever the algorithm is operating on, the number of calculations required to run remains the same.
More specifically it means that the number of calculations doesn't get larger than some constant no matter how big the input gets.
In contrast, O(N) means that if the size of the input is N, the number of steps required is at most a constant times N, no matter how big N gets.
So for example (in python code since that's probably easy for most to interpret):
def f(L, index):  # L a list, index an integer
    x = L[index]
    y = 2 * L[index]
    return x + y
then even though f has several calculations within it, the time taken to run is the same regardless of how long the list L is. However,
def g(L):  # L a list
    return sum(L)
This will be O(N) where N is the length of list L. Even though there is only a single calculation written, the system has to add all N entries together. So it has to do at least one step for each entry. So as N increases, the number of steps increases proportional to N.
As everyone else has already answered, it simply means:
No matter how many mangoes you've got in a box, it'll always take you the same amount of time to eat one mango. How you plan on eating it is irrelevant; there may be a single step, or you might go through multiple steps and slice it nicely before you eat it.

Runtime of following algorithm (example from cracking the coding interview)

One of the problems in the Cracking the Coding Interview book asks for the run-time of the following algorithm, which prints the powers of 2 from 1 through n, inclusive:
int powersOf2(int n) {
    if (n < 1) {
        return 0;
    } else if (n == 1) {
        print(1);
        return 1;
    } else {
        int prev = powersOf2(n / 2);
        int curr = prev * 2;
        print(curr);
        return curr;
    }
}
The author answers that it runs in O(log n).
It makes perfect sense, but... n is the VALUE of the input! (pseudo-sublinear run-time).
Isn't it more correct to say that the run-time is O(m) where m is the length of input to the algorithm? (O(log(2^m)) = O(m)).
Or is it perfectly fine to simply say it runs in O(log n) without mentioning anything about pseudo- runtimes...
I am preparing for an interview, and wondering whether I need to mention that the run-time is pseudo-sublinear for questions like this that depend on the value of an input.
I think the term that you're looking for here is "weakly polynomial," meaning "polynomial in the number of bits in the input, but still dependent on the numeric value of the input."
Is this something you need to mention in an interview? Probably not. Analyzing the algorithm and saying that the runtime is O(log n) describes the runtime perfectly as a function of the input parameter n. Taking things a step further and then looking at how many bits are required to write out the number n, then mentioning that the runtime is linear in the size of the input, is a nice flourish and might make an interviewer happy.
I'd actually be upset if an interviewer held it against you if you didn't mention this - this is the sort of thing you'd only know if you had a good university education or did a lot of self-studying.
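If you want to make the connection concrete, here is a quick illustration (my own, not from the book) that counts the recursive calls of powersOf2: the call count equals the bit length of n, which is exactly the "linear in the size of the input" observation.
def count_calls(n):
    # Hypothetical helper: how many times powersOf2 would be invoked for this n.
    if n < 1:
        return 1
    elif n == 1:
        return 1
    else:
        return 1 + count_calls(n // 2)

for n in (1, 7, 64, 10**6):
    print(n, count_calls(n), n.bit_length())  # the call count matches n.bit_length()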
When you say that an algorithm takes O(N) time, and it's not specified what N is, then it's taken to be the size of the input.
In this case, however, the algorithm is said to take O(n) time, where n identifies a specific input parameter. That is also perfectly OK, and it is common when the size of the input isn't what you would practically want to measure against. You will also see complexity given in terms of multiple parameters, like O(|V|+|E|) for graph algorithms, etc.
To make things a bit more confusing, the input value n is a single machine word, and numbers that fit into 1 or 2 machine words are usually considered to be constant size, because in practice they are.
Since giving a complexity in terms of the size of n is therefore not useful in any way, if you were asked to give a complexity without any specific instructions of how to measure the input size, you would measure it in terms of the value of n, because that is the useful way to do it.

Time complexity for an infinite loop

I understand very little about computing complexities, but can you actually calculate the run-time complexity of a function that runs infinitely? For example:
for (i = 0; i < 10; i *= 2)
{
    [Algo / Lines of code]
}
Can you help me out?
The run-time complexity depends on a parameter n, e.g. the size of a data set to be processed, and describes how the runtime changes when n changes, independent of any "technical" constraints such as the speed of the CPU.
Since your algorithm does not depend on any parameter and its runtime is always infinite (i starts at 0, so i *= 2 leaves it at 0 forever and the loop never terminates), one cannot determine a run-time complexity for it.
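For contrast, here is a hypothetical variant (my own sketch, in Python) that does terminate and does depend on a parameter n; this one runs in O(log n) iterations and can be analysed:
def count_iterations(n):
    # Doubling loop that starts at 1 instead of 0, so it terminates
    # after about log2(n) iterations (in the original, i stays 0 forever).
    i = 1
    steps = 0
    while i < n:
        i *= 2
        steps += 1
    return steps

print(count_iterations(10))    # 4
print(count_iterations(1000))  # 10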

How to efficiently compute average on the fly (moving average)?

I came up with this:
n = 1;
curAvg = 0;
loop {
    curAvg = curAvg + (newNum - curAvg) / n;
    n++;
}
I think the highlights of this approach are:
- It avoids big numbers (and a possible overflow if you summed first and then divided)
- You save one register (no need to store the sum)
The trouble might be accumulated rounding error, but I assume that in general there will be a balanced number of round-ups and round-downs, so the error should not accumulate dramatically.
Do you see any pitfalls in this solution?
Have you any better proposal?
Your solution is essentially the "standard" optimal online solution for keeping a running average without storing big sums: it runs "online", i.e. you process one number at a time without going back to earlier numbers, and you only use a constant amount of extra memory. If you want a slightly better solution in terms of numerical accuracy, at the cost of no longer being online, then, assuming your numbers are all non-negative, sort your numbers from smallest to largest first and then process them in that order, the same way you do now. That way, if you get a bunch of numbers that are really small and roughly equal, and then you get one big number, you will be able to compute the average accurately without underflow, as opposed to processing the large number first.
I have used this algorithm for many years.
The loop is any kind of loop. Maybe it is individual web sessions or maybe a true loop. The point is that all you need to track is the current count (N) and the current average (avg). Each time a new value is received, apply this algorithm to update the average. This computes the exact arithmetic average. It has the additional benefit that it is resistant to overflow: if you have a gazillion large numbers to average, summing them all up may overflow before you get to divide by N. This algorithm avoids that pitfall.
Variables that are stored during the computation of the average:
N = 0
avg = 0

For each new value V:
    N = N + 1
    a = 1 / N
    b = 1 - a
    avg = a * V + b * avg
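For reference, here is a minimal runnable sketch of the same update in Python (the function name is mine):
def running_average(values):
    # Incrementally compute the arithmetic mean of an iterable,
    # one value at a time, without ever storing the full sum.
    n = 0
    avg = 0.0
    for v in values:
        n += 1
        avg += (v - avg) / n  # same as avg = (1/n)*v + (1 - 1/n)*avg
    return avg

print(running_average([2, 4, 6, 8]))  # 5.0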

Programming problem - Game of Blocks

Maybe you have an idea of how to solve the following problem.
John decided to buy his son Johnny some mathematical toys. One of his son's favorite toys is blocks of different colors. John has decided to buy blocks of C different colors. For each color he will buy a googol (10^100) blocks. All blocks of the same color have the same length, but blocks of different colors may vary in length.
Johnny has decided to use these blocks to make a large 1 x n block. He wonders how many ways he can do this. Two ways are considered different if there is a position where the color differs. The example shows a red block of size 5, a blue block of size 3 and a green block of size 3, and it shows there are 12 ways of making a large block of length 11.
Each test case starts with an integer 1 ≤ C ≤ 100. The next line consists of C integers; the i-th integer, 1 ≤ len_i ≤ 750, denotes the length of the i-th color. The next line is a positive integer N ≤ 10^15.
This problem should be solved in 20 seconds for T ≤ 25 test cases. The answer should be calculated MOD 100000007 (a prime number).
It can be reduced to a matrix exponentiation problem, which can be solved relatively efficiently in O(max(len_i)^2.376 * log(N)) using the Coppersmith-Winograd algorithm and fast exponentiation. But it seems that a more efficient algorithm is required, as Coppersmith-Winograd implies a large constant factor. Do you have any other ideas? It could possibly be a number theory or divide and conquer problem.
Firstly, note that the number of blocks of each colour you have is a complete red herring, since 10^100 > N always. So the number of blocks of each colour is practically infinite.
Now notice that at each position p (in a valid configuration that leaves no spaces, etc.) there must be a block of some colour c. There are len[c] ways for this block to lie so that it still covers position p.
My idea is to try all possible colours and offsets for the block covering a fixed position (N/2, since that halves the range). For each case there are b cells before this fixed coloured block and a cells after it. So if we define a function ways(i) that returns the number of ways to tile i cells (with ways(0) = 1), then the number of ways to tile with a fixed coloured block at a given offset is ways(b) * ways(a). Adding up all possible configurations yields the answer for ways(n).
Now I chose the fixed position to be N/2 since that halves the range, and you can halve a range at most ceil(log(N)) times. Since you are moving a block about N/2 you will have to calculate lengths from roughly N/2-750 to N/2+750, where 750 is the maximum length a block can have. So you will have to calculate about 750*ceil(log(N)) (a bit more because of the variance) lengths to get the final answer.
So in order to get good performance you have to throw in memoisation, since this is inherently a recursive algorithm.
So using Python(since I was lazy and didn't want to write a big number class):
T = int(raw_input())

for case in xrange(T):
    # read in the data
    C = int(raw_input())
    lengths = map(int, raw_input().split())
    minlength = min(lengths)
    n = int(raw_input())

    # setup memoisation, note all lengths less than the minimum length are
    # set to 0 as the algorithm needs this
    memoise = {}
    memoise[0] = 1
    for length in xrange(1, minlength):
        memoise[length] = 0

    def solve(n):
        global memoise
        if n in memoise:
            return memoise[n]
        ans = 0
        for i in xrange(C):
            if lengths[i] > n:
                continue
            if lengths[i] == n:
                ans += 1
                ans %= 100000007
                continue
            for j in xrange(0, lengths[i]):
                b = n/2 - lengths[i] + j
                a = n - (n/2 + j)
                if b < 0 or a < 0:
                    continue
                ans += solve(b) * solve(a)
                ans %= 100000007
        memoise[n] = ans
        return memoise[n]

    solve(n)
    print "Case %d: %d" % (case + 1, memoise[n])
Note I haven't exhaustively tested this, but I'm quite sure it will meet the 20 second time limit, if you translated this algorithm to C++ or somesuch.
EDIT: Running a test with N = 10^15 and a block of length 750, I get that memoise contains about 60000 elements, which means the non-lookup part of solve(n) is called about the same number of times.
A word of caution: in the case c=2, len1=1, len2=2, the answer will be the N'th Fibonacci number, and the Fibonacci numbers grow (approximately) exponentially with a growth factor of the golden ratio, phi ~ 1.61803399. For the huge value N=10^15, the answer will be about phi^(10^15), an enormous number. The answer will have storage requirements on the order of (ln(phi^(10^15))/ln(2)) / (8 * 2^40) ~ 79 terabytes. Since you can't even access 79 terabytes in 20 seconds, it's unlikely you can meet the speed requirements in this special case.
Your best hope occurs when C is not too large and len_i is large for all i. In such cases, the answer will still grow exponentially with N, but the growth factor may be much smaller.
I recommend that you first construct the integer matrix M which computes the (i+1, ..., i+k) terms of your sequence from the (i, ..., i+k-1) terms (only row k+1 of this matrix is interesting). Compute the first k entries "by hand", then calculate M^(10^15) using the repeated-squaring trick, and apply it to terms (0, ..., k-1).
The (integer) entries of the matrix will grow exponentially, perhaps too fast to handle. If this is the case, do the very same calculation, but modulo p, for several moderate-sized prime numbers p. This will allow you to obtain your answer modulo p, for various p, without using a matrix of bigints. After using enough primes so that you know their product is larger than your answer, you can use the so-called "Chinese remainder theorem" to recover your answer from your mod-p answers.
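To illustrate the structure of that approach, here is a rough Python sketch of the companion matrix with repeated squaring (my own, untested against the judge); it works modulo the problem's prime so the entries never grow, and it uses a naive O(k^3) multiply, so it shows the idea rather than meeting the time limit:
MOD = 100000007

def mat_mult(A, B, mod):
    # Multiply two square matrices modulo mod (naive O(k^3)).
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % mod
             for j in range(k)] for i in range(k)]

def mat_pow(M, e, mod):
    # Raise square matrix M to the power e by repeated squaring.
    k = len(M)
    R = [[int(i == j) for j in range(k)] for i in range(k)]  # identity
    while e:
        if e & 1:
            R = mat_mult(R, M, mod)
        M = mat_mult(M, M, mod)
        e >>= 1
    return R

def count_tilings(lengths, n, mod=MOD):
    # ways(i) = sum over colours c of ways(i - len[c]), with ways(0) = 1.
    # The k x k companion matrix (k = max length) maps the window
    # (ways(i), ..., ways(i+k-1)) to (ways(i+1), ..., ways(i+k)).
    k = max(lengths)
    M = [[0] * k for _ in range(k)]
    for i in range(k - 1):
        M[i][i + 1] = 1              # shift rows
    for length in lengths:
        M[k - 1][k - length] += 1    # last row encodes the recurrence
    # Base window (ways(0), ..., ways(k-1)), computed directly.
    ways = [0] * k
    ways[0] = 1
    for i in range(1, k):
        ways[i] = sum(ways[i - length] for length in lengths if length <= i) % mod
    if n < k:
        return ways[n]
    Mn = mat_pow(M, n - (k - 1), mod)
    # ways(n) is the last entry of M^(n-k+1) applied to the base window.
    return sum(Mn[k - 1][j] * ways[j] for j in range(k)) % mod

print(count_tilings([1, 2], 10))  # 89, the Fibonacci-style count mentioned above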
I'd like to build on the earlier @JPvdMerwe solution with some improvements. In his answer, @JPvdMerwe uses a Dynamic Programming / memoisation approach, which I agree is the way to go on this problem. Dividing the problem recursively into two smaller problems and remembering previously computed results is quite efficient.
I'd like to suggest several improvements that would speed things up even further:
Instead of going over all the ways the block in the middle can be positioned, you only need to go over the first half and multiply the solution by 2, because the second half of the cases is symmetrical. For odd-length blocks you would still need to treat the centered position as a separate case.
In general, iterative implementations can be several orders of magnitude faster than recursive ones, because a recursive implementation incurs bookkeeping overhead for each function call. It can be a challenge to convert a solution to its iterative cousin, but it is usually possible. The @JPvdMerwe solution can be made iterative by using a stack to store intermediate values.
Modulo operations are expensive, as are multiplications to a lesser extent. The number of multiplications and modulo operations can be decreased by approximately a factor of C=100 by swapping the color loop with the position loop. This allows you to add the return values of several calls to solve() before doing a multiplication and modulo.
A good way to test the performance of a solution is with a pathological case. The following could be especially daunting: length 10^15, C=100, prime block sizes.
Hope this helps.
In the above answer
ans += 1
ans %= 100000007
could be made much faster without a general modulo:
ans += 1
if ans == 100000007: ans = 0
Please see TopCoder thread for a solution. No one was close enough to find the answer in this thread.

Resources