Understanding the running-time analysis of an exercise from CLRS

Here's the problem I am looking for an answer for:
An array A[1...n] contains all the integers from 0 to n except one. It would be easy to determine the missing
integer in O(n) time by using an auxiliary array B[0...n] to record which numbers appear in A. In this
problem, however, we cannot access an entire integer in A with a single operation. The elements of A are
represented in binary, and the only operation we can use to access them is "fetch the j-th bit of A[i]," which
takes constant time.
Show that if we use only this operation, we can still determine the missing integer in O(n) time.
I have this approach in mind:
If I didn't have the fetching restriction above, I would XOR all the numbers in the array together, then XOR that result with all the numbers from 0..n; whatever remains would be my answer.
With the restriction, I could similarly XOR the numbers together bit by bit (each number is about log(n+1) bits long) and then XOR in the values 0..n, but the complexity of that comes out to O(n log n) in my opinion.
How to achieve the O(n) complexity?
Thanks

You can use a variation of radix sort:
Sort the numbers according to their MSb (most significant bit).
You get two lists, of sizes n/2 and n/2 - 1. You can 'drop' the list with n/2 elements - the missing number is not in it.
Repeat for the second MSb and so on.
At the end, the 'path' you have chosen (the bit with the smaller list for each bit) will represent the missing number.
Complexity is O(n + n/2 + ... + 2 + 1), and since n + n/2 + ... + 1 < 2n, this is O(n).
This answer assumes for simplicity that n = 2^k for some integer k (this assumption can later be dropped by handling the MSb specially).

You have n integers from the range [0..n]. You can inspect every number's most significant bit and divide the numbers into two groups: C (with MSB 0) and D (with MSB 1). Since you know the range is [0..n], you can calculate how many numbers in this range have MSB 0 (call it S1) and how many have MSB 1 (call it S2). If the size of C is not equal to S1, then you know the missing number has MSB 0. Otherwise, the missing number has MSB 1. You can then solve the problem recursively on the corresponding group. Since each recursive call takes time linear in its input, and each call except the first halves the problem size, the total running time is linear.
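For concreteness, here is a minimal Python sketch of the elimination idea described in the two answers above. The helper fetch_bit is a stand-in for the exercise's constant-time "fetch the j-th bit of A[i]" operation, and the function and variable names are mine; it filters candidates from the least significant bit upward, which gives the same n + n/2 + n/4 + ... < 2n bound.

```python
def find_missing(A, n):
    """Return the one value in 0..n that does not appear in A (len(A) == n).

    Sketch of the bit-by-bit elimination above: at each bit position, compare
    how many surviving elements have that bit equal to 0 against how many of
    the still-possible values should, and keep only the side that is short
    one element. Both surviving sets roughly halve each round, so the total
    work is n + n/2 + n/4 + ... < 2n, i.e. O(n).
    """
    def fetch_bit(i, j):
        # Stand-in for the exercise's constant-time "fetch the j-th bit of A[i]".
        return (A[i] >> j) & 1

    candidates = list(range(len(A)))   # indices of A still consistent with the answer
    expected = list(range(n + 1))      # values in 0..n still possible for the answer
    missing, bit = 0, 0
    while expected != [missing]:
        zero_idx = [i for i in candidates if fetch_bit(i, bit) == 0]
        one_idx = [i for i in candidates if fetch_bit(i, bit) == 1]
        zero_val = [v for v in expected if (v >> bit) & 1 == 0]
        one_val = [v for v in expected if (v >> bit) & 1 == 1]
        if len(zero_idx) < len(zero_val):      # the 0-side is missing an element
            candidates, expected = zero_idx, zero_val
        else:                                  # the 1-side is missing an element
            candidates, expected = one_idx, one_val
            missing |= 1 << bit
        bit += 1
    return missing

# e.g. find_missing([0, 1, 3, 4, 5], 5) == 2
```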

Related

big-Theta running time in terms of input size N

while N-bit integer a > 1,
a = a / 2
I was thinking it is log(n) because each time you go through the while loop you are dividing a by two, but my friend thinks it's 2 log(n).
Clearly your algorithm is big-Theta(log(a)), where a is your number.
But as far as I understand your problem, you want to know the asymptotic runtime in terms of the number of bits of your number.
That's harder to pin down and depends on the number itself:
Let's say you have an n-bit integer whose most significant bit is 1. You have to divide it about n times before it drops to 1.
Now look at an integer where only the least significant bit is 1 (so it equals the number 1 in the decimal system). There the loop needs no divisions at all.
So I would say it takes about n/2 divisions on average, which makes it big-Theta(n), where n is the number of bits of your number. The worst case is also big-Theta(n) and the best case is big-Theta(1).
NOTE: Dividing a number by two in the binary system has a similar effect to dividing a number by ten in the decimal system: it drops the last digit.
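To make the point above concrete, here is a tiny Python sketch (the function name is mine) that just counts the loop iterations; it shows the count is tied to the bit length of a, not to a itself.

```python
def halvings_until_one(a):
    """Count the iterations of `while a > 1: a = a / 2` using integer halving."""
    steps = 0
    while a > 1:
        a //= 2          # same effect as a >>= 1: drop the lowest bit
        steps += 1
    return steps

# halvings_until_one(1) == 0, halvings_until_one(8) == 3, and in general
# halvings_until_one(a) == a.bit_length() - 1 for a >= 1, i.e. Theta(log a),
# or Theta(n) when a is an n-bit number with its top bit set.
```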
Dividing an integer by two can be efficiently implemented by taking the number in binary notation and shifting the bits. In the worst case, all the bits are set and you have to shift (n-1) bits for the first division, (n-2) bits for the second, and so on, until you shift 1 bit on the last iteration and find the number has become equal to 1, at which point you stop. This means your algorithm must shift 1 + 2 + ... + (n-1) = n(n-1)/2 bits, making it O(n^2) in the number of bits of input.
A more efficient algorithm that will leave a with the same value is a = (a == 0 ? 0 : 1). This generates the same answer in linear time (equality checking is linear in the number of bits) and it works because your code will only leave a = 0 if a is originally zero; in all other cases, the highest-order bit ends up in the unit's place.

Algorithm to sort a list in Θ(n) time

The problem is to sort a list containing n distinct integers that range in value from 1 to kn inclusive where k is a fixed positive integer. Design an algorithm to solve the problem in Θ(n) time.
I don't just want an answer. An explanation would help, or if someone could get me pointed in the right direction.
I know that Θ(n) time means the algorithm time is directly proportional to the number of elements. Not sure where to go from there.
Easy for fixed k: create an array of kn counters and set them all to zero. Iterate through the array, increasing counter i by one whenever an array element equals i. Finally, use the array of counters to re-create the sorted array.
Obviously this is inefficient if k > log n.
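As a rough Python sketch of this counting idea (the function and parameter names are mine, not from the question):

```python
def counting_sort_bounded(values, k):
    """Sort n distinct integers drawn from 1..k*n in Theta(k*n) time,
    which is Theta(n) when k is a fixed constant."""
    n = len(values)
    counts = [0] * (k * n + 1)          # one counter per possible value 0..k*n
    for v in values:
        counts[v] += 1                  # tally each value (0 or 1, since values are distinct)
    result = []
    for value, c in enumerate(counts):
        result.extend([value] * c)      # re-create the sorted list from the counters
    return result

# e.g. counting_sort_bounded([6, 1, 9, 4, 2], k=2) -> [1, 2, 4, 6, 9]
```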
The key is that the integers only range from 1 to kn, so their length is limited. This is a little tricky:
The common assumption when we say that a sorting algorithm is O(N) is that the number N fits into a constant number of machine words, so that we can do math on numbers of that size in constant time. Under this assumption, kN also fits into a constant number of machine words, since k is a fixed positive integer. Your input is therefore O(N) words long, and each word is a fixed number of bits, so your input is O(N) bits long.
Therefore, any algorithm that takes time proportional to the number of bits in the input is considered O(N).
There are actually lots of choices, but when this particular question is asked in this particular way, the person asking usually wants you to come up with a radix sort:
https://en.wikipedia.org/wiki/Radix_sort
The MSB-first radix sort just partitions the integers into 2^W buckets according to the values of their top W bits, and then partitions each bucket according to the next W bits, etc., until all the bits are processed.
The time taken for this is O(N*(word_size/W)), but as we said the word size is constant, and W is constant, so this is O(N).
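Here is a short Python sketch of a W-bit radix sort. It is written least-significant-digit first for brevity rather than MSB-first as described above, but the same O(N * word_size / W) bound applies; the parameter names and defaults are mine.

```python
def radix_sort(values, word_size=32, W=8):
    """Sort non-negative integers by processing W bits per pass.

    Each of the word_size / W passes distributes the N values into 2^W
    buckets and concatenates the buckets, so the total time is
    O(N * word_size / W) = O(N) when word_size and W are constants.
    """
    mask = (1 << W) - 1
    for shift in range(0, word_size, W):
        buckets = [[] for _ in range(1 << W)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)   # stable distribution by current digit
        values = [v for bucket in buckets for v in bucket]
    return values

# e.g. radix_sort([170, 45, 75, 90, 802, 24, 2, 66]) -> [2, 24, 45, 66, 75, 90, 170, 802]
```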

Algorithm for product of n numbers

I have a question about the running time of an algorithm that computes the product of n numbers. I think the best solution is divide and conquer, which recursively halves the n elements and multiplies pairs. The confusing part is the number of simple operations. With divide and conquer the complexity should be O(log n), so if we have 8 numbers to multiply we should end up with 3 basic steps. E.g. with 8 numbers (a1 a2 a3 a4 a5 a6 a7 a8) we can keep halving until we reach pairs and start multiplying: (a1*a2=b1) (a3*a4=b2) (a5*a6=b3) (a7*a8=b4), then (b1*b2=c1) (b3*b4=c2), then (c1*c2=final result). However, this needs 7 simple multiplications. Can someone clarify this for me?
Divide and conquer is for cases where you can divide your original set into multiple subsets which, after you've identified and created them, do not interact anymore (or only in a way that's negligibly cheap compared to the operation on each subset). In your case, you're violating the "subsets do not interact after identifying them" rule.
Also, O(x) does not mean the number of operations is less than x. It just means that, for any concrete data set of size x, there is a constant d so that the number of operations needed is smaller than d*x. (My native language is German; I hope I didn't change the meaning when translating.) So the fact that you need 7 operations on 8 data items does not, per se, mean the complexity is larger than O(log n).
If I understand your objective correctly, then the complexity should be O(n), as you can just multiply the n values in sequence. For n values, you need n-1 multiplications. No need for any divide and conquer.
Whichever way you choose to perform multiplication of n numbers, you will need to multiply all of them, making n-1 multiplications. Thus, the time complexity of this multiplication will always be O(n).
Here is how I can explain that the number of steps required to multiply n numbers by your approach is close to n, not log(n). To multiply n numbers you need to make n/2 multiplications first: the first number with the second, the third with the fourth, and so on, up to the (n-1)-th number with the n-th. After that you have n/2 numbers, and to multiply them all you need n/4 multiplications: the first result of the previous round with the second, the third with the fourth, and so on. After that you have n/4 numbers and you'll make n/8 multiplications. This process ends when you have only two numbers left; their multiplication gives you the result.
Now, let's count the total number of multiplications needed. On the first stage you made n/2 multiplications, on the second n/4, and so on, down to the single final multiplication. You have the following series: n/2 + n/4 + n/8 + ... + 1, and this sum equals n - 1.
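A small Python sketch (the function name is mine) that counts the multiplications in the divide-and-conquer scheme from the question makes the n - 1 total explicit:

```python
def product_dc(nums):
    """Divide-and-conquer product; returns (product, number of multiplications used)."""
    if len(nums) == 1:
        return nums[0], 0
    mid = len(nums) // 2
    left, ml = product_dc(nums[:mid])      # product of the first half
    right, mr = product_dc(nums[mid:])     # product of the second half
    return left * right, ml + mr + 1       # one more multiplication to combine them

# product_dc([2, 3, 5, 7, 11, 13, 17, 19]) -> (9699690, 7): 8 numbers, 7 multiplications,
# matching n/2 + n/4 + ... + 1 = n - 1.
```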

Bit cost of bit shift

I updated a question I asked before with this, but as the original question was answered I'm guessing I should ask it separately as a new question.
Take for example the simple multiplication algorithm. I see in numerous places the claim that this is a Log^2(N) operation. The given explanation is that this is due to it consisting of Log(N) additions of a Log(N) number.
The problem I have with this is that, although that is true, it ignores the fact that each of those Log(N) numbers will be the result of a bit shift, and by the end we will have bit-shifted at least Log(N) times. As a bit shift by 1 is a Log(N) operation, the bit shifts considered alone give us Log^2(N) operations.
It therefore makes no sense to me when I see it further claimed that in practice multiplication doesn't in fact use Log^2(N) operations, since various methods can reduce the number of required additions. Since the bit shifting alone gives us Log^2(N), I'm left confused as to how that claim can be true.
In fact, any shift-and-add method would seem to have this bit cost irrespective of how many additions there are.
Even if we use perfect minimal bit coding, any M-bit by N-bit multiplication will result in an approximately (M+N)-bit number, so M+N bits will have to have been shifted at least N times just to output/store/combine terms, meaning a minimum of N^2 bit operations.
This seems to contradict the claimed number of operations for Toom-Cook etc so can someone please point out where my reasoning is flawed.
I think the way around this issue is the fact that you can do the operation
a + (b << k)
without having to perform any shifts at all. If you imagine what the addition would look like, it would look something like this:
b(n)  b(n-1)  ...  b(1)  b(0)   0   0  ...  0   0      <- b shifted left by k places (k trailing zeros)
             a(n)  a(n-1)  ...  a(k)  a(k-1)  ...  a(1)  a(0)   <- a in its original position
In other words, the last k digits of the result will just be the last k digits of a, the middle digits will consist of the sum of a subset of b's digits and a subset of a's digits, and the leading digits can be formed by doing a ripple propagation of any carries up through the remaining digits of b. Overall, the total runtime will be proportional to the number of digits in a and b, plus the number of places to do the shift.
The real trick here is realizing that you can shift by k places without doing k individual shifts by one place over. Rather than shuffling everything down k times, you can just figure out where the bits are going to end up and write them there directly. In other words, the cost of a shift by k bits is not k times the cost of a shift by 1 bit. It's O(N + k), where N is the number of bits in the number.
Consequently, if you can implement multiplication in terms of some number of "add two numbers with a shift" operations, you will not necessarily have to do O((log n)^2) bit operations. Each addition does O(log n + k) total bit operations, so if k is small (say, O(log n)) and you only do a small number of additions, then you can do better than O((log n)^2) bit operations.
Hope this helps!
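The following toy Python sketch works on little-endian lists of bits (all names are mine) just to illustrate the point: each output position is produced once, by reading b's bit k places lower, so a shift-and-add costs O(N + k) bit steps rather than k separate one-place shifts.

```python
def add_with_shift(a_bits, b_bits, k):
    """Compute a + (b << k), with both numbers given as little-endian bit lists.

    Output position p takes a's bit p and b's bit p - k directly; no
    intermediate shifted copy of b is ever built, so the cost is
    O(len(a) + len(b) + k) bit operations.
    """
    out, carry = [], 0
    for pos in range(max(len(a_bits), len(b_bits) + k) + 1):
        a_bit = a_bits[pos] if pos < len(a_bits) else 0
        b_bit = b_bits[pos - k] if 0 <= pos - k < len(b_bits) else 0
        total = a_bit + b_bit + carry
        out.append(total & 1)
        carry = total >> 1
    return out

# e.g. a = 5 (101), b = 3 (11), k = 2: 5 + (3 << 2) = 17
# add_with_shift([1, 0, 1], [1, 1], 2) -> [1, 0, 0, 0, 1]  (17, little-endian)
```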

What's the optimal way to calculate which integer in a list is missing?

I was recently in an interview where they asked me technical questions. One was how you would calculate which number in a list of length n-1 was missing. The list contained every number from 1 to n, except i where 1 <= i <= n. The numbers were not in order. My solution was to add them all up, then subtract that from the sum of the numbers from 1 to n, computed by adding 1 to n and multiplying by n/2 or (n-1)/2 as appropriate. But I got the sense that there was a better way to do it. What is the optimal solution?
Your answer is good enough, in my opinion.
But some people -- perhaps your interviewer is one of them -- are worried about overflow and such. In that case, use XOR instead of addition.
To obtain the XOR of the integers from 0 to n, just XOR together the loop indices as you go (remembering to fold in the top value or two that the index itself never reaches). Given the XOR of the integers from 0 to n and the XOR of the array elements, you just XOR the two of those together to get the missing element.
P.S. The sum of the integers from 1 to n is always (n+1)*n/2
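Both approaches in a quick Python sketch, the sum from the question and the XOR suggested above; the names are mine, and the list is assumed to hold the values 1..n with exactly one missing, as in the question.

```python
def missing_by_sum(arr, n):
    """The asker's approach: expected sum of 1..n minus the actual sum."""
    return n * (n + 1) // 2 - sum(arr)

def missing_by_xor(arr, n):
    """Overflow-safe variant: XOR indices and elements; every present value cancels."""
    acc = 0
    for i, v in enumerate(arr, start=1):   # i covers 1..n-1
        acc ^= i ^ v
    return acc ^ n                         # fold in the last expected value, n

# e.g. missing_by_sum([3, 1, 5, 4], 5) == missing_by_xor([3, 1, 5, 4], 5) == 2
```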
While iterating through the array to calculate the sum, you can also check whether a number repeats.
Your method is absolutely fine. It is optimal in terms of both space and time. Overflow can be the only problem with it.
Another possible method is to use a hashSet. Create an initial hashSet containing the values 1..N. Now, for each number you encounter in the list, delete that value from the hashSet. At the end, the single value that remains in the hashSet is the missing value.
This method is O(N) in both time and space complexity. Your method (barring overflow) was O(N) in time and O(1) in space. The added O(N) space is the cost of eliminating overflow.
Your solution is pretty much optimal with one change: as @Nemo points out, the sum of the integers from 1 to n is always (n+1) * n/2.
It's also worth pointing out that your approach is multi-thread capable (and might be suitable for very large values of N): split the array into parts, get the sum of each part in its own thread, then add those partial sums. Whether it pays off depends on the overhead of threading compared to just adding the numbers in one pass.
If you're worried about overflows, and your values are always int32 (as most .Length values are, including arrays), then just store the sum as an int64; the sum of all positive int values, (((long)int.MaxValue) + 1L) * (int.MaxValue / 2) = 2305843007066210304, still fits within an int64, whose .MaxValue = 9223372036854775807.
The other answer, as mentioned by others, is to XOR each item into a running XOR, but then you need to work out a formula to get the expected total XOR of 1..n in O(1) time.
Most likely the interviewer is looking to see if you realise there's an O(N) solution with O(1) memory (which your answer is), rather than sorting the array and being much slower for very large values of N.
A further code-level improvement would be to use a pointer to access the array rather than the index value (which, if your code is C#, will be a reasonable improvement).
