Google interview, can somebody confirm this bit string algorithm, pure hypothesis - algorithm

This was a Google question i didn't figure out was right or wrong but a 2nd opinion never hurt. but the question is "given an n length bit-string solve for the number of times "111" appeared in all possible combinations."
now i know to find total combinations is 2^n what took me trouble was figuring out the number of occurrences i did find a pattern in occurrences but who knows for sure what happens when n becomes vast.
My logical solution was
#Level (n length) # combos # strings with "111" in it
_________________ ________ _____________________________
0 0 0
1("1" or "0") 2 0
2("11","01" etc. 4 0
3 8 1("111")
4 16 3
5 32 8
6 64 20
------------------------------------Everything before this is confirmed
7 128 49
8 256 119
9 512 288
10 1000 696
etc.. i can post how i came up with the magical fairy dust but yeah

I can help you with a solution:
Call the function to calculate number of string with n bit contains 111 is f(n)
If the first bit of the string is 0, we have f(n) += f(n - 1)//0 + (n - 1 bits)
If the first bit of the string is 1, we have f(n) += f(n - 2) + f(n - 3) + 2^(n - 3)
More explanation for case first bit is 1
If the first bit is 1, we have three cases:
10 + (n - 2 bits) = f(n - 2)
110 + (n - 3 bits) = f(n - 3)
111 + (n - 3 bits) = 2^(n - 3) as we can take all combinations.
So in total f(n) = f(n - 1) + f(n - 2) + f(n - 3) + 2^(n - 3).
Apply to our example:
n = 4 -> f(4) = f(3) + f(2) + f(1) + 2^1 = 1 + 0 + 0 + 2 = 3;
n = 5 -> f(5) = f(4) + f(3) + f(2) + 2^2 = 3 + 1 + 0 + 4 = 8;
n = 6 -> f(6) = f(5) + f(4) + f(3) + 2^3 = 8 + 3 + 1 + 8 = 20;
n = 7 -> f(7) = f(6) + f(5) + f(4) + 2^4 = 20 + 8 + 3 + 16 = 47;
n = 8 -> f(8) = f(7) + f(6) + f(5) + 2^5 = 47 + 20 + 8 + 32 = 107;
n = 9 -> f(9) = f(8) + f(7) + f(6) + 2^6 = 107 + 47 + 20 + 64 = 238;

Related

It is possible to get the index of a combination without generating it?

I mean a function that accepts an array of elements and a combination as params, and returns a number that represents the index of a combination without generating every combination.
I have no preference, it can be in any programming language.
An example of code getCombinationIndex("114") and should return the index of combination 114.
[1,1,1]: 1
[2,1,1]: 2
[3,1,1]: 3
[4,1,1]: 4
[.....]
[1,1,4]: ?
Let's say you are considering combinations of k symbols from alphabet A = {a_0, a_1, ..., a_n} (i.e. with n symbols and a_i < a_j lexicographically if i < j). In your example, you have an alphabet of 4 symbols A = {1, 2, 3, 4} and combinations of k = 3 symbols.
Then, a combination c = [a_i1, a_i2, ..., a_ik] can be uniquely encoded as I(c) = i1 + n*i2 + (n^2)*i3 + ... + (n^(k-1))*ik. The indexing you're looking for is F(c) = I(c) + 1.
Let's see how it works for your example:
F([1,1,1]) = I([1,1,1]) + 1 = 0 + 4*0 + (4^2)*0 + 1 = 1
F([2,1,1]) = I([2,1,1]) + 1 = 1 + 4*0 + (4^2)*0 + 1 = 2
F([3,1,1]) = I([2,1,1]) + 1 = 2 + 4*0 + (4^2)*0 + 1 = 3
F([4,1,1]) = I([2,1,1]) + 1 = 3 + 4*0 + (4^2)*0 + 1 = 4
...
F([2,1,3]) = I([2,2,3]) + 1 = 1 + 4*1 + (4^2)*2 + 1 = 38
...
F([1,1,4]) = I([1,1,4]) + 1 = 0 + 4*0 + (4^2)*3 + 1 = 49
...
F([4,4,4]) = I([4,4,4]) + 1 = 3 + 4*3 + (4^2)*3 + 1 = 64
This problem can be seen as base conversion. You need two informations to start with and then it will be only a base conversion.
The base
In your case this is the highest number of all the items.
[4,1,1] -> 4
The desired combination
This only works for the premiss that all items can have the same maximum.
Algorithm
Reverse the order of items
Decrement every item by 1
Convert the number to base 10
Increment by 1
Example
Start: 114
Reverse: 411
Decrement: 300
Conversion:
Base 4: 300
Base 10: 3*4^2 + 0*4^1 + 0*4^0 = 24
Increment: 25

The sum from 1 to n in theta(log n)

Is there anyway to calculate the sum of 1 to n in Theta(log n)?
Of course, the obvious way to do it is sum = n*(n+1)/2.
However, for practicing, I want to calculate in Theta(log n).
For example,
sum=0; for(int i=1; i<=n; i++) { sum += i}
this code will calculate in Theta(n).
Fair way (without using math formulas) assumes direct summing all n values, so there is no way to avoid O(n) behavior.
If you want to make some artificial approach to provide exactly O(log(N)) time, consider, for example, using powers of two (knowing that Sum(1..2^k = 2^(k-1) + 2^(2*k-1) - for example, Sum(8) = 4 + 32). Pseudocode:
function Sum(n)
if n < 2
return n
p = 1 //2^(k-1)
p2 = 2 //2^(2*k-1)
while p * 4 < n:
p = p * 2;
p2 = p2 * 4;
return p + p2 + ///sum of 1..2^k
2 * p * (n - 2 * p) + ///(n - 2 * p) summands over 2^k include 2^k
Sum(n - 2 * p) ///sum of the rest over 2^k
Here 2*p = 2^k is the largest power of two not exceeding N. Example:
Sum(7) = Sum(4) + 5 + 6 + 7 =
Sum(4) + (4 + 1) + (4 + 2) + (4 + 3) =
Sum(4) + 3 * 4 + Sum(3) =
Sum(4) + 3 * 4 + Sum(2) + 1 * 2 + Sum(1) =
Sum(4) + 3 * 4 + Sum(2) + 1 * 2 + Sum(1) =
2 + 8 + 12 + 1 + 2 + 2 + 1 = 28

How to write a function f("123")=123+12+23+1+2+3 as a recurrence relationship

I'm wondering if someone can help me try to figure this out.
I want f(str) to take a string str of digits and return the sum of all substrings as numbers, and I want to write f as a function of itself so that I can try to solve this with memoization.
It's not jumping out at me as I stare at
Solve("1") = 1
Solve("2") = 2
Solve("12") = 12 + 1 + 2
Solve("29") = 29 + 2 + 9
Solve("129") = 129 + 12 + 29 + 1 + 2 + 9
Solve("293") = 293 + 29 + 93 + 2 + 9 + 3
Solve("1293") = 1293 + 129 + 293 + 12 + 29 + 93 + 1 + 2 + 9 + 3
Solve("2395") = 2395 + 239 + 395 + 23 + 39 + 95 + 2 + 3 + 9 + 5
Solve("12395") = 12395 + 1239 + 2395 + 123 + 239 + 395 + 12 + 23 + 39 + 95 + 1 + 2 + 3 + 9 + 5
You have to break f down into two functions.
Let N[i] be the ith digit of the input. Let T[i] be the sum of substrings of the first i-1 characters of the input. Let B[i] be the sum of suffixes of the first i characters of the input.
So if the input is "12395", then B[3] = 9+39+239+1239, and T[3] = 123+12+23+1+2+3.
The recurrence relations are:
T[0] = B[0] = 0
T[i+1] = T[i] + B[i]
B[i+1] = B[i]*10 + (i+1)*N[i]
The last line needs some explanation: the suffixes of the first i+2 characters are the suffixes of the first i+1 characters with the N[i] appended on the end, as well as the single-character string N[i]. The sum of these is (B[i]*10+N[i]*i) + N[i] which is the same as B[i]*10+N[i]*(i+1).
Also f(N) = T[len(N)] + B[len(N)].
This gives a short, linear-time, iterative solution, which you could consider to be a dynamic program:
def solve(n):
rt, rb = 0, 0
for i in xrange(len(n)):
rt, rb = rt+rb, rb*10+(i+1)*int(n[i])
return rt+rb
print solve("12395")
One way to look at this problem is to consider the contribution of each digit to the final sum.
For example, consider the digit Di at position i (from the end) in the number xn-1…xi+1Diyi-1…y0. (I used x, D, and y just to be able to talk about the digit positions.) If we look at all the substrings which contain D and sort them by the position of D from the end of the number, we'll see the following:
D
xD
xxD
…
xx…xD
Dy
xDy
xxDy
…
xx…xDy
Dyy
xDyy
xxDyy
…
xx…xDyy
and so on.
In other words, D appears in every position from 0 to i once for each prefix length from 0 to n-i-1 (inclusive), or a total of n-i times in each digit position. That means that its total contribution to the sum is D*(n-i) times the sum of the powers of 10 from 100 to 10i. (As it happens, that sum is exactly (10i+1−1)⁄9.)
That leads to a slightly simpler version of the iteration proposed by Paul Hankin:
def solve(n):
ones = 0
accum = 0
for m in range(len(n),0,-1):
ones = 10 * ones + 1
accum += m * ones * int(n[m-1])
return accum
By rearranging the sums in a different way, you can come up with this simple recursion, if you really really want a recursive solution:
# Find the sum of the digits in a number represented as a string
digitSum = lambda n: sum(map(int, n))
# Recursive solution by summing suffixes:
solve2 = lambda n: solve2(n[1:]) + (10 * int(n) - digitSum(n))/9 if n else 0
In case it's not obvious, 10*n-digitSum(n) is always divisible by 9, because:
10*n == n + 9*n == (mod 9) n mod 9 + 0
digitSum(n) mod 9 == n mod 9. (Because 10k == 1 mod n for any k.)
Therefore (10*n - digitSum(n)) mod 9 == (n - n) mod 9 == 0.
Looking at this pattern:
Solve("1") = f("1") = 1
Solve("12") = f("12") = 1 + 2 + 12 = f("1") + 2 + 12
Solve("123") = f("123") = 1 + 2 + 12 + 3 + 23 + 123 = f("12") + 3 + 23 + 123
Solve("1239") = f("1239") = 1 + 2 + 12 + 3 + 23 + 123 + 9 + 39 + 239 + 1239 = f("123") + 9 + 39 + 239 + 1239
Solve("12395") = f("12395") = 1 + 2 + 12 + 3 + 23 + 123 + 9 + 39 + 239 + 1239 + 5 + 95 + 395 + 2395 + 12395 = f("1239") + 5 + 95 + 395 + 2395 + 12395
To get the new terms, with n being the length of str, you are including the substrings made up of the 0-based index ranges of characters in str: (n-1,n-1), (n-2,n-1), (n-3,n-1), ... (n-n, n-1).
You can write a function to get the sum of the integers formed from the substring index ranges. Calling that function g(str), you can write the function recursively as f(str) = f(str.substring(0, str.length - 1)) + g(str) when str.length > 1, and the base case with str.length == 1 would just return the integer value of str. (The parameters of substring are the start index of a character in str and the length of the resulting substring.)
For the example Solve("12395"), the recursive equation f(str) = f(str.substring(0, str.length - 1)) + g(str) yields:
f("12395") =
f("1239") + g("12395") =
(f("123") + g("1239")) + g("12395") =
((f("12") + g("123")) + g("1239")) + g("12395") =
(((f("1") + g("12")) + g("123")) + g("1239")) + g("12395") =
1 + (2 + 12) + (3 + 23 + 123) + (9 + 39 + 239 + 1239) + (5 + 95 + 395 + 2395 + 12395)

Hamming weight of an interval

My task is to calculate number of 1-bits of interval [1,10^16]. Loop is obviously unusable for this case, and I've heard there exists an algorithm for this. Can anyone help?
More generally, an algorithm for number of 1-bits in an interval [1,n] would be nice.
If that helps, I figured that number of 1-bits of interval [1,2^n-1], n positive integer, is n*2^(n-1).
The number of 1-bits in interval [1,n] is the number of 1-bits in interval [1,2^m] plus the number of 1-bits in interval [1,n-2^m] plus n - 2^m.
m is ⌊log(n)/log2⌋.
Let f(A,B) = the number of 1-bits from A to B, including A and B.
I figured that too : f(1,2^k-1) = k*2^(n-1)
Obviously, f(1, x) = f(0, x) since 0 has no 1-bit.
Let's x = 2^k + b, f(1, x) = f(0, x) = f(0, 2^k + b) = f(0, 2^k - 1) + f(2^k, 2^k + b)
The key problem is f(2^k, 2^k + b)
2^k = 1 0 0 0 ... 0 0
2^k + 1 = 1 0 0 0 ... 0 1
2^k + 2 = 1 0 0 0 ... 1 0
2^k + 3 = 1 0 0 0 ... 0 1
... ...
2^k + b = 1 0 0 0 ... ? ?
Clearly, there is 1-bits in the first bit of each number from 2^k to 2^k + b. And there is (b+1) integers from 2^k to 2^k + b.
We can remove the first 1-bit. And it becomes below.
0 = 0 0 0 0 ... 0 0
1 = 0 0 0 0 ... 0 1
2 = 0 0 0 0 ... 1 0
3 = 0 0 0 0 ... 0 1
... ...
b = 0 0 0 0 ... ? ?
So, f(2^k, 2^k + b) = (b+1) + f(0, b).
f(0, x) = f(0, 2^k - 1) + f(2^k, 2^k + b)
= f(0, 2^k - 1) + (b+1) + f(0, b)
Clearly, we have to recursively calculate f(0,b).
Give an example of step 4.
For f(1, 31) = 80, and 31 has 5 1-bits.
so f(1, 30) = 80 - 5 = 75;
Let's calculate f(1, 30) use the method of step 4.
f(1, 30) = f(0, 30)
= f(0, 15) + f(16, 30)
= 32 + 15 + f(0, 14)
= 47 + f(0, 14)
= 47 + f(0, 7) + f(8, 14)
= 47 + 12 + 7 + f(0, 6)
= 66 + f(0, 6)
= 66 + f(0, 3) + f(4, 6)
= 66 + 4 + 3 + f(0, 2)
= 73 + f(0, 2)
= 73 + f(0, 1) + f(2, 2)
= 74 + f(2, 2)
= 74 + 1 + f(0, 0)
= 75

Recurrences using Substitution Method

Determine the positive number c & n0 for the following recurrences (Using Substitution Method):
T(n) = T(ceiling(n/2)) + 1 ... Guess is Big-Oh(log base 2 of n)
T(n) = 3T(floor(n/3)) + n ... Guess is Big-Omega (n * log base 3 of n)
T(n) = 2T(floor(n/2) + 17) + n ... Guess is Big-Oh(n * log base 2 of n).
I am giving my Solution for Problem 1:
Our Guess is: T(n) = O (log_2(n)).
By Induction Hypothesis assume T(k) <= c * log_2(k) for all k < n,here c is a const & c > 0
T(n) = T(ceiling(n/2)) + 1
<=> T(n) <= c*log_2(ceiling(n/2)) + 1
<=> " <= c*{log_2(n/2) + 1} + 1
<=> " = c*log_2(n/2) + c + 1
<=> " = c*{log_2(n) - log_2(2)} + c + 1
<=> " = c*log_2(n) - c + c + 1
<=> " = c*log_2(n) + 1
<=> T(n) not_<= c*log_2(n) because c*log_2(n) + 1 not_<= c*log_2(n).
To solve this remedy used a trick a follows:
T(n) = T(ceiling(n/2)) + 1
<=> " <= c*log(ceiling(n/2)) + 1
<=> " <= c*{log_2 (n/2) + b} + 1 where 0 <= b < 1
<=> " <= c*{log_2 (n) - log_2(2) + b) + 1
<=> " = c*{log_2(n) - 1 + b} + 1
<=> " = c*log_2(n) - c + bc + 1
<=> " = c*log_2(n) - (c - bc - 1) if c - bc -1 >= 0
c >= 1 / (1 - b)
<=> T(n) <= c*log_2(n) for c >= {1 / (1 - b)}
so T(n) = O(log_2(n)).
This solution is seems to be correct to me ... My Ques is: Is it the proper approach to do?
Thanks to all of U.
For the first exercise:
We want to show by induction that T(n) <= ceiling(log(n)) + 1.
Let's assume that T(1) = 1, than T(1) = 1 <= ceiling(log(1)) + 1 = 1 and the base of the induction is proved.
Now, we assume that for every 1 <= i < nhold that T(i) <= ceiling(log(i)) + 1.
For the inductive step we have to distinguish the cases when n is even and when is odd.
If n is even: T(n) = T(ceiling(n/2)) + 1 = T(n/2) + 1 <= ceiling(log(n/2)) + 1 + 1 = ceiling(log(n) - 1) + 1 + 1 = ceiling(log(n)) + 1.
If n is odd: T(n) = T(ceiling(n/2)) + 1 = T((n+1)/2) + 1 <= ceiling(log((n+1)/2)) + 1 + 1 = ceiling(log(n+1) - 1) + 1 + 1 = ceiling(log(n+1)) + 1 = ceiling(log(n)) + 1
The last passage is tricky, but is possibile because n is odd and then it cannot be a power of 2.
Problem #1:
T(1) = t0
T(2) = T(1) + 1 = t0 + 1
T(4) = T(2) + 1 = t0 + 2
T(8) = T(4) + 1 = t0 + 3
...
T(2^(m+1)) = T(2^m) + 1 = t0 + (m + 1)
Letting n = 2^(m+1), we get that T(n) = t0 + log_2(n) = O(log_2(n))
Problem #2:
T(1) = t0
T(3) = 3T(1) + 3 = 3t0 + 3
T(9) = 3T(3) + 9 = 3(3t0 + 3) + 9 = 9t0 + 18
T(27) = 3T(9) + 27 = 3(9t0 + 18) + 27 = 27t0 + 81
...
T(3^(m+1)) = 3T(3^m) + 3^(m+1) = ((3^(m+1))t0 + (3^(m+1))(m+1)
Letting n = 3^(m+1), we get that T(n) = nt0 + nlog_3(n) = O(nlog_3(n)).
Problem #3:
Consider n = 34. T(34) = 2T(17+17) + 34 = 2T(34) + 34. We can solve this to find that T(34) = -34. We can also see that for odd n, T(n) = 1 + T(n - 1). We continue to find what values are fixed:
T(0) = 2T(17) + 0 = 2T(17)
T(17) = 1 + T(16)
T(16) = 2T(25) + 16
T(25) = T(24) + 1
T(24) = 2T(29) + 24
T(29) = T(28) + 1
T(28) = 2T(31) + 28
T(31) = T(30) + 1
T(30) = 2T(32) + 30
T(32) = 2T(33) + 32
T(33) = T(32) + 1
We get T(32) = 2T(33) + 32 = 2T(32) + 34, meaning that T(32) = -34. Working backword, we get
T(32) = -34
T(33) = -33
T(30) = -38
T(31) = -37
T(28) = -46
T(29) = -45
T(24) = -96
T(25) = -95
T(16) = -174
T(17) = -173
T(0) = -346
As you can see, this recurrence is a little more complicated than the others, and as such, you should probably take a hard look at this one. If I get any other ideas, I'll come back; otherwise, you're on your own.
EDIT:
After looking at #3 some more, it looks like you're right in your assessment that it's O(nlog_2(n)). So you can try listing a bunch of numbers - I did it from n=0 to n=45. You notice a pattern: it goes from negative numbers to positive numbers around n=43,44. To get the next even-index element of the sequence, you add powers of two, in the following order: 4, 8, 4, 16, 4, 8, 4, 32, 4, 8, 4, 16, 4, 8, 4, 64, 4, 8, 4, 16, 4, 8, 4, 32, ...
These numbers are essentially where you'd mark an arbitary-length ruler... quarters, halves, eights, sixteenths, etc. As such, we can solve the equivalent problem of finding the order of the sum 1 + 2 + 1 + 4 + 1 + 2 + 1 + 8 + ... (same as ours, divided by 4, and ours is shifted, but the order will still work). By observing that the sum of the first k numbers (where k is a power of 2) is equal to sum((n/(2^(k+1))2^k) = (1/2)sum(n) for k = 0 to log_2(n), we get that the simple recurrence is given by (n/2)log_2(n). Multiply by 4 to get ours, and shift x to the right by 34 and perhaps add a constant value to the result. So we're playing around with y = 2nlog_n(x) + k' for some constant k'.
Phew. That was a tricky one. Note that this recurrence does not admit of any arbitary "initial condiditons"; in other words, the recurrence does not describe a family of sequences, but one specific one, with no parameterization.

Resources