Find final square in matrix walking like a spiral - algorithm

Given an A x A matrix and a number of movements N.
And walking like a spiral:
right while possible, then
down while possible, then
left while possible, then
up while possible; repeat until N movements have been made.
Image with example (A = 8; N = 36)
In this example case, the final square is (4; 7).
My question is: Is it possible to use a generic formula to solve this?

Yes, it is possible to calculate the answer.
To do so, it will help to split up the problem into three parts.
(Note: I start counting at zero to simplify the math. This means that you'll have to add 1 to some parts of the answer. For instance, my answer to A = 8, N = 36 would be the final square (3; 6), which has the label 35.)
(Another note: this answer is quite similar to Nyavro's answer, except that I avoid the recursion here.)
In the first part, you calculate the labels on the diagonal:
(0; 0) has label 0.
(1; 1) has label 4*(A-1). The cycle can be evenly split into four parts (with your labels: 1..7, 8..14, 15..21, 22..27).
(2; 2) has label 4*(A-1) + 4*(A-3). After taking one cycle around the A x A matrix, your next cycle will be around a (A - 2) x (A - 2) matrix.
And so on. There are plenty of ways to now figure out the general rule for (K; K) (when 0 < K < A/2). I'll just pick the one that's easiest to show:
4*(A-1) + 4*(A-3) + 4*(A-5) + ... + 4*(A-(2*K-1)) =
4*A*K - 4*(1 + 3 + 5 + ... + (2*K-1)) =
4*A*K - 4*(K + (0 + 2 + 4 + ... + (2*K-2))) =
4*A*K - 4*(K + 2*(0 + 1 + 2 + ... + (K-1))) =
4*A*K - 4*(K + 2*(K*(K-1)/2)) =
4*A*K - 4*(K + K*(K-1)) =
4*A*K - 4*(K + K*K - K) =
4*A*K - 4*K*K =
4*(A-K)*K
(Note: check that 4*(A-K)*K = 28 when A = 8 and K = 1. Compare this to the label at (2; 2) in your example.)
Now that we know what labels are on the diagonal, we can figure out how many layers (say K) we have to remove from our A x A matrix so that the final square is on the edge. If we do this, then answering our question
What are the coordinates (X; Y) when I take N steps in a A x A matrix?
can be done by calculating this K and instead solve the question
What are the coordinates (X - K; Y - K) when I take N - 4*(A-K)*K steps in a (A - 2*K) x (A - 2*K) matrix?
To do this, we should find the largest integer K such that K < A/2 and 4*(A-K)*K <= N.
The solution to this is K = floor(A/2 - sqrt(A*A-N)/2).
All that remains is to find out the coordinates of a square that is N along the edge of some A x A matrix:
if 0*E <= N < 1*E, the coordinates are (0; N);
if 1*E <= N < 2*E, the coordinates are (N - E; E);
if 2*E <= N < 3*E, the coordinates are (E; 3*E - N); and
if 3*E <= N < 4*E, the coordinates are (4*E - N; 0).
Here, E = A - 1.
To conclude, here is a naive Haskell implementation of this answer (naive because layerNumber gives incorrect answers for large values of a, due to floating-point inaccuracy):
finalSquare :: Integer -> Integer -> Maybe (Integer, Integer)
finalSquare a n
  | Just (x', y') <- edgeSquare a' n' = Just (x' + k, y' + k)
  | otherwise                         = Nothing
  where
    k  = layerNumber a n
    a' = a - 2*k
    n' = n - 4*(a-k)*k

edgeSquare :: Integer -> Integer -> Maybe (Integer, Integer)
edgeSquare a n
  | n < 1*e   = Just (0, n)
  | n < 2*e   = Just (n - e, e)
  | n < 3*e   = Just (e, 3*e - n)
  | n < 4*e   = Just (4*e - n, 0)
  | otherwise = Nothing
  where
    e = a - 1

layerNumber :: Integer -> Integer -> Integer
layerNumber a n = floor $ aa/2 - sqrt(aa*aa-nn)/2
  where
    aa = fromInteger a
    nn = fromInteger n
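For comparison, here is a minimal Python sketch of the same closed-form approach (the function name and structure are mine, not part of the original answer). It uses math.isqrt so the layer computation stays exact even for large a, avoiding the floating-point inaccuracy mentioned above:

from math import isqrt

def final_square(a, n):
    # 0-based sketch of the answer above; assumes 0 <= n < a*a
    if not (0 <= n < a * a):
        return None
    # largest K with K < A/2 and 4*(A-K)*K <= N, via a ceiling integer sqrt
    s = isqrt(a * a - n)
    if s * s < a * a - n:
        s += 1
    k = (a - s) // 2
    a2, n2 = a - 2 * k, n - 4 * (a - k) * k   # reduce to an edge problem
    if a2 == 1:                               # centre of an odd-sized matrix
        return (k, k)
    e = a2 - 1
    if n2 < e:
        x, y = 0, n2
    elif n2 < 2 * e:
        x, y = n2 - e, e
    elif n2 < 3 * e:
        x, y = e, 3 * e - n2
    else:
        x, y = 4 * e - n2, 0
    return (x + k, y + k)

For a = 8 and the asker's N = 36 (n = 35 in 0-based counting) this returns (3, 6), the same final square as above.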

Here is a possible solution:
f a n | n < (a-1)*1 = (0, n)
      | n < (a-1)*2 = (n-(a-1), a-1)
      | n < (a-1)*3 = (a-1, 3*(a-1)-n)
      | n < (a-1)*4 = (4*(a-1)-n, 0)
      | otherwise   = add (1,1) (f (a-2) (n - 4*(a-1)))
  where
    add (x1, y1) (x2, y2) = (x1+x2, y1+y2)
This is a basic solution; it may be generalized further - I just don't know how much generalization you need, but it should get the idea across.
Edit
Notes:
The solution is for 0-based index
Some check for existence is required (n >= a*a)

I'm going to propose a relatively simple workaround here which generates all the indices in O(A^2) time so that they can later be accessed in O(1) for any N. If A changes, however, we would have to execute the algorithm again, which would once more consume O(A^2) time.
I suggest you use a structure like this to store the indices to access your matrix:
Coordinate[] indices = new Coordinate[A*A]
Where Coordinate is just a pair of int.
You can then fill your indices array by using some loops:
(This implementation uses 1-based array access. Correct expressions containing i, sentinel and currentDirection accordingly if this is an issue.)
Coordinate[] directions = { {1, 0}, {0, 1}, {-1, 0}, {0, -1} };
Coordinate c = new Coordinate(1, 1);
int currentDirection = 1;
int i = 1;
int sentinel = A;
int sentinelIncrement = A - 1;
boolean sentinelToggle = false;
while (i <= A * A) {
    indices[i] = c;
    if (i >= sentinel) {
        sentinel += sentinelIncrement;
        if (sentinelToggle) {
            sentinelIncrement -= 1;
        }
        sentinelToggle = !sentinelToggle;
        currentDirection = currentDirection % 4 + 1;
    }
    c += directions[currentDirection];
    i++;
}
Alright, off to the explanation: I'm using a variable called sentinel to keep track of where I need to switch directions (directions are simply switched by cycling through the array directions).
The value of sentinel is incremented in such a way that it always has the index of a corner in our spiral. In your example the sentinel would take on the values 8, 15, 22, 28, 34, 39... and so on.
Note that the index of "sentinel" increases twice by 7 (8, 15 = 8 + 7, 22 = 15 + 7), then by 6 (28 = 22 + 6, 34 = 28 + 6), then by 5 and so on. In my while loop I used the boolean sentinelToggle for this. Each time we hit a corner of the spiral (this is exactly iff i == sentinel, which is where the if-condition comes in) we increment the sentinel by sentinelIncrement and change the direction we're heading. If sentinel has been incremented twice by the same value, the if-condition if (sentinelToggle) will be true, so sentinelIncrement is decreased by one. We have to decrease sentinelIncrement because our spiral gets smaller as we go on.
This goes on as long as i <= A*A, that is, as long as our array indices has still entries that are zero.
Note that this does not give you a closed formula for a spiral coordinate in respect to N (which would be O(1) ); instead it generates the indices for all N which takes up O(A^2) time and after that guarantees access in O(1) by simply calling indices[N].
O(A^2) hopefully shouldn't hurt too badly because I'm assuming that you'll also need to fill your matrix at some point, which also takes O(A^2).
If efficiency is a problem, consider getting rid of sentinelToggle so it doesn't mess up branch prediction. Instead, decrement sentinelIncrement every time the if-condition is met. To get the same effect for your sentinel value, simply start sentinelIncrement at (A - 1) * 2 and every time the if-condition is met, execute:
sentinel += sentinelIncrement / 2
The integer division will have the same effect as only decreasing sentinelIncrement every second time. I didn't do this whole thing in my version because I think it might be more easily understandable with just a boolean value.
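For reference, here is a small Python sketch of the same sentinel idea in 0-based form (the function name is mine). For A = 8 it reproduces the corner sequence 8, 15, 22, 28, 34, 39, ... described above:

def spiral_indices(A):
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # right, down, left, up
    indices = []
    r, c, d = 0, 0, 0
    sentinel = A                 # step number at which the next turn happens
    increment = A - 1            # distance from that corner to the next one
    toggle = False
    for i in range(1, A * A + 1):
        indices.append((r, c))
        if i == sentinel:        # corner reached: move the sentinel and turn
            sentinel += increment
            if toggle:
                increment -= 1   # the spiral arm shrinks after every second turn
            toggle = not toggle
            d = (d + 1) % 4
        dr, dc = directions[d]
        r, c = r + dr, c + dc
    return indices

indices[n] can then be read off in O(1) for any n (0-based here, so step N of the question is indices[N - 1]).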
Hope this helps!

Related

How many possible way for barcode to appear with limitation constrain

This is one of the homework problems from a grader I've got. I've been struggling with this question for two days now. The topic is Dynamic Programming and I have no idea how to make sense of it.
The detail is the following.
A barcode consists of black and white vertical lines in different arrangement. For simplicity, we use a string of “0” and “1” to identify a barcode such that “0” represents a black line while “1” represents a white line.
A barcode is designed to be robust to error thus it has to follow some specific rules:
1) A barcode must consist of exactly N lines
2) There can be no more than M consecutive lines of the same color. For example, when M=3, the barcode “01100001” is illegal because it contains four consecutive white lines. However, 1001100 is legal.
3) We define “color changing” as follows: a color change occurs when two
consecutive lines have different colors. For example, 1001100 has 3 color
changes. A barcode must have exactly K color changes.
4) The first line is always a black line.
We are interested in knowing the number of possible barcodes with respect to given
values of N, M and K.
Input
There is only one line, containing 3 integers N, M and K, where 1 <= N,M <= 30 and 0 <= K <= 30
Output
The output must contain exactly one line giving the number of possible barcodes.
For example
Input
4 3 1
Output
3
Input
5 2 2
Output
3
Input
7 9 4
Output
15
At each step (the i-th line) we have 2 options: choose white or black for it, then depending on that update the state (mm and kk).
Here is pseudo-Java code with comments; don't hesitate to ask if something is not clear:
static int n, m, k, memo[][][][];

static int dp(int i, int mm, int kk, int last) {
    if (mm > m || kk > k) return 0; // limitation constraints
    if (i == n) return kk == k ? 1 : 0; // if we built our barcode (i == n), check the number of color changes: if it's ok return 1, else return 0
    if (memo[i][mm][kk][last] != -1) return memo[i][mm][kk][last]; // memoization
    int ans = 0;
    ans += dp(i + 1, last == 1 ? mm + 1 : 1, kk + (last != 1 ? 1 : 0), 1); // choose black as the color of this line and update the state (mm, kk)
    ans += dp(i + 1, last == 0 ? mm + 1 : 1, kk + (last != 0 ? 1 : 0), 0); // choose white as the color of this line and update the state (mm, kk)
    return memo[i][mm][kk][last] = ans;
}

public static void main(String[] args) throws java.lang.Exception {
    n = 4; m = 3; k = 1;
    memo = new int[n + 1][m + 1][k + 1][2];
    for (int i = 0; i < n; i++) for (int j = 0; j <= m; j++) for (int l = 0; l <= k; l++) Arrays.fill(memo[i][j][l], -1);
    System.out.print(dp(1, 1, 0, 1));
}
There is quite a simple recurrence relation. If T(N, M, K) is the output:
T(N, M, K) = T(N - 1, M, K - 1) + T(N - 2, M, K - 1) + ... + T(N - M, M, K - 1)
A valid barcode (N, M, K) is always a smaller valid barcode plus one new colour block, and the size of this new block can be anything from 1 to M.
Thanks to this relation you can create, for each M, an N x K table and solve the problem in O(NMK) with dynamic programming.
These rules should be enough to initialize the recurrence:
T(N, M, K) = 0 if (K >= N) and 1 if (K = N - 1)
T(N, M, K) = 0 if ((K+1) * M < N)
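If it helps, here is a small Python sketch of a bottom-up table built from that recurrence (the function name and indexing are mine): T[i][c] counts barcodes of i lines with exactly c colour changes, the first line's colour being forced.

def count_barcodes(n, m, k):
    T = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, min(m, n) + 1):
        T[i][0] = 1                       # a single run of length i <= m
    for i in range(1, n + 1):
        for c in range(1, k + 1):
            # the last run has some length j in 1..m, leaving i-j >= 1 lines
            T[i][c] = sum(T[i - j][c - 1] for j in range(1, min(m, i - 1) + 1))
    return T[n][k]

count_barcodes(4, 3, 1), count_barcodes(5, 2, 2) and count_barcodes(7, 9, 4) return 3, 3 and 15, matching the sample outputs.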

How to find the following type of set with computation time less than O(n)?

Here 5 different sets are shown. S1 contains 1. Next set S2 is calculated from S1 considering the following logic:
Suppose Sn contains {a1, a2, a3, a4, ..., an} and the middle element of Sn is b.
Then the set Sn+1 contains the elements {b, b+a1, b+a2, ..., b+an}, (n+1) elements in total. If a set contains an even number of elements then the middle element is the (n/2 + 1)-th.
Now, if n is given as input then we have to display all the elements of set Sn.
Clearly it is possible to solve the problem in O(n) time.
We can compute each middle element as (2^(n-1) - middle element of the previous set + 1), with S1 = {1} as the base case. In this way we get all the middle elements up to the (n-1)-th set in O(n) time. The middle element of the (n-1)-th set is the first element of the n-th set, and (middle element of the (n-1)-th set + middle element of the (n-2)-th set) is the second element of the n-th set. Continuing like this we get all the elements of the n-th set.
So it needs O(n) time.
Here is the complete Java code I have written:
import java.util.Scanner;

public class SpecialSubset {
    private static Scanner inp;

    public static void main(String[] args) {
        int N, fst, mid, con = 0;
        inp = new Scanner(System.in);
        N = inp.nextInt();
        int[] setarr = new int[N];
        int[] midarr = new int[N];
        fst = 1;
        mid = 1;
        midarr[0] = 1;
        for (int i = 1; i < N; i++) {
            midarr[i] = (int) (Math.pow(2, i) - midarr[i - 1] + 1);
        }
        setarr[0] = midarr[N - 2];
        System.out.print(setarr[0]);
        System.out.print(" ");
        for (int i = 1, j = N - 3; i < N - 1; i++, j--) {
            setarr[i] = setarr[i - 1] + midarr[j];
            System.out.print(setarr[i]);
            System.out.print(" ");
        }
        setarr[N - 1] = setarr[N - 2] + 1;
        System.out.print(setarr[N - 1]);
    }
}
Here is the link of the Question:
https://www.hackerrank.com/contests/projecteuler/challenges/euler103
Is it possible to solve the problem in less than O(n) time?
@Paul Boddington has given an answer that relies on the sequence of first numbers of these sets being the Narayana-Zidek-Capell numbers and has checked it for some small-ish values. However, there was no proof of the conjecture given. This answer is in addition to the above, to make it complete. I'm no HTML/CSS/Markdown guru, so you'll have to excuse the bad positioning of subscripts. (If anyone can improve those - be my guest.)
Notation:
Let a(i, j) be the i-th number in the j-th set.
I'll also define b(j) as the first number of the (j-2)-th set. This is the sequence the proof is about. The -2 is to account for the first and second 1 in the Narayana-Zidek-Capell sequence.
Generating rules:
The problem statement didn't clarify what the "center number" is for an even-length set (a list really, but whatever), but it seems they meant the "center right" one in that case. I'll put the rule numbers in brackets when I use them below.
(1) a(1, 1) = 1
(2) a(1, n) = a(ceil((n+1)/2), n-1)
(3) a(i, n) = a(1, n) + a(i-1, n-1)
(4) b(n) = a(1, n-2)
Proof:
The first step is to make a slightly more involved formula for a(i, n) by unwinding the recursion a bit more and substituting b:
(5) a(i, n) = Σ a(1, n-j) = Σ b(n-j+2) for j in [0 ... i-1]
Next, we consider two cases for b(n) - one where n is odd, one where n is even.
Even case:
b(2n+2) = a(1, 2n)
(2) = a(ceil((2n+1)/2), 2n-1) = a(n+1, 2n-1)
(3) = a(1, 2n-1) + a(n, 2n-2)
(2, 4) = b(2n+1) + a(1, 2n-1)
(5) = 2 * b(2n+1)
Odd case:
b(2n+1) = a(1, 2n-1)
(2) = a(ceil(2n/2), 2n-2) = a(n, 2n-2)
(3) = a(1, 2n-2) + a(n-1, 2n-3)
(4) = 2 * b(2n) + (a(n-1, 2n-3) - a(1, 2n-2))
(2) = 2 * b(2n) + (a(n-1, 2n-3) - a(n, 2n-3))
(5) = 2 * b(2n) - b(n)
These rules are the exact sequence definition, and provide a way to generate the nth set in linear time (as opposed to quadratic when generating each set in turn)
The smallest numbers in the sets appear to be the Narayana-Zidek-Capell numbers
1, 1, 2, 3, 6, 11, 22, ...
The other numbers are obtained from the first number by repeatedly adding these numbers in reverse.
For example,
S6 = {11, 17, 20, 22, 23, 24}
+6 +3 +2 +1 +1
Using a recurrence for the Narayana-Zidek-Capell sequence found in that link, I have managed to produce a solution for this problem that runs in O(n) time. Here is a solution in Java. It only works for n <= 32 due to int overflow, but it could be written using BigInteger to work for higher values.
static Set<Integer> set(int n) {
    int[] a = new int[n + 2];
    for (int i = 1; i < n + 2; i++) {
        if (i <= 2)
            a[i] = 1;
        else if (i % 2 == 0)
            a[i] = 2 * a[i - 1];
        else
            a[i] = 2 * a[i - 1] - a[i / 2];
    }
    Set<Integer> set = new HashSet<>();
    int sum = 0;
    for (int i = n + 1; i >= 2; i--) {
        sum += a[i];
        set.add(sum);
    }
    return set;
}
I'm not able to justify right now why this is the same as the set in the question, but I'm working on it. However I have checked for all n <= 32 that this algorithm gives the same set as the "obvious" algorithm, so I'm reasonably sure it's correct.
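For anyone who wants to repeat that check, here is a small Python sketch of the "obvious" algorithm that builds S_n directly from the definition in the question (the helper name is mine); its output can be compared with the set produced by the recurrence above:

def obvious_set(n):
    s = [1]
    while len(s) < n:
        m = len(s)
        # middle element: (m/2 + 1)-th for even m, ((m+1)/2)-th for odd m (1-based)
        b = s[m // 2] if m % 2 == 0 else s[(m - 1) // 2]
        s = [b] + [b + x for x in s]
    return set(s)

For example, obvious_set(6) gives {11, 17, 20, 22, 23, 24}, the same S6 shown earlier.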

No of ways to walk M steps in a grid

You are situated in a grid at position (x, y). The dimensions of the grid are dx, dy. In one step, you can walk one step ahead or behind in the row or the column. In how many ways can you take M steps such that you do not leave the grid at any point? You can visit the same position more than once.
You leave the grid if at any point either x <= 0 or y <= 0, or x > dx or y > dy.
1 <= M <= 300
1 <= x,y <= dx,dy <= 100
Input:
M
x y
dx dy
Output:
no of ways
Example:
Input:
1
6 6
12 12
Output:
4
Example:
Input:
2
6 6
12 12
Output:
16
If you are at position 6,6 then you can walk to (6,5),(6,7),(5,6),(7,6).
I am stuck at how to use Pascal's Triangle to solve it. Is that the correct approach? I have already tried brute force but it's too slow.
C[i][j], Pascal Triangle
C[i][j] = C[i - 1][j - 1] + C[i - 1][j]
T[startpos][stp]
T[pos][stp] = T[pos + 1][stp - 1] + T[pos - 1][stp - 1]
You can solve 1d problem with the formula you provided.
Let H[pos][step] be the number of ways to move horizontally using the given number of steps.
And V[pos][step] be the number of ways to move vertically using the given number of steps.
You can iterate number of steps that will be made horizontal i = 0..M
Number of ways to move so is H[x][i]*V[y][M-i]*C[M][i], where C is binomial coefficient.
You can build H and V in O(max(dx,dy)*M) and do second step in O(M).
EDIT: Clarification on H and V. Suppose that you have a line that has d cells: 1,2,...,d. You're standing at cell number pos; then T[pos][step] = T[pos-1][step-1] + T[pos+1][step-1], as you can move either forward or backward.
Base cases are T[0][step] = 0, T[d+1][step] = 0, T[pos][0] = 1.
We build H assuming d = dx and V assuming d = dy.
EDIT 2: Basically, the idea of algorithm is since we move in one of 2 dimensions and check is also based on each dimension independently, we can split 2d problem in 2 1d problems.
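Here is a rough Python sketch of that idea (the names are mine): build the 1D tables for each axis and combine them with binomial coefficients.

from math import comb

def ways(M, x, y, dx, dy):
    def one_dim(d, pos):
        # T[p][s] = number of s-step walks along a line of d cells starting at p
        T = [[0] * (M + 1) for _ in range(d + 2)]
        for p in range(1, d + 1):
            T[p][0] = 1
        for s in range(1, M + 1):
            for p in range(1, d + 1):
                T[p][s] = T[p - 1][s - 1] + T[p + 1][s - 1]   # borders stay 0
        return [T[pos][s] for s in range(M + 1)]
    H = one_dim(dx, x)
    V = one_dim(dy, y)
    # choose which i of the M steps are horizontal
    return sum(comb(M, i) * H[i] * V[M - i] for i in range(M + 1))

For the examples in the question (M = 1 and M = 2 from position (6, 6) in a 12 x 12 grid) this returns 4 and 16.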
One way would be an O(n^3) dynamic programming solution:
Prepare a 3D array:
int Z[dx][dy][M]
Where Z[i][j][n] holds the number of paths that start from position (i,j) and last n moves.
The base case is Z[i][j][0] = 1 for all i, j
The recursive case is Z[i][j][n+1] = Z[i-1][j][n] + Z[i+1][j][n] + Z[i][j-1][n] + Z[i][j+1][n] (only include terms in the summation that are on the map)
Once the array is filled out return Z[x][y][M]
To save space you can discard each 2D array for n after it is used.
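A quick Python sketch of this DP, keeping only the previous layer as suggested (the function name is mine):

def ways_dp(M, x, y, dx, dy):
    # prev[i][j] = Z[i][j][n]; cells outside the grid stay 0
    prev = [[0] * (dy + 2) for _ in range(dx + 2)]
    for i in range(1, dx + 1):
        for j in range(1, dy + 1):
            prev[i][j] = 1                       # Z[i][j][0] = 1
    for _ in range(M):
        cur = [[0] * (dy + 2) for _ in range(dx + 2)]
        for i in range(1, dx + 1):
            for j in range(1, dy + 1):
                cur[i][j] = (prev[i - 1][j] + prev[i + 1][j] +
                             prev[i][j - 1] + prev[i][j + 1])
        prev = cur
    return prev[x][y]

This reproduces the sample answers (4 for M = 1 and 16 for M = 2 from (6, 6) in a 12 x 12 grid).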
Here's a Java solution I've built for the original hackerrank problem. For big grids it runs forever. Probably some smart math is needed.
long compute(int N, int M, int[] positions, int[] dimensions) {
if (M == 0) {
return 1;
}
long sum = 0;
for (int i = 0; i < N; i++) {
if (positions[i] < dimensions[i]) {
positions[i]++;
sum += compute(N, M - 1, positions, dimensions);
positions[i]--;
}
if (positions[i] > 1) {
positions[i]--;
sum += compute(N, M - 1, positions, dimensions);
positions[i]++;
}
}
return sum % 1000000007;
}

The "guess the number" game for arbitrary rational numbers?

I once got the following as an interview question:
I'm thinking of a positive integer n. Come up with an algorithm that can guess it in O(lg n) queries. Each query is a number of your choosing, and I will answer either "lower," "higher," or "correct."
This problem can be solved by a modified binary search, in which you first list powers of two until you find one that exceeds n, then run a standard binary search over that range. What I think is so cool about this is that you can search an infinite space for a particular number faster than just brute-force.
The question I have, though, is a slight modification of this problem. Instead of picking a positive integer, suppose that I pick an arbitrary rational number between zero and one. My question is: what algorithm can you use to most efficiently determine which rational number I've picked?
Right now, the best solution I have can find p/q in at most O(q) time by implicitly walking the Stern-Brocot tree, a binary search tree over all the rationals. However, I was hoping to get a runtime closer to the runtime that we got for the integer case, maybe something like O(lg (p + q)) or O(lg pq). Does anyone know of a way to get this sort of runtime?
I initially considered using a standard binary search of the interval [0, 1], but this will only find rational numbers with a non-repeating binary representation, which misses almost all of the rationals. I also thought about using some other way of enumerating the rationals, but I can't seem to find a way to search this space given just greater/equal/less comparisons.
Okay, here's my answer using continued fractions alone.
First let's get some terminology here.
Let X = p/q be the unknown fraction.
Let Q(X,p/q) = sign(X - p/q) be the query function: if it is 0, we've guessed the number, and if it's +/- 1 that tells us the sign of our error.
The conventional notation for continued fractions is A = [a0; a1, a2, a3, ... ak]
= a0 + 1/(a1 + 1/(a2 + 1/(a3 + 1/( ... + 1/ak) ... )))
We'll follow the following algorithm for 0 < p/q < 1.
Initialize Y = 0 = [ 0 ], Z = 1 = [ 1 ], k = 0.
Outer loop: The preconditions are that:
Y and Z are continued fractions of k+1 terms which are identical except in the last element, where they differ by 1, so that Y = [y0; y1, y2, y3, ... yk] and Z = [y0; y1, y2, y3, ... yk + 1]
(-1)^k (Y-X) < 0 < (-1)^k (Z-X), or in simpler terms, for k even, Y < X < Z and for k odd, Z < X < Y.
Extend the degree of the continued fraction by 1 step without changing the values of the numbers. In general, if the last terms are yk and yk + 1, we change that to [... yk, y(k+1)=∞] and [... yk, z(k+1)=1]. Now increase k by 1.
Inner loops: This is essentially the same as #templatetypedef's interview question about the integers. We do a two-phase binary search to get closer:
Inner loop 1: yk = ∞, zk = a, and X is between Y and Z.
Double Z's last term: Compute M = Z but with mk = 2*a = 2*zk.
Query the unknown number: q = Q(X,M).
If q = 0, we have our answer and go to step 17 .
If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 5.
Otherwise set Y = M and go to the next step:
Inner loop 2. yk = b, zk = a, and X is between Y and Z.
If a and b differ by 1, swap Y and Z, go to step 2.
Perform a binary search: compute M where mk = floor((a+b)/2), and query q = Q(X,M).
If q = 0, we're done and go to step 17.
If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 11.
Otherwise, q and Q(X,Z) have opposite signs, it means X is between Z and M, so set Y = M and go to step 11.
Done: X = M.
A concrete example for X = 16/113 = 0.14159292
Y = 0 = [0], Z = 1 = [1], k = 0
k = 1:
Y = 0 = [0; ∞] < X, Z = 1 = [0; 1] > X, M = [0; 2] = 1/2 > X.
Y = 0 = [0; ∞], Z = 1/2 = [0; 2], M = [0; 4] = 1/4 > X.
Y = 0 = [0; ∞], Z = 1/4 = [0; 4], M = [0; 8] = 1/8 < X.
Y = 1/8 = [0; 8], Z = 1/4 = [0; 4], M = [0; 6] = 1/6 > X.
Y = 1/8 = [0; 8], Z = 1/6 = [0; 6], M = [0; 7] = 1/7 > X.
Y = 1/8 = [0; 8], Z = 1/7 = [0; 7]
--> the two last terms differ by one, so swap and repeat outer loop.
k = 2:
Y = 1/7 = [0; 7, ∞] > X, Z = 1/8 = [0; 7, 1] < X,
M = [0; 7, 2] = 2/15 < X
Y = 1/7 = [0; 7, ∞], Z = 2/15 = [0; 7, 2],
M = [0; 7, 4] = 4/29 < X
Y = 1/7 = [0; 7, ∞], Z = 4/29 = [0; 7, 4],
M = [0; 7, 8] = 8/57 < X
Y = 1/7 = [0; 7, ∞], Z = 8/57 = [0; 7, 8],
M = [0; 7, 16] = 16/113 = X
--> done!
At each step of computing M, the range of the interval reduces. It is probably fairly easy to prove (though I won't do this) that the interval reduces by a factor of at least 1/sqrt(5) at each step, which would show that this algorithm is O(log q) steps.
Note that this can be combined with templatetypedef's original interview question and apply towards any rational number p/q, not just between 0 and 1, by first computing Q(X,0), then for either positive/negative integers, bounding between two consecutive integers, and then using the above algorithm for the fractional part.
When I have a chance next, I will post a python program that implements this algorithm.
edit: also, note that you don't have to compute the continued fraction each step (which would be O(k), there are partial approximants to continued fractions that can compute the next step from the previous step in O(1).)
edit 2: Recursive definition of partial approximants:
If A_k = [a0; a1, a2, a3, ... ak] = p_k/q_k, then p_k = a_k*p_(k-1) + p_(k-2), and q_k = a_k*q_(k-1) + q_(k-2). (Source: Niven & Zuckerman, 4th ed, Theorems 7.3-7.5. See also Wikipedia.)
Example: [0] = 0/1 = p_0/q_0, [0; 7] = 1/7 = p_1/q_1; so [0; 7, 16] = (16*1+0)/(16*7+1) = 16/113 = p_2/q_2.
This means that if two continued fractions Y and Z have the same terms except the last one, and the continued fraction excluding the last term is p_(k-1)/q_(k-1), then we can write Y = (y_k*p_(k-1) + p_(k-2)) / (y_k*q_(k-1) + q_(k-2)) and Z = (z_k*p_(k-1) + p_(k-2)) / (z_k*q_(k-1) + q_(k-2)). It should be possible to show from this that |Y-Z| decreases by at least a factor of 1/sqrt(5) at each smaller interval produced by this algorithm, but the algebra seems to be beyond me at the moment. :-(
Here's my Python program:
import math
# Return a function that returns Q(p0/q0,p/q)
# = sign(p/q-p0/q0) = sign(q0*p-p0*q)*sign(q0*q)
# If p/q > p0/q0, then Q() = 1; if p/q < p0/q0, then Q() = -1; otherwise Q()=0.
def makeQ(p0,q0):
    def Q(p,q):
        return cmp(q0*p,p0*q)*cmp(q0*q,0)
    return Q

def strsign(s):
    return '<' if s<0 else '>' if s>0 else '=='

def cfnext(p1,q1,p2,q2,a):
    return [a*p1+p2,a*q1+q2]

def ratguess(Q, doprint, kmax):
    # p2/q2 = p[k-2]/q[k-2]
    p2 = 1
    q2 = 0
    # p1/q1 = p[k-1]/q[k-1]
    p1 = 0
    q1 = 1
    k = 0
    cf = [0]
    done = False
    while not done and (not kmax or k < kmax):
        if doprint:
            print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
        # extend continued fraction
        k = k + 1
        [py,qy] = [p1,q1]
        [pz,qz] = cfnext(p1,q1,p2,q2,1)
        ay = None
        az = 1
        sy = Q(py,qy)
        sz = Q(pz,qz)
        while not done:
            if doprint:
                out = str(py)+'/'+str(qy)+' '+strsign(sy)+' X '
                out += strsign(-sz)+' '+str(pz)+'/'+str(qz)
                out += ', interval='+str(abs(1.0*py/qy-1.0*pz/qz))
            if ay:
                if (ay - az == 1):
                    [p0,q0,a0] = [pz,qz,az]
                    break
                am = (ay+az)/2
            else:
                am = az * 2
            [pm,qm] = cfnext(p1,q1,p2,q2,am)
            sm = Q(pm,qm)
            if doprint:
                out = str(ay)+':'+str(am)+':'+str(az) + ' ' + out + '; M='+str(pm)+'/'+str(qm)+' '+strsign(sm)+' X '
                print out
            if (sm == 0):
                [p0,q0,a0] = [pm,qm,am]
                done = True
                break
            elif (sm == sy):
                [py,qy,ay,sy] = [pm,qm,am,sm]
            else:
                [pz,qz,az,sz] = [pm,qm,am,sm]
        [p2,q2] = [p1,q1]
        [p1,q1] = [p0,q0]
        cf += [a0]
    print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
    return [p1,q1]
and a sample output for ratguess(makeQ(33102,113017), True, 20):
p/q=[0]=0/1
None:2:1 0/1 < X < 1/1, interval=1.0; M=1/2 > X
None:4:2 0/1 < X < 1/2, interval=0.5; M=1/4 < X
4:3:2 1/4 < X < 1/2, interval=0.25; M=1/3 > X
p/q=[0, 3]=1/3
None:2:1 1/3 > X > 1/4, interval=0.0833333333333; M=2/7 < X
None:4:2 1/3 > X > 2/7, interval=0.047619047619; M=4/13 > X
4:3:2 4/13 > X > 2/7, interval=0.021978021978; M=3/10 > X
p/q=[0, 3, 2]=2/7
None:2:1 2/7 < X < 3/10, interval=0.0142857142857; M=5/17 > X
None:4:2 2/7 < X < 5/17, interval=0.00840336134454; M=9/31 < X
4:3:2 9/31 < X < 5/17, interval=0.00379506641366; M=7/24 < X
p/q=[0, 3, 2, 2]=5/17
None:2:1 5/17 > X > 7/24, interval=0.00245098039216; M=12/41 < X
None:4:2 5/17 > X > 12/41, interval=0.00143472022956; M=22/75 > X
4:3:2 22/75 > X > 12/41, interval=0.000650406504065; M=17/58 > X
p/q=[0, 3, 2, 2, 2]=12/41
None:2:1 12/41 < X < 17/58, interval=0.000420521446594; M=29/99 > X
None:4:2 12/41 < X < 29/99, interval=0.000246366100025; M=53/181 < X
4:3:2 53/181 < X < 29/99, interval=0.000111613371282; M=41/140 < X
p/q=[0, 3, 2, 2, 2, 2]=29/99
None:2:1 29/99 > X > 41/140, interval=7.21500721501e-05; M=70/239 < X
None:4:2 29/99 > X > 70/239, interval=4.226364059e-05; M=128/437 > X
4:3:2 128/437 > X > 70/239, interval=1.91492009996e-05; M=99/338 > X
p/q=[0, 3, 2, 2, 2, 2, 2]=70/239
None:2:1 70/239 < X < 99/338, interval=1.23789953207e-05; M=169/577 > X
None:4:2 70/239 < X < 169/577, interval=7.2514738621e-06; M=309/1055 < X
4:3:2 309/1055 < X < 169/577, interval=3.28550190148e-06; M=239/816 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2]=169/577
None:2:1 169/577 > X > 239/816, interval=2.12389981991e-06; M=408/1393 < X
None:4:2 169/577 > X > 408/1393, interval=1.24415093544e-06; M=746/2547 < X
None:8:4 169/577 > X > 746/2547, interval=6.80448470014e-07; M=1422/4855 < X
None:16:8 169/577 > X > 1422/4855, interval=3.56972657711e-07; M=2774/9471 > X
16:12:8 2774/9471 > X > 1422/4855, interval=1.73982239227e-07; M=2098/7163 > X
12:10:8 2098/7163 > X > 1422/4855, interval=1.15020646951e-07; M=1760/6009 > X
10:9:8 1760/6009 > X > 1422/4855, interval=6.85549088053e-08; M=1591/5432 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9]=1591/5432
None:2:1 1591/5432 < X < 1760/6009, interval=3.06364213998e-08; M=3351/11441 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1]=1760/6009
None:2:1 1760/6009 > X > 3351/11441, interval=1.45456726663e-08; M=5111/17450 < X
None:4:2 1760/6009 > X > 5111/17450, interval=9.53679318849e-09; M=8631/29468 < X
None:8:4 1760/6009 > X > 8631/29468, interval=5.6473816179e-09; M=15671/53504 < X
None:16:8 1760/6009 > X > 15671/53504, interval=3.11036635336e-09; M=29751/101576 > X
16:12:8 29751/101576 > X > 15671/53504, interval=1.47201634215e-09; M=22711/77540 > X
12:10:8 22711/77540 > X > 15671/53504, interval=9.64157420569e-10; M=19191/65522 > X
10:9:8 19191/65522 > X > 15671/53504, interval=5.70501257346e-10; M=17431/59513 > X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1, 8]=15671/53504
None:2:1 15671/53504 < X < 17431/59513, interval=3.14052228667e-10; M=33102/113017 == X
Since Python handles biginteger math from the start, and this program uses only integer math (except for the interval calculations), it should work for arbitrary rationals.
edit 3: Outline of proof that this is O(log q), not O(log^2 q):
First note that until the rational number is found, the # of steps n_k for each new continued fraction term is exactly 2*b(a_k) - 1, where b(a_k) = ceil(log2(a_k)) is the # of bits needed to represent a_k: it's b(a_k) steps to widen the "net" of the binary search, and b(a_k) - 1 steps to narrow it. See the example above; you'll note that the # of steps is always 1, 3, 7, 15, etc.
Now we can use the recurrence relation q_k = a_k*q_(k-1) + q_(k-2) and induction to prove the desired result.
Let's state it in this way: the value of q after the N_k = sum(n_k) steps required for reaching the k-th term has a minimum: q >= A*2^(c*N) for some fixed constants A, c. (So, to invert, we'd get that the # of steps N is <= (1/c) * log2(q/A) = O(log q).)
Base cases:
k=0: q = 1, N = 0, so q >= 2^N
k=1: for N = 2b - 1 steps, q = a_1 >= 2^(b-1) = 2^((N-1)/2) = 2^(N/2)/sqrt(2).
This implies A = 1, c = 1/2 could provide the desired bounds. In reality, q may not double each term (counterexample: [0; 1, 1, 1, 1, 1] has a growth factor of phi = (1+sqrt(5))/2) so let's use c = 1/4.
Induction:
for term k, q_k = a_k*q_(k-1) + q_(k-2). Again, for the n_k = 2b - 1 steps needed for this term, a_k >= 2^(b-1) = 2^((n_k - 1)/2).
So a_k*q_(k-1) >= 2^((n_k - 1)/2) * q_(k-1) >= 2^((n_k - 1)/2) * A*2^(N_(k-1)/4) = A*2^(N_k/4)/sqrt(2)*2^(n_k/4).
Argh -- the tough part here is that if a_k = 1, q may not increase much for that one term, and we need to use q_(k-2) but that may be much smaller than q_(k-1).
Let's take the rational numbers, in reduced form, and write them out in order first of denominator, then numerator.
1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, ...
Our first guess is going to be 1/2. Then we'll go along the list until we have 3 in our range. Then we will take 2 guesses to search that list. Then we'll go along the list until we have 7 in our remaining range. Then we will take 3 guesses to search that list. And so on.
In n steps we'll cover the first 2^O(n) possibilities, which is in the order of magnitude of efficiency that you were looking for.
Update: People didn't get the reasoning behind this. The reasoning is simple. We know how to walk a binary tree efficiently. There are O(n^2) fractions with maximum denominator n. We could therefore search up to any particular denominator size in O(2*log(n)) = O(log(n)) steps. The problem is that we have an infinite number of possible rationals to search. So we can't just line them all up, order them, and start searching.
Therefore my idea was to line up a few, search, line up more, search, and so on. Each time we line up more we line up about double what we did last time. So we need one more guess than we did last time. Therefore our first pass uses 1 guess to traverse 1 possible rational. Our second uses 2 guesses to traverse 3 possible rationals. Our third uses 3 guesses to traverse 7 possible rationals. And our k'th uses k guesses to traverse 2^k - 1 possible rationals. For any particular rational m/n, eventually it will wind up putting that rational on a fairly big list that it knows how to do a binary search on efficiently.
If we did binary searches, then ignored everything we'd learned when we grab more rationals, then we'd put all of the rationals up to and including m/n in O(log(n)) passes. (That's because by that point we'll get to a pass with enough rationals to include every rational up to and including m/n.) But each pass takes more guesses, so that would be O(log(n)^2) guesses.
However we actually do a lot better than that. With our first guess, we eliminate half the rationals on our list as being too big or small. Our next two guesses don't quite cut the space into quarters, but they don't come too far from it. Our next 3 guesses again don't quite cut the space into eighths, but they don't come too far from it. And so on. When you put it together, I'm convinced that the result is that you find m/n in O(log(n)) steps. Though I don't actually have a proof.
Try it out: Here is code to generate the guesses so that you can play and see how efficient it is.
#! /usr/bin/python
from fractions import Fraction
import heapq
import readline
import sys

def generate_next_guesses (low, high, limit):
    upcoming = [(low.denominator + high.denominator,
                 low.numerator + high.numerator,
                 low.denominator, low.numerator,
                 high.denominator, high.numerator)]
    guesses = []
    while len(guesses) < limit:
        (mid_d, mid_n, low_d, low_n, high_d, high_n) = upcoming[0]
        guesses.append(Fraction(mid_n, mid_d))
        heapq.heappushpop(upcoming, (low_d + mid_d, low_n + mid_n,
                                     low_d, low_n, mid_d, mid_n))
        heapq.heappush(upcoming, (mid_d + high_d, mid_n + high_n,
                                  mid_d, mid_n, high_d, high_n))
    guesses.sort()
    return guesses

def ask (num):
    while True:
        print "Next guess: {0} ({1})".format(num, float(num))
        if 1 < len(sys.argv):
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print "too high"
                return 1
            elif num < wanted:
                print "too low"
                return -1
            else:
                print "correct"
                return 0
        answer = raw_input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print "Not understood. Please say one of (l, c, h)"

guess_size_bound = 2
low = Fraction(0)
high = Fraction(1)
guesses = [Fraction(1,2)]
required_guesses = 0
answer = -1
while 0 != answer:
    if 0 == len(guesses):
        guess_size_bound *= 2
        guesses = generate_next_guesses(low, high, guess_size_bound - 1)
    #print (low, high, guesses)
    guess = guesses[len(guesses)/2]
    answer = ask(guess)
    required_guesses += 1
    if 0 == answer:
        print "Thanks for playing!"
        print "I needed %d guesses" % required_guesses
    elif 1 == answer:
        high = guess
        guesses[len(guesses)/2:] = []
    else:
        low = guess
        guesses[0:len(guesses)/2 + 1] = []
As an example to try it out I tried 101/1024 (0.0986328125) and found that it took 20 guesses to find the answer. I tried 0.98765 and it took 45 guesses. I tried 0.0123456789 and it needed 66 guesses and about a second to generate them. (Note, if you call the program with a rational number as an argument, it will fill in all of the guesses for you. This is a very helpful convenience.)
I've got it! What you need to do is to use a parallel search with bisection and continued fractions.
Bisection will give you a limit toward a specific real number, as represented as a power of two, and continued fractions will take the real number and find the nearest rational number.
How you run them in parallel is as follows.
At each step, you have l and u being the lower and upper bounds of bisection. The idea is, you have a choice between halving the range of bisection, and adding an additional term as a continued fraction representation. When both l and u have the same next term as a continued fraction, then you take the next step in the continued fraction search, and make a query using the continued fraction. Otherwise, you halve the range using bisection.
Since both methods increase the denominator by at least a constant factor (bisection goes by factors of 2, continued fractions go by at least a factor of phi = (1+sqrt(5))/2), this means your search should be O(log(q)). (There may be repeated continued fraction calculations, so it may end up as O(log(q)^2).)
Our continued fraction search needs to round to the nearest integer, not use floor (this is clearer below).
The above is kind of handwavy. Let's use a concrete example of r = 1/31:
l = 0, u = 1, query = 1/2. 0 is not expressible as a continued fraction, so we use binary search until l != 0.
l = 0, u = 1/2, query = 1/4.
l = 0, u = 1/4, query = 1/8.
l = 0, u = 1/8, query = 1/16.
l = 0, u = 1/16, query = 1/32.
l = 1/32, u = 1/16. Now 1/l = 32, 1/u = 16, these have different cfrac reps, so keep bisecting; query = 3/64.
l = 1/32, u = 3/64, query = 5/128 = 1/25.6
l = 1/32, u = 5/128, query = 9/256 = 1/28.4444....
l = 1/32, u = 9/256, query = 17/512 = 1/30.1176... (round to 1/30)
l = 1/32, u = 17/512, query = 33/1024 = 1/31.0303... (round to 1/31)
l = 33/1024, u = 17/512, query = 67/2048 = 1/30.5672... (round to 1/31)
l = 33/1024, u = 67/2048. At this point both l and u have the same continued fraction term 31, so now we use a continued fraction guess.
query = 1/31.
SUCCESS!
For another example let's use 16/113 (= 355/113 - 3 where 355/113 is pretty close to pi).
[to be continued, I have to go somewhere]
On further reflection, continued fractions are the way to go, never mind bisection except to determine the next term. More when I get back.
I think I found an O(log^2(p + q)) algorithm.
To avoid confusion in the next paragraph, a "query" refers to when the guesser gives the challenger a guess, and the challenger responds "bigger" or "smaller". This allows me to reserve the word "guess" for something else, a guess for p + q that is not asked directly to the challenger.
The idea is to first find p + q, using the algorithm you describe in your question: guess a value k, if k is too small, double it and try again. Then once you have an upper and lower bound, do a standard binary search. This takes O(log(p+q)T) queries, where T is an upper bound for the number of queries it takes to check a guess. Let's find T.
We want to check all fractions r/s with r + s <= k, and double k until k is sufficiently large. Note that there are O(k^2) fractions you need to check for a given value of k. Build a balanced binary search tree containing all these values, then search it to determine if p/q is in the tree. It takes O(log k^2) = O(log k) queries to confirm that p/q is not in the tree.
We will never guess a value of k greater than 2(p + q). Hence we can take T = O(log(p+q)).
When we guess the correct value for k (i.e., k = p + q), we will submit the query p/q to the challenger in the course of checking our guess for k, and win the game.
Total number of queries is then O(log^2(p + q)).
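A small Python sketch of that scheme (the oracle convention and all names here are my own assumptions, not part of the answer): keep doubling a bound k on p + q, enumerate the fractions with r + s <= k, and binary-search them using the challenger's replies.

from fractions import Fraction

def guess_rational(query):
    # query(f) is a hypothetical oracle: it returns 'higher' if the secret
    # rational is larger than f, 'lower' if it is smaller, 'correct' otherwise.
    k = 2
    while True:
        k *= 2   # doubling guess for an upper bound on p + q
        # every fraction r/s in (0, 1) with r + s <= k (duplicates collapse)
        cands = sorted({Fraction(r, s) for s in range(2, k)
                        for r in range(1, s) if r + s <= k})
        lo, hi = 0, len(cands) - 1
        while lo <= hi:                    # ordinary binary search, O(log k^2) queries
            mid = (lo + hi) // 2
            ans = query(cands[mid])
            if ans == 'correct':
                return cands[mid]
            if ans == 'higher':
                lo = mid + 1
            else:
                hi = mid - 1
        # the secret needs a larger p + q: double k and try again

The balanced-BST detail is replaced by a sorted list here, which gives the same O(log k) queries per round.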
Okay, I think I figured out an O(lg^2 q) algorithm for this problem that is based on Jason S's most excellent insight about using continued fractions. I thought I'd flesh the algorithm out all the way right here so that we have a complete solution, along with a runtime analysis.
The intuition behind the algorithm is that any rational number p/q within the range can be written as
a0 + 1 / (a1 + 1 / (a2 + 1 / (a3 + 1 / ...)))
For appropriate choices of ai. This is called a continued fraction. More importantly, though, these ai can be derived by running the Euclidean algorithm on the numerator and denominator. For example, suppose we want to represent 11/14 this way. We begin by noting that 14 goes into 11 zero times, so a crude approximation of 11/14 would be
0 = 0
Now, suppose that we take the reciprocal of this fraction to get 14/11 = 1 3/11. So if we write
0 + (1 / 1) = 1
We get a slightly better approximation to 11/14. Now that we're left with 3 / 11, we can take the reciprocal again to get 11/3 = 3 2/3, so we can consider
0 + (1 / (1 + 1/3)) = 3/4
Which is another good approximation to 11/14. Now, we have 2/3, so consider the reciprocal, which is 3/2 = 1 1/2. If we then write
0 + (1 / (1 + 1/(3 + 1/1))) = 4/5
We get another good approximation to 11/14. Finally, we're left with 1/2, whose reciprocal is 2/1. If we finally write out
0 + (1 / (1 + 1/(3 + 1/(1 + 1/2)))) = (1 / (1 + 1/(3 + 1/(3/2)))) = (1 / (1 + 1/(3 + 2/3)))) = (1 / (1 + 1/(11/3)))) = (1 / (1 + 3/11)) = 1 / (14/11) = 11/14
which is exactly the fraction we wanted. Moreover, look at the sequence of coefficients we ended up using. If you run the extended Euclidean algorithm on 11 and 14, you get that
11 = 0 x 14 + 11 --> a0 = 0
14 = 1 x 11 + 3 --> a1 = 1
11 = 3 x 3 + 2 --> a2 = 3
3 = 1 x 2 + 1 --> a3 = 1
2 = 2 x 1 + 0 --> a4 = 2
It turns out that (using more math than I currently know how to do!) that this isn't a coincidence and that the coefficients in the continued fraction of p/q are always formed by using the extended Euclidean algorithm. This is great, because it tells us two things:
There can be at most O(lg (p + q)) coefficients, because the Euclidean algorithm always terminates in this many steps, and
Each coefficient is at most max{p, q}.
Given these two facts, we can come up with an algorithm to recover any rational number p/q, not just those between 0 and 1, by applying the general algorithm for guessing arbitrary integers n one at a time to recover all of the coefficients in the continued fraction for p/q. For now, though, we'll just worry about numbers in the range (0, 1], since the logic for handling arbitrary rational numbers can be done easily given this as a subroutine.
As a first step, let's suppose that we want to find the best value of a1 so that 1 / a1 is as close as possible to p/q and a1 is an integer. To do this, we can just run our algorithm for guessing arbitrary integers, taking the reciprocal each time. After doing this, one of two things will have happened. First, we might by sheer coincidence discover that p/q = 1/k for some integer k, in which case we're done. If not, we'll find that p/q is sandwiched between 1/(a1 - 1) and 1/a1 for some a1. When we do this, then we start working on the continued fraction one level deeper by finding the a2 such that p/q is between 1/(a1 + 1/a2) and 1/(a1 + 1/(a2 + 1)). If we magically find p/q, that's great! Otherwise, we then go one level down further in the continued fraction. Eventually, we'll find the number this way, and it can't take too long. Each binary search to find a coefficient takes at most O(lg(p + q)) time, and there are at most O(lg(p + q)) levels to the search, so we need only O(lg^2(p + q)) arithmetic operations and probes to recover p/q.
One detail I want to point out is that we need to keep track of whether we're on an odd level or an even level when doing the search because when we sandwich p/q between two continued fractions, we need to know whether the coefficient we were looking for was the upper or the lower fraction. I'll state without proof that for ai with i odd you want to use the upper of the two numbers, and for ai with i even you use the lower of the two numbers.
I am almost 100% confident that this algorithm works. I'm going to try to write up a more formal proof of this in which I fill in all of the gaps in this reasoning, and when I do I'll post a link here.
Thanks to everyone for contributing the insights necessary to get this solution working, especially Jason S for suggesting a binary search over continued fractions.
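As a tiny illustration of the Euclidean-algorithm connection (the helper name is mine), this Python sketch produces the continued fraction coefficients directly; for 11/14 it returns [0, 1, 3, 1, 2], matching the worked example above:

def cf_coefficients(p, q):
    coeffs = []
    while q:
        coeffs.append(p // q)
        p, q = q, p % q        # same remainder sequence as Euclid's algorithm
    return coeffs

The length of this list is the O(lg(p + q)) bound on the number of coefficients used in the runtime analysis.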
Remember that any rational number in (0, 1) can be represented as a finite sum of distinct (positive or negative) unit fractions. For example, 2/3 = 1/2 + 1/6 and 2/5 = 1/2 - 1/10. You can use this to perform a straight-forward binary search.
Here is yet another way to do it. If there is sufficient interest, I will try to fill out the details tonight, but I can't right now because I have family responsibilities. Here is a stub of an implementation that should explain the algorithm:
low = 0
high = 1
bound = 2
answer = -1
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print_success_message(mid)
And here is the explanation. What best_continued_fraction(x, bound) should do is find the last continued fraction approximation to x with the denominator at most bound. This algorithm will take polylog steps to complete and finds very good (though not always the best) approximations. So for each bound we'll get something close to a binary search through all possible fractions of that size. Occasionally we won't find a particular fraction until we increase the bound farther than we should, but we won't be far off.
So there you have it. A logarithmic number of questions found with polylog work.
Update: And full working code.
#! /usr/bin/python
from fractions import Fraction
import readline
import sys

operations = [0]

def calculate_continued_fraction(terms):
    i = len(terms) - 1
    result = Fraction(terms[i])
    while 0 < i:
        i -= 1
        operations[0] += 1
        result = terms[i] + 1/result
    return result

def best_continued_fraction (x, bound):
    error = x - int(x)
    terms = [int(x)]
    last_estimate = estimate = Fraction(0)
    while 0 != error and estimate.numerator < bound:
        operations[0] += 1
        error = 1/error
        term = int(error)
        terms.append(term)
        error -= term
        last_estimate = estimate
        estimate = calculate_continued_fraction(terms)
    if estimate.numerator < bound:
        return estimate
    else:
        return last_estimate

def ask (num):
    while True:
        print "Next guess: {0} ({1})".format(num, float(num))
        if 1 < len(sys.argv):
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print "too high"
                return 1
            elif num < wanted:
                print "too low"
                return -1
            else:
                print "correct"
                return 0
        answer = raw_input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print "Not understood. Please say one of (l, c, h)"

low = Fraction(0)
high = Fraction(1)
bound = 2
answer = -1
guesses = 0
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    guesses += 1
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print "Thanks for playing!"
        print "I needed %d guesses and %d operations" % (guesses, operations[0])
It appears slightly more efficient in guesses than the previous solution, and does a lot fewer operations. For 101/1024 it required 19 guesses and 251 operations. For .98765 it needed 27 guesses and 623 operations. For 0.0123456789 it required 66 guesses and 889 operations. And for giggles and grins, for 0.0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 (that's 10 copies of the previous one) it required 665 guesses and 23289 operations.
You can sort rational numbers in a given interval by for example the pair (denominator, numerator). Then to play the game you can
Find the interval [0, N] using the doubling-step approach
Given an interval [a, b] shoot for the rational with smallest denominator in the interval that is the closest to the center of the interval
this is however probably still O(log(num/den) + den) (not sure and it's too early in the morning here to make me think clearly ;-) )

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is an O(n) solution if you know what the possible domain of the input is. For example if your input array contains numbers between 0 to 100, consider the following code.
bool flags[100];
for (int i = 0; i < 100; i++)
    flags[i] = false;

for (int i = 0; i < input_size; i++)
    if (flags[input_array[i]])
        return input_array[i];
    else
        flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it the way math classes teach you to solve quadratic equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n - 3) * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
You now have these terms:
x = p + q
y = p * q
so p and q are the roots of the quadratic t^2 - x*t + y = 0.
If you solve this quadratic, you can calculate p and q.
Insert each element into a set/hashtable, first checking if its are already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes upto INT_MAX (on 64-bit systems it is 9223372036854775807).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
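A short Python sketch of that calculation (the names are mine; math.isqrt keeps the square root exact for large sums):

from math import isqrt

def repeated_pair(arr):
    n = len(arr)
    x = sum(arr) - (n - 3) * (n - 2) // 2                               # X + Y
    y = sum(v * v for v in arr) - (n - 3) * (n - 2) * (2 * n - 5) // 6  # X^2 + Y^2
    d = isqrt(2 * y - x * x)    # |X - Y|, since 2*(X^2+Y^2) - (X+Y)^2 = (X-Y)^2
    return (x - d) // 2, (x + d) // 2

For the example array {2, 3, 6, 1, 5, 4, 0, 3, 5} it returns (3, 5).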
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, each for one bit. Each bucket contains the numbers whose specific bit is 1. For example bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2^0 != 0 )
...
Bucket i : Sum ( x where: x & 2^i != 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that have only one number we can extract the number num = (sum - expected sum of bucket). However, we only need to find one of the duplicate numbers, so if there is at least one set bit in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well, this case is only possible if both duplicate numbers are the same number, and then that number is simply A OR B (the number constructed from the buckets).
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. Something like this:
int count[10] = {0};
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search your count array for any entries greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.
Here's an implementation in Python of #eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, therefore a whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i      # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i  # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.
    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr)  # all in range [0, n-2)
    assert len(set(arr)) == (n - 2)          # number of unique items
    s1 = (n-2) + (n-1)        # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2  # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2
    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2)  # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0  # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}
*Or you can filter the array and use a recursive function if you want to get the count of occurrences:*
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
int[] myNewArray = null;
int a = 1;
void GetDuplicates(int[] array)
{
    for (int i = 0; i < array.Length; i++)
    {
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] == array[j])
            {
                a += 1;
            }
        }
        Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
        // Drop every occurrence of the value just reported, then recurse on the rest.
        IEnumerable<int> num = from n in array where n != array[i] select n;
        a = 1;
        myNewArray = num.ToArray();
        break;
    }
    if (myNewArray != null && myNewArray.Length > 0)
        GetDuplicates(myNewArray);
}
Answer to 18:
You are taking an array of 9 elements whose values start from 0, so the maximum element in your array will be 6. Take the sum of the numbers 0 to 6 and the sum of the array elements, and compute their difference (say d). This is p + q. Now take the XOR of the numbers 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all elements from 0 to 6 except the two repeated elements, since those cancel each other out. Now, for each element a[i] of the array, take p to be that element and compute q = d - p. XOR p and q together with x2 and check whether the result equals x1. Doing this for all elements, the ones for which this condition holds true are the elements you want, and you are done in O(n). Keep coding!
Check this out:
O(n) time and O(1) space complexity.
int xor = 0, x = 0, y = 0;
for (i = 0; i < n; i++)
    xor = xor ^ arr[i];
for (i = 1; i <= n-3; i++)
    xor = xor ^ i;
So in the given example you will get the XOR of 3 and 5.
xor = xor & -xor;   // isolate the lowest set bit
for (i = 0; i < n; i++)
{
    if (arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for (i = 1; i <= n-3; i++)
{
    if (i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers.
For each number: check if it exists in the rest of the array.
Without sorting you're going to have to keep track of the numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
if number not already in unique numbers list
add it to the unique numbers list
else
return that number as it is a duplicate
end if
end for each
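A minimal Python version of that pseudocode, returning the first duplicate encountered (the function name is mine):
def first_duplicate(numbers):
    """Return the first value seen twice, or None if there are no duplicates."""
    seen = set()                 # the "unique numbers list" from the pseudocode
    for number in numbers:
        if number in seen:
            return number        # already seen, so it is a duplicate
        seen.add(number)
    return None
For example, first_duplicate([2, 3, 6, 1, 5, 4, 0, 3, 5]) returns 3.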
How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure it's not the fastest, but it's simple, easy to understand, and requires no additional memory. If n is a small number like 9 or 100, then it may well be the "best". (That is, "best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop, etc.)
In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i = 0; i < 9; i++)      /* XOR together all 9 array elements */
    num = num ^ arr[i];
for (i = 0; i <= 6; i++)     /* XOR in the expected values 0..6 */
    num = num ^ i;
Since x^x = 0, every value that occurs an even number of times across the array and the range 0..6 is neutralized. Let's call the two repeated numbers a and b; we are left with a^b. We know a^b != 0, since a != b. Choose any one set bit of a^b and use it as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists: one sublist contains all numbers y with y & x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b land in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.
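A compact Python sketch of this whole procedure (the function name is mine): XOR both the array and the expected range 0..n-3 into one accumulator, then redo the pass split by one bit of a^b.
def two_repeated_by_xor_split(arr):
    """Sketch: arr holds 0..len(arr)-3 once each plus two extra (distinct) values."""
    n = len(arr)
    axb = 0
    for v in list(arr) + list(range(n - 2)):
        axb ^= v                     # everything cancels except a ^ b
    mask = axb & -axb                # a power of two on which a and b differ
    a = b = 0
    for v in list(arr) + list(range(n - 2)):
        if v & mask:
            a ^= v                   # bucket where the chosen bit is set
        else:
            b ^= v                   # bucket where the bit is clear
    return a, b
Here two_repeated_by_xor_split([2, 3, 6, 1, 5, 4, 0, 3, 5]) returns (3, 5).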
I have written a small program which finds the elements that are not repeated; go through it and let me know your opinion. At the moment it assumes an even number of elements, but it can easily be extended to odd numbers as well.
So my idea is to first sort the numbers and then apply my algorithm. Quicksort can be used to sort the elements.
Let's take an input array as below:
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated, and the array is already in sorted order; if it is not sorted, use quicksort to sort it first.
Let's apply my program to this:
#include <cstdio>
#include <vector>
using namespace std;

int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    int size = sizeof(arr)/sizeof(arr[0]);
    int i = 0;
    vector<int> vec;

    int var = arr[0];
    for(i = 1; i < size; i += 2)
    {
        var = var ^ arr[i];      // XOR the current pair; zero means they matched
        if(var != 0)
        {
            // arr[i-1] has no partner: put it in the vector and re-align the pairing
            var = arr[i-1];
            vec.push_back(var);
            i = i - 1;
        }
        if (i + 1 < size)        // guard against reading past the end of the array
            var = arr[i+1];
    }

    for(size_t i = 0; i < vec.size(); i++)
        printf("value not repeated = %d\n", vec[i]);
    return 0;
}
This gives the output:
value not repeated = 2
value not repeated = 10
value not repeated = 4
It's simple and very straightforward, just use XOR. Note that this only spots duplicates sitting next to each other, so it assumes the array has been sorted first.
for (i = 0; i < n-1; i++) {
    if (!(arr[i] ^ arr[i+1]))      /* equal neighbours XOR to zero */
        printf("Found Repeated number %5d\n", arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
You also rely on the fact that after a call to SELECT, the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value lies in the left half (up to and including the median), so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} and n=9, the median should be the value 4.
After the first call to SELECT:
A={3, 2, 0, 1, <3>, 4, 5, 6, 5}. The median value is smaller than 4, so we continue with the left half:
A={3, 2, 0, 1, 3}
After the second call to SELECT:
A={1, 0, <2>, 3, 3}. The median should be 2 and it is, so we continue with the right half:
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis provides one: http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to as estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However, there is a set of algorithms that trade memory for precision: you end up with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory: 12k bytes in the worst case, or a lot less if your HyperLogLog (we'll just call them HLL from now on) has seen very few elements.
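As a rough sketch of how that could look (assuming a local Redis server and the redis-py client; the key name is arbitrary), comparing the approximate distinct count against the array length tells you roughly how many duplicate entries there are, though not which values they are:
import redis  # assumes the redis-py package and a Redis server on localhost

def approx_duplicate_count(arr, key="numbers-hll"):
    """Sketch: estimate how many duplicate entries arr contains using a HyperLogLog."""
    r = redis.Redis()
    r.delete(key)                # start from an empty HLL
    r.pfadd(key, *arr)           # add every element to the HLL
    distinct = r.pfcount(key)    # approximate number of distinct elements
    return len(arr) - distinct   # ~ number of duplicate entries
For the example array [2, 3, 6, 1, 5, 4, 0, 3, 5] this should come out around 2.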
Well, using a nested for loop, and assuming the question is to find the numbers that occur exactly twice in an array:
def repeated(ar,n):
    count=0
    for i in range(n):
        for j in range(i+1,n):
            if ar[i] == ar[j]:
                count+=1
        if count == 1:
            count=0
            print("repeated:",ar[i])
arr= [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr,n)
Why should we bother doing maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap with one bit per possible value (the values run from 0 to n-3, so n-2 bits, i.e. ((n-2)+7)/8 bytes). It is better to calloc this memory, so every single bit starts initialized to 0. Then traverse the list and set the corresponding bit to 1 for each number encountered; if the bit is already set to 1 for that number, then that number is a repeated one.
This can also be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity.
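A small Python sketch of this bitmap idea, using a bytearray in place of calloc (the function name is mine):
def duplicates_by_bitmap(arr):
    """Sketch: values lie in 0..len(arr)-3; report each value whose bit is already set."""
    nbits = len(arr) - 2                   # one bit per possible value
    bitmap = bytearray((nbits + 7) // 8)   # zero-initialized, like calloc
    duplicates = []
    for x in arr:
        byte, bit = x // 8, 1 << (x % 8)
        if bitmap[byte] & bit:             # bit already set: x is repeated
            duplicates.append(x)
        else:
            bitmap[byte] |= bit
    return duplicates
For example, duplicates_by_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]) gives [3, 5].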
