Algorithmic complexity of examining all possible lines on a game board - algorithm

What is the time complexity of examining all possible lines of length l on game board of n x m?
For instance, a tic-tac-toe board is 3x3 and lines are defined as length 3; there are 8 possible lines. We could also imagine a board that is 9x9 and a rule that you need a line of length 5 in order to win. How would you characterize the complexity of examining every possible line with different values of n, m and l?
Note this is not considering traversing the game tree into future game states, just examining the current state of the board to see if there is a winner or not in the present state.

Clearly, you need to check horizontal, vertical, and diagonal lines.
Let us assume that the board is laid out with n always being the bigger number if the two are not equal, and that it is lying on its side (Lego style, not skyscraper style). So it is n across and m tall.
The horizontal lines will be m * (n - l + 1) in number.
The vertical lines will be n * (m - l + 1) in number.
The diagonal lines will be 2 * (n - l + 1) * (m - l + 1) in number, counting both diagonal directions.
If we assume n >= m > l, then we can safely say that the number of lines is within O(n^2), as we would expect for a two-dimensional board; examining each candidate line adds a further factor of l.
We know that l > n >= m will yield no lines at all. If n = m = l we have a constant number of lines (2*n + 2). If n = l > m we are left with an even better case: the verticals and diagonals drop out (each needs l rows), and we only have to check m horizontal lines. If n > l > m, the verticals and diagonals are likewise excluded, leaving only the m * (n - l + 1) horizontals. In any event, it will be fewer than checking the diagonals, verticals, and horizontals all together.
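These counts are easy to sanity-check in code. The sketch below (function name mine) guards each term so it vanishes when the board is too short in that direction, and reproduces the 8 lines of tic-tac-toe from the question:

```python
def count_lines(n, m, l):
    # board is n across and m tall; winning lines have length l
    across = max(0, n - l + 1)   # horizontal starting positions per row
    down = max(0, m - l + 1)     # vertical starting positions per column
    horiz = m * across
    vert = n * down
    diag = 2 * across * down     # the two diagonal directions
    return horiz + vert + diag

print(count_lines(3, 3, 3))  # 8, matching tic-tac-toe
```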
There is an optimization that can be made, however, based on phant0m's comment. It requires that you know what the last move made was.
You can safely assume that if a move was made, it was made at a time when there was no winner on the board. Therefore, if there is a win condition on the board after the move, it must have been formed as a result of the most recent move. The winning line must therefore pass through the most recent move, in the worst case with that move on the end. You would therefore need to check the 4 line segments that extend l - 1 in each direction (horizontal, vertical, forward diagonal, and backward diagonal). This is a total of 4 * (2*l - 1) cells to examine, putting it safely in O(l). Considering you only need to store one additional coordinate (the most recent move), this is a very worthwhile optimization to make.
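A minimal sketch of that check, assuming a board stored as board[row][col] with the last move at column x, row y (names and representation are my own):

```python
def wins_after_move(board, l, x, y):
    # board[row][col] holds player marks; (x, y) = (col, row) of the last move
    m, n = len(board), len(board[0])
    player = board[y][x]
    # the four lines through the move: horizontal, vertical, two diagonals
    for dx, dy in ((1, 0), (0, 1), (1, 1), (1, -1)):
        run = 1                       # the move itself
        for sign in (1, -1):          # walk outward both ways from the move
            cx, cy = x + sign * dx, y + sign * dy
            while 0 <= cx < n and 0 <= cy < m and board[cy][cx] == player:
                run += 1
                cx, cy = cx + sign * dx, cy + sign * dy
        if run >= l:
            return True
    return False
```

At most 2*l - 1 cells are inspected per direction, matching the O(l) bound above.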


Given N people, some of which are enemies, find number of intervals with no enemies

A friend gave me this problem as a challenge, and I've tried to find a problem like this on LeetCode, but sadly could not.
Question
Given a line of people numbered from 1 to N, and a list of M pairs of enemies, find the total number of contiguous sublines that contain no two people who are enemies.
Example: N = 5, enemies = [[3,4], [3,5]]
Answer: 9
Explanation: These continuous subintervals are:
[1,1], [2,2], [3,3], [1,2], [2,3], [1,3], [4,4], [4,5], [5,5]
My approach
We define a non-conflicting interval as a contiguous interval [a,b] (inclusive) in which no two people are enemies.
Working backwards, if I know there is a non-conflicting interval [1,3], as in the example given above, then I know the number of contiguous subintervals it contains is n(n+1)/2, where n is the length of the interval. In this case the interval length is 3, so there are 6 subintervals within (and including) [1,3] that count.
Extending this logic, if I have a list of all non-conflicting intervals, then the answer is simply the sum of (n_i*(n_i+1))/2 for every interval length n_i.
Then all I need to do is find these intervals. This is where I'm stuck.
I can't really think of a similar programming problem. This seems similar to, but the opposite of, what the Merge Intervals problem on LeetCode asks for. In that problem we're sort of given the good intervals and asked to combine them. Here we're given the bad ones.
Any guidance?
EDIT: Best I could come up with:
Does this work?
So let's define max_enemy[i] as the largest enemy that is less than a particular person i, where i is in the usual [1,N]. We can generate this value in O(M) time using the following loop:
max_enemy = [-1] * (N + 1)  # -1 means it doesn't exist
for e1, e2 in enemies:
    e1, e2 = min(e1, e2), max(e1, e2)
    max_enemy[e2] = max(max_enemy[e2], e1)
Then we go through the array of people keeping a sliding window starting at s. The window ends as soon as we find a person i with max_enemy[i] >= s, since including this person would break our contiguous interval. So we now know our interval is [s, i-1] and we can do our math. We then reset s = i and continue.
Here is a visualization of how this works. We draw a path between any two enemies:
N=5, enemies = [[3,4], [3,5]]
1 2 3 4 5
    | |
    ---
    |   |
    -----
EDIT2: I know this doesn't work for N=5, enemies=[[1,4][3,5]], currently working on a fix, still stuck
You can solve this in O(M log M) time and O(M) space.
Let ENDINGAT(i) be the number of enemy-free intervals ending at position/person i. This is also the size of the largest enemy-free interval ending at i.
The answer you seek is the sum of all ENDINGAT(i) for every person i.
Let NEAREST(i) be the nearest enemy of person i that precedes person i. Let it be -1 if i has no preceding enemies.
Now we can write a simple formula to calculate all the ENDINGAT values:
ENDINGAT(1) = 1, since there is only one interval ending at 1. For larger values:
ENDINGAT(i) = MIN( ENDINGAT(i-1)+1, i-NEAREST(i) )
So, it is very easy to calculate all the ENDINGAT(i) in order, as long as we can have all the NEAREST(i) in order. To get that, all you need to do is sort the enemy pairs by the highest member. Then for each i you can walk over all the pairs ending at i to find the closest one.
That's it -- it turns out to be pretty easy. The time is dominated by the O(M log M) time required to sort the enemy pairs, unless N is much bigger than M. In that case, you can skip runs of ENDINGAT for people with no preceding enemies, calculating their effect on the sum mathematically.
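A small Python sketch of this answer (names mine). It indexes NEAREST by person in an array rather than sorting the pairs, which is O(N + M) and fine when N fits in memory:

```python
def count_intervals(N, enemies):
    # nearest[i] = nearest enemy of person i that precedes i, or -1 if none
    nearest = [-1] * (N + 1)
    for a, b in enemies:
        lo, hi = min(a, b), max(a, b)
        nearest[hi] = max(nearest[hi], lo)

    total = 0
    ending_at = 0  # ENDINGAT(i - 1), with ENDINGAT(0) = 0
    for i in range(1, N + 1):
        if nearest[i] == -1:
            ending_at += 1
        else:
            ending_at = min(ending_at + 1, i - nearest[i])
        total += ending_at
    return total

print(count_intervals(5, [[3, 4], [3, 5]]))  # 9, matching the example
```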
There's a cool visual way to see this!
Instead of focusing on the line, let's look at the matrix of pairs of players. If i and j are enemies, then the effect of this enemiship is precisely to eliminate from consideration (1) this interval, and (2) any interval strictly larger than it. Because enemiship is symmetric, we may as well just look at the upper-right half of the matrix, and the diagonal; we'll use the characters
"X" to denote that a pair is enemies,
"*" to indicate that a pair has been obscured by a pair of enemies, and
"%" in the lower half to mark it as not part of the upper-half matrix.
For the two examples in your code, observe their corresponding matrices:
# intervals: 9          # intervals: 10
0 1 2 3 4               0 1 2 3 4
--------------          --------------
      * *  | 0                * *  | 0
%     * *  | 1          %     X *  | 1
% %   X X  | 2          % %     X  | 2
% % %      | 3          % % %      | 3
% % % %    | 4          % % % %    | 4
The naive solution, provided below, solves the problem in O(N^2 M) time and O(N^2) space.
def matrix(enemies):
    # N (the number of people) is assumed to be defined globally
    m = [[' ' for j in range(N)] for i in range(N)]
    for (i, j) in enemies:
        m[i][j] = 'X'  # mark enemiship
        # Now mark larger intervals as dead.
        for q in range(0, i + 1):
            for r in range(j, N):
                if m[q][r] == ' ':
                    m[q][r] = '*'
    num_int = 0
    for i in range(N):
        for j in range(N):
            if j < i:
                m[i][j] = '%'
            elif m[i][j] == ' ':
                num_int += 1
    print("# intervals: ", num_int)
    return m
To convince yourself further, here are the matrices for two more cases:
player 2 is enemies with himself, so that there is a barrier, and there are two smaller versions of the puzzle on the intervals [0,1] and [3,4], each of which admits 3 sub-intervals;
every player is enemies with the person two to their left, so that only length-(0 or 1) intervals are allowed (of which there are 4 + 5 = 9 intervals).
# intervals: 6          # intervals: 9
0 1 2 3 4               0 1 2 3 4
--------------          --------------
    * * *  | 0              X * *  | 0
%   * * *  | 1          %     X *  | 1
% % X * *  | 2          % %     X  | 2
% % %      | 3          % % %      | 3
% % % %    | 4          % % % %    | 4
Complexity: mathematically the same as sorting a list, or validating that it is sorted. That is, O(M log M) in the worst case, O(M) space to sort, and still at least O(M) time in the best case to recognize that the list is sorted.
Bonus: this is also an excellent example illustrating the power of looking at the identity of a problem, rather than its solution. Such a view of the problem will also inform smarter solutions. We can clearly do much better than the code I gave above...
We would clearly be done, for instance, if we could count the number of un-shaded points, which is the area of the smallest convex polygon covering the enemiships, together with the two boundary points. (Finding the two additional points can be done in O(M) time.) Now, this is probably not a problem you can solve in your sleep, but fortunately the problem of finding a convex hull is so natural that the algorithms used to do it are well known.
In particular, a Graham scan can do it in O(M) time, so long as we happen to be given the pairs of enemies with one of their coordinates sorted. Better still, once we have the set of points in the convex hull, the area can be calculated by dividing it into at most M axis-aligned rectangles. Therefore, if the enemy pairs are sorted, the entire problem can be solved in O(M) time. Keep in mind that M could be dramatically smaller than N, and we don't even need to store N numbers in an array! This corresponds to the arithmetic proposed to skip lines in the other answer to this question.
If they are not sorted, other convex hull algorithms yield an O(M log M) running time, with O(M) space, as given by @Matt Timmermans's solution. In fact, this is the general lower bound! This can be shown with a more complicated geometric reduction: if you can solve the problem, then you can compute the sum of the heights of each number, multiplied by its distance to "the new zero", for agents satisfying j + i = N. This sum can be used to compute distances to the diagonal line, which is enough to sort a list of numbers; and sorting cannot be done in under O(M log M) time for adversarial inputs.
Ah, so why is it the case that we can get an O(N + M) solution by manually performing this integration, as done explicitly in the other solution? It is because we can sort the M numbers if we know that they fall into N bins, by Bucket Sort.
Thanks for sharing the puzzle!

How do I find the maximum probability using dynamic programming?

For a better understanding of this question, you can check out:
1) https://math.stackexchange.com/questions/3100336/how-to-calculate-the-probability-in-this-case
John is playing a game against a magician. In this game, there are initially 'N' identical boxes in front of him, one of which contains a magic pill ― after eating this pill, he becomes immortal.
He has to determine which box contains the pill. He is allowed to perform at most 'M' moves. In each move, he may do one of the following:
1) Choose one of the boxes that are in front of him uniformly at random and guess that this box contains the pill. If the guess is correct, the game ends and he gets the pill. Otherwise, after this guess, the magician adds K empty boxes in front of him in such a way that John cannot determine which boxes were added; the box he guessed also remains in front of him, and he cannot distinguish it from the other boxes in subsequent moves either.
2) Choose a number X such that X is a positive multiple of K, but strictly less than the current number of boxes in front of John. The magician then removes X empty boxes. Of course, John must not perform this move if the current number of boxes is ≤K.
If John plays optimally, what will be the maximum probability of him getting the pill? 'N' is always less than 'K'.
Example: let M = 3, so 3 moves are allowed; K = 20, N = 3.
In his first move, John selects a box with probability x = 1/3 (20 boxes are then added, so 20 + 3 = 23 boxes remain). In the second move, he selects a box again, this time with probability y = (1/23)*(2/3); here, 2/3 denotes the probability of failure in the first move.
In the third move, he does the same thing, with probability z = (1/43)*(22/23)*(2/3).
So the total probability is x + y + z = l1.
Now suppose instead that, in the second move, he chooses to remove 20 boxes and do nothing else; then the new final probability is 1/3 + 0 (nothing is done in the second move!) + (2/3)*(1/3) = l2. Since l2 > l1, 'l2' is the answer to our question.
Basically, we have to determine which sequence of moves will yield the maximum probability. Also,
P(Winning) = P(game ending in 1st move) + P(game ending in 2nd move) + P(game ending in 3rd move) = (1/3) + 0 + (2/3)*(1/3) = 5/9
Given, N,K,M how can we find out the maximum probability?
Do we have to apply dynamic programming?
Let p(N, K, M) be John's probability if he plays optimally. We have the following recurrence relations:
p(N, K, 0) = 0
If there are no remaining moves, then he loses.
if M > 0 and N ≤ K, then p(N, K, M) = 1/N + (N−1)/N · p(N+K, K, M−1)
If there's at least one remaining move, and option #2 is not allowed, then his probability of winning is the probability that he guesses correctly in this round, plus the probability that he guesses wrongly in this round but he wins in a later turn.
if M > 0 and N > K, then p(N, K, M) is the greater of these two:
1/N + (N−1)/N · p(N+K, K, M−1)
If he takes option #1, then this is the same as the case where he was forced to take option #1.
p(N % K, K, M−1), where '%' is the "remainder" or "modulus" operator
If he takes option #2, then he certainly won't win in this step, so his probability of winning is equal to the probability that he wins in a later turn.
Note that we only need to consider N % K, because he should certainly choose the largest value of X that he's allowed to. There's never any benefit to letting the pool of boxes remain unnecessarily large.
Dynamic programming, or recursion plus memoization, is well-suited to this; you can apply the above recurrence relations directly.
Note that K never changes, so you don't need an array dimension for it; and N only changes by adding or subtracting integer multiples of K, so you're best off using array indices n such that N = (N0 % K) + nK.
Additionally, note that M decreases by exactly 1 in each turn, so if you're using a dynamic-programming approach and you only want the final probability, then you don't need to retain probabilities for all values of M; rather, when building the array for a given value of M, you only need to keep the array for M−1.
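A sketch of these recurrences as memoized recursion (assuming N < K initially, per the problem statement):

```python
from functools import lru_cache

def best_probability(N, K, M):
    @lru_cache(maxsize=None)
    def p(n, moves):
        if moves == 0:
            return 0.0
        # option 1: guess a box; on failure, K boxes are added
        guess = 1.0 / n + (n - 1) / n * p(n + K, moves - 1)
        if n > K:
            # option 2: remove the largest allowed multiple of K,
            # leaving n % K boxes (always >= 1 here, since N < K)
            return max(guess, p(n % K, moves - 1))
        return guess
    return p(N, M)

print(best_probability(3, 20, 3))  # 0.5555... (= 5/9, matching the example)
```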

Finding even numbers in an array without using feedback

I saw this post: Finding even numbers in an array and I was thinking about how you could do it without feedback. Here's what I mean.
Given an array of length n containing at most e even numbers and a
function isEven that returns true if the input is even and false
otherwise, write a function that prints all the even numbers in the
array using the fewest number of calls to isEven.
The answer on the post was to use a binary search, which is neat since it doesn't require the array to be in order. The number of times you have to check whether a number is even is e log n instead of n, because you do a binary search (log n) to find one even number each time (e times).
But that idea means that you divide the array in half, test for evenness, then decide which half to keep based on the result.
My question is whether or not you can beat n calls with a fixed testing scheme, where you check all the numbers you want for evenness without knowing the outcomes, and then figure out where the even numbers are after you've done all the tests, based on the results. So I guess it's no-feedback, or blind, or some term like that.
I was thinking about this for a while and couldn't come up with anything. The binary search idea doesn't work at all with this constraint, but maybe something else does? Even getting down to n/2 calls instead of n (yes, I know they are the same big-O) would be good.
The technical term for "no-feedback or blind" is "non-adaptive". O(e log n) calls still suffice, but the algorithm is rather more involved.
Instead of testing the evenness of products, we're going to test the evenness of sums. Let E ≠ F be distinct subsets of {1, …, n}. If we have one array x_1, …, x_n with even numbers at positions E and another array y_1, …, y_n with even numbers at positions F, how many subsets J of {1, …, n} satisfy
(∑_{i in J} x_i) mod 2 ≠ (∑_{i in J} y_i) mod 2?
The answer is 2^(n−1). Let i be an index such that x_i mod 2 ≠ y_i mod 2. Let S be a subset of {1, …, i−1, i+1, …, n}. Either J = S is a solution or J = S ∪ {i} is a solution, but not both.
For every possible outcome E, we need to make calls that eliminate every other possible outcome F. Suppose we make 2e log n calls at random. For each pair E ≠ F, the probability that we still cannot distinguish E from F is (2^(n−1)/2^n)^(2e log n) = n^(−2e), because there are 2^n possible calls and only 2^(n−1) fail to distinguish. There are at most n^e + 1 choices of E and thus at most (n^e + 1)·n^e/2 pairs. By a union bound, the probability that there exists some indistinguishable pair is at most n^(−2e)·(n^e + 1)·n^e/2 < 1 (assuming we're looking at an interesting case where e ≥ 1 and n ≥ 2), so there exists a sequence of 2e log n calls that does the job.
Note that, while I've used randomness to show that a good sequence of calls exists, the resulting algorithm is deterministic (and, of course, non-adaptive, because we chose that sequence without knowledge of the outcomes).
You can use the Chinese Remainder Theorem to do this. I'm going to change your notation a bit.
Suppose you have N numbers, of which at most E are even. Choose a sequence of distinct prime powers q_1, q_2, ..., q_k such that their product is at least N^E, i.e.
q_i = p_i^(e_i)
where p_i is prime and e_i > 0 is an integer, and
q_1 * q_2 * ... * q_k >= N^E
Now make a bunch of 0-1 matrices. Let M_i be the q_i x N matrix where the entry in row r and column c is 1 if c = r mod q_i and 0 otherwise. For example, if q_i = 3^2, then row 2 has ones in columns 2, 11, 20, ..., 2 + 9j, and 0 elsewhere.
Now stack these matrices vertically to get a Q x N matrix M, where Q = q_1 + q_2 + ... + q_k. The rows of M tell you which numbers to multiply together (the nonzero positions). This gives a total of Q products that you need to test for evenness. Call each row a "trial", and say that a trial "involves j" if the jth column of that row is nonzero. The theorem you need is the following:
THEOREM: The number in position j is even if and only if all trials involving j are even.
So you do a total of Q trials and then look at the results. If you choose the prime powers intelligently, then Q should be significantly smaller than N. There are asymptotic results showing that you can always get Q on the order of
(2E log N)^2 / (2 log(2E log N))
This theorem is actually a corollary of the Chinese Remainder Theorem. The only place that I've seen this used is in Combinatorial Group Testing. Apparently the problem originally arose when testing soldiers coming back from WWII for syphilis.
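A hypothetical sketch of the construction, with isEven simulated by a parity check. It assumes at most E of the N numbers are even and that the chosen prime powers multiply to at least N^E; here N = 9 and E = 1 with moduli 4 and 3, so only Q = 4 + 3 = 7 product tests are used instead of 9:

```python
def find_evens(A, prime_powers):
    N = len(A)
    # each trial is a residue class: the indices j with j % q == r
    trials = [[j for j in range(N) if j % q == r]
              for q in prime_powers for r in range(q)]
    # one isEven call per trial, applied to the product of its members
    # (a product is even iff at least one factor is even)
    outcomes = []
    for t in trials:
        prod = 1
        for j in t:
            prod *= A[j]
        outcomes.append(prod % 2 == 0)
    # decode: position j is even iff every trial involving j came out even
    return [j for j in range(N)
            if all(outcomes[k] for k, t in enumerate(trials) if j in t)]

A = [1, 3, 5, 7, 9, 6, 11, 13, 15]      # one even number, at index 5
print(find_evens(A, [4, 3]))            # [5]
```

Decoding works because an odd position would need its residue classes mod every q_i to coincide with those of actual even positions, which the product bound rules out.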
The problem you are facing is a form of group testing: a type of problem with the objective of reducing the cost of identifying certain elements of a set (up to d elements of a set of N elements).
As you've already stated, there are two basic principles via which the testing may be carried out:
Non-adaptive Group Testing, where all the tests to be performed are decided a priori.
Adaptive Group Testing, where we perform several tests, basing each test on the outcomes of previous tests. Obviously, adaptive testing has the potential to reduce the cost compared to non-adaptive testing.
Theoretical bounds for both principles have been studied, and are available in this Wiki article, or this paper.
For adaptive testing, the upper bound is O(d*log(N)) (as already described in this answer).
For non-adaptive testing, it can be shown that the upper bound is O(d*d/log(d)*log(N)), which is obviously larger than the upper bound for adaptive testing by a factor of d/log(d).
This upper bound for non-adaptive testing comes from an algorithm which uses disjunct matrices: T x N matrices ("number of tests" x "number of elements") whose entries are true (if an element is included in a test) or false (if it isn't), with the property that no column is contained in the union of any d other columns. This allows linear-time decoding. (There are also "d-separable" matrices, for which fewer tests are needed, but the time complexity of their decoding is exponential and not computationally feasible.)
Conclusion:
My question is whether or not you can beat n calls on a fixed testing scheme [...]
For such a scheme and a sufficiently large value of N, a disjunct matrix can be constructed with fewer than K * [d*d/log(d)*log(N)] rows. So, for large values of N, yes, you can beat n calls.
The underlying question (challenge) is kind of silly. If the binary search answer is acceptable (where it sums subarrays and sends them to IsEven), then I can think of a way to do it with E or fewer calls to IsEven (assuming the numbers are integers, of course).
JavaScript to demonstrate
// sort the array by only the first bit of the number
A.sort(function(x, y) { return (x & 1) - (y & 1); });
// all of the evens will be at the beginning
for (var i = 0; i < E && i < A.length; i++) {
    if (IsEven(A[i]))
        Print(A[i]);
    else
        break;
}
Not exactly a solution, but just few thoughts.
It is easy to see that if a solution exists for array length n that takes fewer than n tests, then for any array length m > n there is always a solution with fewer than m tests. So, if you have a solution for n = 2 or 3 or 4, then the problem is solved.
You can split the array into pairs of numbers and, for each pair: if the sum is odd, then exactly one of them is even; otherwise, if one of the numbers is even, then both of them are even. This way each pair takes either one or two tests. Best case: n/2 tests; worst case: n tests; if even and odd numbers are chosen with equal probability: 3n/4 tests.
My hunch is there is no solution with less than n tests. Not sure how to prove it.
UPDATE: The second solution can be extended in the following way.
Check if the sum of two numbers is even. If it is odd, then exactly one of them is even. Otherwise, label the pair as a "homogeneous set of size 2". Take two homogeneous sets of the same size n. Pick one number from each set and check if their sum is even. If it is even, combine the two sets into a "homogeneous set of size 2n". Otherwise, one of the sets consists purely of even numbers and the other purely of odd numbers.
Best case: n/2 tests. Average case: 3n/4. Worst case is still n, and it occurs only when all the numbers are even or all the numbers are odd.
If we can add and multiply array elements, then we can compute every Boolean function (up to complementation) of the low-order bits. Simulate a circuit that encodes the positions of the even numbers as a number from 0 to C(n,0) + C(n,1) + ... + C(n,e) − 1, represented in binary, and use calls to isEven to read off the bits.
Number of calls used: within 1 of the information-theoretic optimum.
See also fully homomorphic encryption.

Bijection on the integers below x

I'm working on image processing, and I'm writing a parallel algorithm that iterates over all the pixels in an image and changes the surrounding pixels based on each pixel's value. In this algorithm, minor non-determinism is acceptable, but I'd rather minimize it by only processing distant pixels simultaneously. Could someone give me an algorithm that bijectively maps the integers below n to the integers below n, in a fast and simple manner, such that two integers that are close to each other before the mapping are likely to be far apart after it?
For simplicity let's say n is a power of two. Could you simply reverse the order of the least significant log2(n) bits of the number?
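A sketch of that idea (function name mine): reversing the low log2(n) bits is a bijection on 0..n-1, and consecutive even/odd neighbours land exactly n/2 apart:

```python
def bit_reverse(i, bits):
    # reverse the lowest `bits` bits of i
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

n = 8  # 2**3
perm = [bit_reverse(i, 3) for i in range(n)]
print(perm)  # [0, 4, 2, 6, 1, 5, 3, 7] -- a bijection on 0..7
```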
Considering the pixels as a one-dimensional array, you could use a hash function j = i*p % n, where n is the zero-based index of the last pixel and p is a prime number chosen to place the pixels far enough apart at each step. % is the remainder operator in C; mathematically I'd write j(i) = i·p (mod n).
So if you want to jump at least 10 rows at each iteration, choose p > 10 * w where w is the screen width. You'll want to have a lookup table for p as a function of n and w of course.
Note that j hits every pixel as i goes from 0 to n.
CORRECTION: Use (mod (n + 1)), not (mod n). The last index is n, which cannot be reached using mod n since n (mod n) == 0.
Apart from reversing the bit order, you can use modular multiplication. Say N is a prime number (like 521); then for all x = 0..520 you define a function:
f(x) = x * fac mod N
which is a bijection on 0..520. fac is an arbitrary number different from 0 and 1. For example, for N = 521 and fac = 122, you get a mapping which (plotting f(x) against x) is quite uniform: not many numbers land near the diagonal; there are some, but only a small proportion.
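A sketch of this mapping (helper name mine). Any fac with gcd(fac, N) = 1 gives a bijection, which is why a prime N is convenient: every fac in 1..N-1 then qualifies:

```python
from math import gcd

def scatter(n, fac):
    # bijection on 0..n-1 provided gcd(fac, n) == 1
    assert gcd(fac, n) == 1
    return [(x * fac) % n for x in range(n)]

perm = scatter(521, 122)
assert sorted(perm) == list(range(521))      # it really is a bijection
# neighbouring inputs x and x+1 always map 122 or 521 - 122 = 399 apart
gaps = {abs(perm[x + 1] - perm[x]) for x in range(520)}
print(gaps)  # {122, 399}
```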

Why increase pointer by two while finding loop in linked list, why not 3,4,5?

I had a look at questions here that talk about algorithms to find a loop in a linked list. I have read about Floyd's cycle-finding algorithm, mentioned in a lot of places, in which we take two pointers. One pointer (slower/tortoise) is increased by one and the other pointer (faster/hare) is increased by two. When they are equal, we have found the loop; if the faster pointer reaches null, there is no loop in the linked list.
Now my question is: why do we increase the faster pointer by 2? Why not something else? Is increasing by 2 necessary, or could we increase it by X and still get the result? Will we necessarily find a loop if we increment the faster pointer by 2, or could there be a case where we need to increment by 3 or 5 or x?
From a correctness perspective, there is no reason that you need to use the number two. Any choice of step size will work (except for one, of course). However, choosing a step of size two maximizes efficiency.
To see this, let's take a look at why Floyd's algorithm works in the first place. The idea is to think about the sequence x_0, x_1, x_2, ..., x_n, ... of the elements of the linked list that you'll visit if you start at the beginning of the list and then keep on walking down it until you reach the end. If the list does not contain a cycle, then all these values are distinct. If it does contain a cycle, though, then this sequence will repeat endlessly.
Here's the theorem that makes Floyd's algorithm work:
The linked list contains a cycle if and only if there is a positive integer j such that for any positive integer k, x_j = x_{jk}.
Let's go prove this; it's not that hard. For the "if" case, if such a j exists, pick k = 2. Then for some positive j we have x_j = x_{2j} and j ≠ 2j, and so the list contains a cycle.
For the other direction, assume that the list contains a cycle of length l starting at position s. Let j be the smallest multiple of l greater than s. Then for any k, consider x_j and x_{jk}. Since j is a multiple of the loop length, we can think of x_{jk} as the element formed by starting at position j in the list, then taking j steps k−1 times. But each time you take j steps, you end up right back where you started in the list, because j is a multiple of the loop length. Consequently, x_j = x_{jk}.
This proof guarantees that if you take any constant number of steps on each iteration, you will indeed hit the slow pointer. More precisely, if you're taking k steps on each iteration, then you will eventually find the points x_j and x_{kj} and will detect the cycle. Intuitively, people tend to pick k = 2 to minimize the runtime, since you take the fewest number of steps on each iteration.
We can analyze the runtime more formally as follows. If the list does not contain a cycle, then the fast pointer will hit the end of the list after n steps, for O(n) time, where n is the number of elements in the list. Otherwise, the two pointers will meet after the slow pointer has taken j steps. Remember that j is the smallest multiple of l greater than s. If s ≤ l, then j = l; otherwise, if s > l, then j is at most 2s, so the value of j is O(s + l). Since l and s can be no greater than the number of elements in the list, this means that j = O(n). However, after the slow pointer has taken j steps, the fast pointer will have taken k steps for each of the j steps taken by the slower pointer, so it will have taken O(kj) steps. Since j = O(n), the net runtime is at most O(nk). Notice that this says that the more steps we take with the fast pointer, the longer the algorithm takes to finish (though only proportionally so). Picking k = 2 thus minimizes the overall runtime of the algorithm.
Hope this helps!
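A runnable sketch with the fast pointer's step size k as a parameter (names mine). As argued above, any k ≥ 2 will detect the cycle; k = 2 just does so with the least total work:

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head, k=2):
    # slow advances 1 node, fast advances k nodes per iteration
    slow = fast = head
    while fast is not None:
        slow = slow.next
        for _ in range(k):
            if fast is None:
                return False          # fast ran off the end: no cycle
            fast = fast.next
        if fast is not None and slow is fast:
            return True               # the pointers collided inside a cycle
    return False

# build 1 -> 2 -> 3 -> 4 -> 5 with the tail looping back to 3
nodes = [Node(i) for i in range(1, 6)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[2]
print(has_cycle(nodes[0], k=2), has_cycle(nodes[0], k=3))  # True True
```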
Let us suppose the length of the list which does not contain the loop be s, length of the loop be t and the ratio of fast_pointer_speed to slow_pointer_speed be k.
Let the two pointers meet at a distance j from the start of the loop.
So, the distance slow pointer travels = s + j. Distance the fast pointer travels = s + j + m * t (where m is the number of times the fast pointer has completed the loop). But, the fast pointer would also have traveled a distance k * (s + j) (k times the distance of the slow pointer).
Therefore, we get k * (s + j) = s + j + m * t.
s + j = (m / (k − 1)) · t.
Hence, from the above equation, length the slow pointer travels is an integer multiple of the loop length.
For greatest efficiency, (m / (k − 1)) = 1 (the slow pointer shouldn't have traveled the loop more than once).
Therefore, m = k − 1, i.e. k = m + 1.
Since m is the number of times the fast pointer has completed the loop, m ≥ 1.
For greatest efficiency, m = 1.
Therefore k = 2.
If we take a value of k > 2, the two pointers will have to travel a greater distance.
Hope the above explanation helps.
Consider a cycle of size L, where the loop starts at the kth element: x_k -> x_{k+1} -> ... -> x_{k+L-1} -> x_k. Suppose one pointer runs at rate r1 = 1 and the other at rate r2. When the first pointer reaches x_k, the second pointer will already be in the loop, at some element x_{k+s} where 0 <= s < L. After m further pointer increments, the first pointer is at x_{k+(m mod L)} and the second pointer is at x_{k+((m*r2+s) mod L)}. Therefore the condition that the two pointers collide can be phrased as the existence of an m satisfying the congruence
m ≡ m*r2 + s (mod L)
This can be simplified with the following steps:
m(1 − r2) ≡ s (mod L)
m(L + 1 − r2) ≡ s (mod L)
This is of the form of a linear congruence. It has a solution m if s is divisible by gcd(L+1-r2,L). This will certainly be the case if gcd(L+1-r2,L)=1. If r2=2 then gcd(L+1-r2,L)=gcd(L-1,L)=1 and a solution m always exists.
Thus r2=2 has the good property that for any cycle size L, it satisfies gcd(L+1-r2,L)=1 and thus guarantees that the pointers will eventually collide even if the two pointers start at different locations. Other values of r2 do not have this property.
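The gcd condition is easy to check directly; a small sketch (function name mine):

```python
from math import gcd

def can_meet(L, r2, s):
    # pointers starting a distance s apart in a cycle of length L
    # meet iff gcd(L + 1 - r2, L) divides s
    return s % gcd(L + 1 - r2, L) == 0

# r2 = 2 always works: gcd(L - 1, L) = 1 for every cycle length L
assert all(can_meet(L, 2, s) for L in range(2, 50) for s in range(L))
# r2 = 3 can fail: in an even cycle, an odd head start is never closed
print(can_meet(6, 3, 1))  # False
```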
If the fast pointer moves 3 steps and the slow pointer 1 step, they are not guaranteed to meet in cycles containing an even number of nodes. If the slow pointer moved at 2 steps, however, the meeting would be guaranteed.
In general, if the hare moves at H steps, and tortoise moves at T steps, you are guaranteed to meet in a cycle iff H = T + 1.
Consider the hare moving relative to the tortoise. The hare's speed relative to the tortoise is H − T nodes per iteration. Given a cycle of length N = (H − T) * k, where k is any positive integer, the hare would skip every H − T − 1 nodes (again, relative to the tortoise), and it would be impossible for them to meet if the tortoise were in any of those nodes. The only case in which a meeting is guaranteed is H − T − 1 = 0.
Hence, increasing the fast pointer by x is allowed, as long as the slow pointer is increased by x - 1.
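A quick simulation backs this up (sketch code; `meets` is an illustrative helper): speeds with H - T = 1 always produce a meeting, while H = 3, T = 1 can fail on an even-length cycle:

```python
def meets(H, T, n, gap=1, limit=10_000):
    """Hare starts `gap` nodes ahead of the tortoise inside a cycle of
    length n; hare moves H steps per iteration, tortoise moves T."""
    hare, tortoise = gap % n, 0
    for _ in range(limit):
        if hare == tortoise:
            return True
        hare = (hare + H) % n
        tortoise = (tortoise + T) % n
    return False

# H - T = 1 guarantees a meeting for every cycle length and every gap.
assert all(meets(3, 2, n, g) for n in range(1, 16) for g in range(n))

# H = 3, T = 1 (relative speed 2) fails on an even cycle with an odd gap.
assert not meets(3, 1, 8, 1)
```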
Here is an intuitive, non-mathematical way to understand this:
If the fast pointer runs off the end of the list obviously there is no cycle.
Ignore the initial part where the pointers are in the initial non-cycle part of the list, we just need to get them into the cycle. It doesn't matter where in the cycle the fast pointer is when the slow pointer finally reaches the cycle.
Once they are both in the cycle, they are circling the cycle but at different points. Imagine if they were both moving by one each time: they would keep circling while staying the same distance apart, making the same loop but out of phase. By moving the fast pointer two steps at a time, however, they change phase with each other, decreasing their distance apart by one each step. The fast pointer will catch up to the slow pointer and we can detect the loop.
To prove that they will meet, and that the fast pointer will not somehow overtake and skip over the slow pointer, just hand-simulate what happens when the fast pointer is three steps behind the slow one, then two steps behind, then one step behind. In every case they meet at the same node. Any larger distance will eventually become a distance of three, two, or one.
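For completeness, here is a minimal sketch of the algorithm itself in Python (the `Node` class and the list construction are illustrative, not from any library):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Floyd's tortoise and hare: fast moves 2 steps, slow moves 1."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # the phase shifts by one each step,
        fast = fast.next.next   # so fast cannot skip over slow
        if slow is fast:
            return True
    return False

# Build 1 -> 2 -> 3 -> 4 -> 2 (cycle back to the second node).
nodes = [Node(i) for i in range(1, 5)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[1]
assert has_cycle(nodes[0])

# An acyclic list: the fast pointer simply runs off the end.
straight = [Node(i) for i in range(3)]
for a, b in zip(straight, straight[1:]):
    a.next = b
assert not has_cycle(straight[0])
```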
If there is a loop (of n nodes), then once a pointer has entered the loop it will remain there forever; so we can move forward in time until both pointers are in the loop. From here on the pointers can be represented by integers modulo n with initial values a and b. The condition for them to meet after t steps is then
a + t ≡ b + 2t (mod n)
which has the solution t ≡ a − b (mod n).
This will work so long as the difference between the speeds shares no prime factors with n.
Reference
https://math.stackexchange.com/questions/412876/proof-of-the-2-pointer-method-for-finding-a-linked-list-loop
The single restriction on speeds is that their difference should be co-prime with the loop's length.
Theoretically, consider the cycle (loop) as a park (circular, rectangular, whatever). Person X moves slowly and person Y moves faster than X. Since both start from the same point, it doesn't matter whether Y moves at 2 times the speed of X or 3, 4, 5... times: there will always be a point where they meet.
Say we use two references, Rp and Rq, which take p and q steps in each iteration, with p > q. In Floyd's algorithm, p = 2 and q = 1.
We know that after a certain number of iterations, both Rp and Rq will be inside the loop. Say Rp is then ahead of Rq by x steps: starting at the element of Rq, we can take x steps to reach the element of Rp.
Say, the loop has n elements. After t further iterations, Rp will be ahead of Rq by (x + (p-q)*t) steps. So, they can meet after t iterations only if:
n divides (x + (p-q)*t)
Which can be written as:
(p−q)*t ≡ (−x) (mod n)
Due to modular arithmetic, this is solvable only if GCD(p−q, n) divides x.
But we do not know x. If the GCD is 1, though, it divides any x. To make the GCD equal to 1:
if n is not known, choose any p and q such that p − q = 1. Floyd's algorithm has p − q = 2 − 1 = 1.
if n is known, choose any p and q such that (p-q) is coprime with n.
Update: On some further analysis, I realized that any unequal positive integers p and q will make the two references meet after some iterations, as long as both start from the same node. The values 1 and 2, though, seem to require the fewest total steps.
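The update can be checked empirically. The sketch below (hypothetical helper names, assuming both references start at the head of a list with a tail followed by a loop) confirms that any unequal speeds eventually meet:

```python
def meet_from_same_start(p, q, loop_len, tail_len, limit=100_000):
    """Both references start at the head of a list with `tail_len` nodes
    before a loop of `loop_len` nodes; Rp takes p steps and Rq takes q
    steps per iteration. Returns the first iteration at which they meet,
    or None within the step limit."""
    def pos(steps):
        # Map a step count to a node index: tail nodes are
        # 0..tail_len-1, loop nodes are tail_len..tail_len+loop_len-1.
        if steps < tail_len:
            return steps
        return tail_len + (steps - tail_len) % loop_len
    for t in range(1, limit):
        if pos(p * t) == pos(q * t):
            return t
    return None

# Any unequal speeds meet when both references start at the head.
assert all(meet_from_same_start(p, q, n, tail) is not None
           for n in range(1, 8) for tail in range(4)
           for p in range(2, 6) for q in range(1, p))
```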
The reason why 2 is chosen is because, let's say:
the slow pointer moves at 1
the fast pointer moves at 2
The loop has 5 elements.
Now, for the slow and fast pointers to meet, the fast pointer must gain a full lap on the slow one. This happens after the slow pointer has taken 5 steps and the fast pointer 10, which is the lowest common multiple (LCM) of 1, 2 and 5, and that's where they meet.
If you simulate the slow and fast pointers you will see that they meet after the fast pointer has taken 2 * (elements in the loop) steps; after two loops it is back at exactly the starting point.
In the non-loop case it becomes the LCM of 1, 2 and infinity, so they never meet.
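A quick simulation (with an illustrative helper `first_meeting`) confirms this for the 5-element loop: starting together, the pointers first coincide after the slow pointer's 5th step, back at the starting node, with the fast pointer having taken 10 steps:

```python
def first_meeting(n):
    """Slow (1 step) and fast (2 steps) start together on an n-node
    loop; return (iteration, node) of their first coincidence."""
    slow = fast = 0
    t = 0
    while True:
        t += 1
        slow = (slow + 1) % n
        fast = (fast + 2) % n
        if slow == fast:
            return t, slow

# In a 5-element loop they meet after 5 iterations, at the start:
# the slow pointer has done one lap, the fast pointer two.
assert first_meeting(5) == (5, 0)
```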
If the linked list has a loop, then a fast pointer with an increment of 2 will work better than, say, an increment of 3 or 4 or more, because it ensures that once we are inside the loop the pointers will surely collide; there will be no overtaking.
For example, if we take an increment of 3 and, inside the loop, assume
fast pointer --> i
slow pointer --> i+1
then on the next iteration
fast pointer --> i+3
slow pointer --> i+2
the fast pointer has jumped over the slow one without meeting it. Such a case can never happen with an increment of 2.
Also, if you are really unlucky, you may end up in a situation where the loop length is L and you are incrementing the fast pointer by L + 1. Then you will be stuck infinitely, since the difference between the fast and slow pointers' movement per step is L, a multiple of the loop length, so the gap between them never changes.
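Both failure modes are easy to see by tracking the gap between the pointers modulo the loop length (a sketch; `gap_history` is an illustrative helper):

```python
def gap_history(loop_len, fast_step, start_gap, iters=8):
    """Track (slow - fast) mod loop_len per iteration.
    The slow pointer moves 1 step, the fast one moves fast_step."""
    slow, fast = start_gap % loop_len, 0  # slow starts start_gap ahead
    gaps = []
    for _ in range(iters):
        gaps.append((slow - fast) % loop_len)
        slow = (slow + 1) % loop_len
        fast = (fast + fast_step) % loop_len
    return gaps

# Increment 3: the gap changes by 2 each step, so the fast pointer hops
# over the slow one (1 -> 5 -> 3 -> 1 ...) without ever landing on it.
assert gap_history(6, 3, 1) == [1, 5, 3, 1, 5, 3, 1, 5]

# Increment L + 1 on a loop of length L: relative speed L = 0 (mod L),
# so the gap never changes and the pointers are stuck forever.
assert gap_history(6, 7, 2) == [2] * 8
```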
I hope I made myself clear.
