Implementing Deque using 3 Stacks (Amortized time O(1))

Implementing Deque using 3 Stacks (Amortized time O(1)) - algorithm

I have this question for howmework:
Implement a Deque using 3 Stacks. The Deque have those operations : InsertHead, InsertTail, DeleteHead,DeleteTail. Prove that the amortized time for each operation is O(1).
What I've tried is to look at the problem as Hanoi problem.
So lets call the Stacks as: L(left), M (middle), R(right).
Pseudo-code Implementations:
InsertHead(e):
L.push(e);
DeleteHead(e):
L is empty:
while R is not empty:
pop and insert the element to M;
pop M;
while M is not empty:
pop and insert the element to R;
L is not empty:
L.pop(e);
InsertTail and DeleteTail are on the same principle of the above implementations.
How can I prove that the amortized time is O(1)?
because there can be N elements in L and the wile loop will take O(n), now if I'll call the deleteHead N times to calculate an amortized time the complexity will not be O(n^2)?
can someone help me how can I prove that the above implementations take O(1) in amortized time?

We proceed using the potential method; define
Phi = C |L.size - R.size|
For some constant C for which we will pick a value later. Let Phi_t denote the potential after t operations. Note that in a "balanced" state where both stacks have equal size, the data structure has potential 0.
The potential at any time is a constant times the difference in the number of elements in each stack. Note that Phi_0 = 0 so the potential is zero when the structure is initialised.
It is clear that a push operation increases the potential by at most C. A pop operation which does not miss (i.e. where the relevant stack is nonempty) also alters the potential by at most C. Both of these operations have true cost 1, hence they have amortised cost 1 + C.
When a pop operation occurs and causes a miss (when the stack we want to pop from is empty), the true cost of the operation is 1 + 3/2 * R.size for when we are trying to pop from L, and vice versa for when we are popping from R. This is because we move half of R's elements to M and back, and the other half of R's elements to L. The +1 is needed because of the final pop from L after this rebalancing operation is done.
Hence if we take C := 3/2, then the pop operation when a miss occurs has amortised cost 1, because the potential reduces from 3/2 * R.size to 0 due to the rebalancing, and then we may incur an additional cost of 3/2 from the pop which occurs after the rebalance.
In other words, each operation has an amortised cost bounded by a constant.
Finally, because the initial potential is 0, and the potential is always nonnegative, each operation is amortised cost O(1), as required.

Related

Runtime complexity of Floyd's cycle detection

According to some online sources I referred the runtime complexity of Floyd's cycle detection algo is O(n).
Say,
p = slow pointer
q = fast pointer
m = the distance from start of linked list to first loop node
k = the distance of the meeting point of fast and slow nodes from the first loop node
l = length of loop
rp = number of loop rotations by p before meeting q.
rq = number of loop rotations by q before meeting p.
Run time complexity should be = m+rp*l+k
How does this value be O(n)?

If the list has N nodes, then in <= N steps, either the fast pointer will find the end of the list, or there is a loop and the slow pointer will be in the loop.
Lets say the loop is of length M <= N: Once the slow pointer is in the loop, both the fast and slow pointers will be stuck in the loop forever. Each step, the distance between the fast and the slow pointers will increase by 1. When the distance is divisible by M, then the fast and slow pointers will be on the same node and the algorithm terminates. The distance will reach a number divisible by M in <= M steps.
So, getting the slow pointer to the loop, and then getting the fast and slow pointers to meet takes <= N + M <= 2N steps, and that is in O(N)
In fact, you can get a tighter bound on the step count if you note that when there's a loop, the slow pointer will get to it in N - M steps, so the total step count is <= N.

If there are n nodes, the slow pointer is guaranteed to travel no more than n steps before the fast pointer either meets the slow pointer or finds an end to the list. That means you do O(n) work advancing the slow pointer, and you advance the fast pointer about twice that, which is also O(n). Thus, the whole algorithm is O(n).

Big-O for While loop

I have this algorithm, and I am trying to calculate its complexity.
A = {a_1, a_2, a_3, ...}
w = 0
while A != empty
a' = argmin(A) #a' is the element with smallest y_a
if (N_a' + w > C)
A = A - {a'}
else
x_a' = x_a' + 1
w = w + N_a'
Update the y_a' value in A using x_a'
A is a set, and if the condition (N_a' + w > C) is true, we remove an element from the set until the set is empty. I know that the algorithm runs at least O(n), but it can run more if the if statement is false. Assume the last line (the update) takes a constant time.
How can I calculate the complexity here?

Let's first determine how often the then and else branches can run in the worst case. In the then branch the set A becomes smaller by one element so it can only be executed n times (where n is the initial number of elements in A). The else branch can be executed at most C times (take N_a' = 1, it must be >= 1). C is a constant so this is O(1). The total number of iterations is therefore O(n).
The critical point is now the data structure used for A. Three operations have to be supported: finding the min, removing the minimum element, and the update in the last line. When we choose a min-heap, each of these operations can be done in O(log n). Note that the update is not O(1) time in this case. The total run time is now O(n log n).
A naive minimum search (i.e. using an uordered array for A) makes the operations min, remove element, and update O(n), O(1), and O(1) respectively. The total run time would therefore be O(n*n).
Using an ordered array to represent A we get run times of O(1), O(1), and O(n) respectively for our three operations. The min search O(1) operation is executed for each iteration, so O(n) times. The remove elemnt O(1) operation is in the then branch, so executed O(n) times, the update O(n) operation is in the else branch, so executed O(1) times. Taking all together gives a runtime of O(n).
However, if the set has to be sorted in the beginning we are again at O(n log n).

In a looping linked list, what guarantee is there that the fast and slow runners will collide? [duplicate]

I had a look at question already which talk about algorithm to find loop in a linked list. I have read Floyd's cycle-finding algorithm solution, mentioned at lot of places that we have to take two pointers. One pointer( slower/tortoise ) is increased by one and other pointer( faster/hare ) is increased by 2. When they are equal we find the loop and if faster pointer reaches null there is no loop in the linked list.
Now my question is why we increase faster pointer by 2. Why not something else? Increasing by 2 is necessary or we can increase it by X to get the result. Is it necessary that we will find a loop if we increment faster pointer by 2 or there can be the case where we need to increment by 3 or 5 or x.

From a correctness perspective, there is no reason that you need to use the number two. Any choice of step size will work (except for one, of course). However, choosing a step of size two maximizes efficiency.
To see this, let's take a look at why Floyd's algorithm works in the first place. The idea is to think about the sequence x0, x1, x2, ..., xn, ... of the elements of the linked list that you'll visit if you start at the beginning of the list and then keep on walking down it until you reach the end. If the list does not contain a cycle, then all these values are distinct. If it does contain a cycle, though, then this sequence will repeat endlessly.
Here's the theorem that makes Floyd's algorithm work:
The linked list contains a cycle if and only if there is a positive integer j such that for any positive integer k, xj = xjk.
Let's go prove this; it's not that hard. For the "if" case, if such a j exists, pick k = 2. Then we have that for some positive j, xj = x2j and j ≠ 2j, and so the list contains a cycle.
For the other direction, assume that the list contains a cycle of length l starting at position s. Let j be the smallest multiple of l greater than s. Then for any k, if we consider xj and xjk, since j is a multiple of the loop length, we can think of xjk as the element formed by starting at position j in the list, then taking j steps k-1 times. But each of these times you take j steps, you end up right back where you started in the list because j is a multiple of the loop length. Consequently, xj = xjk.
This proof guarantees you that if you take any constant number of steps on each iteration, you will indeed hit the slow pointer. More precisely, if you're taking k steps on each iteration, then you will eventually find the points xj and xkj and will detect the cycle. Intuitively, people tend to pick k = 2 to minimize the runtime, since you take the fewest number of steps on each iteration.
We can analyze the runtime more formally as follows. If the list does not contain a cycle, then the fast pointer will hit the end of the list after n steps for O(n) time, where n is the number of elements in the list. Otherwise, the two pointers will meet after the slow pointer has taken j steps. Remember that j is the smallest multiple of l greater than s. If s ≤ l, then j = l; otherwise if s > l, then j will be at most 2s, and so the value of j is O(s + l). Since l and s can be no greater than the number of elements in the list, this means than j = O(n). However, after the slow pointer has taken j steps, the fast pointer will have taken k steps for each of the j steps taken by the slower pointer so it will have taken O(kj) steps. Since j = O(n), the net runtime is at most O(nk). Notice that this says that the more steps we take with the fast pointer, the longer the algorithm takes to finish (though only proportionally so). Picking k = 2 thus minimizes the overall runtime of the algorithm.
Hope this helps!

Let us suppose the length of the list which does not contain the loop be s, length of the loop be t and the ratio of fast_pointer_speed to slow_pointer_speed be k.
Let the two pointers meet at a distance j from the start of the loop.
So, the distance slow pointer travels = s + j. Distance the fast pointer travels = s + j + m * t (where m is the number of times the fast pointer has completed the loop). But, the fast pointer would also have traveled a distance k * (s + j) (k times the distance of the slow pointer).
Therefore, we get k * (s + j) = s + j + m * t.
s + j = (m / k-1)t.
Hence, from the above equation, length the slow pointer travels is an integer multiple of the loop length.
For greatest efficiency , (m / k-1) = 1 (the slow pointer shouldn't have traveled the loop more than once.)
therefore , m = k - 1 => k = m + 1
Since m is the no.of times the fast pointer has completed the loop , m >= 1 .
For greatest efficiency , m = 1.
therefore k = 2.
if we take a value of k > 2 , more the distance the two pointers would have to travel.
Hope the above explanation helps.

Consider a cycle of size L, meaning at the kth element is where the loop is: xk -> xk+1 -> ... -> xk+L-1 -> xk. Suppose one pointer is run at rate r1=1 and the other at r2. When the first pointer reaches xk the second pointer will already be in the loop at some element xk+s where 0 <= s < L. After m further pointer increments the first pointer is at xk+(m mod L) and the second pointer is at xk+((m*r2+s) mod L). Therefore the condition that the two pointers collide can be phrased as the existence of an m satisfying the congruence
m = m*r2 + s (mod L)
This can be simplified with the following steps
m(1-r2) = s (mod L)
m(L+1-r2) = s (mod L)
This is of the form of a linear congruence. It has a solution m if s is divisible by gcd(L+1-r2,L). This will certainly be the case if gcd(L+1-r2,L)=1. If r2=2 then gcd(L+1-r2,L)=gcd(L-1,L)=1 and a solution m always exists.
Thus r2=2 has the good property that for any cycle size L, it satisfies gcd(L+1-r2,L)=1 and thus guarantees that the pointers will eventually collide even if the two pointers start at different locations. Other values of r2 do not have this property.

If the fast pointer moves 3 steps and slow pointer at 1 step, it is not guaranteed for both pointers to meet in cycles containing even number of nodes. If the slow pointer moved at 2 steps, however, the meeting would be guaranteed.
In general, if the hare moves at H steps, and tortoise moves at T steps, you are guaranteed to meet in a cycle iff H = T + 1.
Consider the hare moving relative to the tortoise.
Hare's speed relative to the tortoise is H - T nodes per iteration.
Given a cycle of length N =(H - T) * k, where k is any positive
integer, the hare would skip every H - T - 1 nodes (again, relative
to the tortoise), and it would be impossible to for them to meet if
the tortoise was in any of those nodes.
The only possibility where a meeting is guaranteed is when H - T - 1 = 0.
Hence, increasing the fast pointer by x is allowed, as long as the slow pointer is increased by x - 1.

Here is a intuitive non-mathematical way to understand this:
If the fast pointer runs off the end of the list obviously there is no cycle.
Ignore the initial part where the pointers are in the initial non-cycle part of the list, we just need to get them into the cycle. It doesn't matter where in the cycle the fast pointer is when the slow pointer finally reaches the cycle.
Once they are both in the cycle, they are circling the cycle but at different points. Imagine if they were both moving by one each time. Then they would be circling the cycle but staying the same distance apart. In other words, making the same loop but out of phase. Now by moving the fast pointer by two each step they are changing their phase with each other; Decreasing their distance apart by one each step. The fast pointer will catch up to the slow pointer and we can detect the loop.
To prove this is true, that they will meet each other and the fast pointer will not somehow overtake and skip over the slow pointer just hand simulate what happens when the fast pointer is three steps behind the slow, then simulate what happens when the fast pointer is two steps behind the slow, then when the fast pointer is just one step behind the slow pointer. In every case they meet at the same node. Any larger distance will eventually become a distance of three, two or one.

If there is a loop (of n nodes), then once a pointer has entered the loop it will remain there forever; so we can move forward in time until both pointers are in the loop. From here on the pointers can be represented by integers modulo n with initial values a and b. The condition for them to meet after t steps is then
a+t≡b+2t mod n
which has solution t=a−b mod n.
This will work so long as the difference between the speeds shares no prime factors with n.
Reference
https://math.stackexchange.com/questions/412876/proof-of-the-2-pointer-method-for-finding-a-linked-list-loop
The single restriction on speeds is that their difference should be co-prime with the loop's length.

Theoretically, consider the cycle(loop) as a park(circular, rectangle whatever), First person X is moving slow and Second person Y is moving faster than X. Now, it doesn't matter if person Y is moving with speed of 2 times that of X or 3,4,5... times. There will always be a case when they meet at one point.

Say we use two references Rp and Rq which take p and q steps in each iteration; p > q. In the Floyd's algorithm, p = 2, q = 1.
We know that after certain iterations, both Rp and Rq will be at some elements of the loop. Then, say Rp is ahead of Rq by x steps. That is, starting at the element of Rq, we can take x steps to reach the element of Rp.
Say, the loop has n elements. After t further iterations, Rp will be ahead of Rq by (x + (p-q)*t) steps. So, they can meet after t iterations only if:
n divides (x + (p-q)*t)
Which can be written as:
(p−q)*t ≡ (−x) (mod n)
Due to modular arithmetic, this is possible only if: GCD(p−q, n) | x.
But we do not know x. Though, if the GCD is 1, it will divide any x. To make the GCD as 1:
if n is not known, choose any p and q such that (p-q) = 1. Floyd's algorithm does have p-q = 2-1 = 1.
if n is known, choose any p and q such that (p-q) is coprime with n.
Update: On some further analysis later, I realized that any unequal positive integers p and q will make the two references meet after some iterations. Though, the values of 1 and 2 seem to require less number of total stepping.

The reason why 2 is chosen is because lets say
slow pointer moves at 1
fast moves at 2
The loop has 5 elements.
Now for slow and fast pointer to meet ,
Lowest common multiple (LCM) of 1,2 and 5 must exist and thats where they meet. In this case its 10.
If you simulate slow and fast pointer you will see that the slow and fast pointer meet at 2 * elements in loop. When you do 2 loops , you meet at exactly same point of as starting point.
In case of non loop , it becomes LCM of 1,2 and infinity. so they never meet.

If the linked list has a loop then a fast pointer with increment of 2 will work better then say increment of 3 or 4 or more because it ensures that once we are inside the loop the pointers will surely collide and there will be no overtaking.
For example if we take increment of 3 and inside the loop lets assume
fast pointer --> i
slow --> i+1
the next iteration
fast pointer --> i+3
slow --> i+2
whereas such case will never happen with increment of 2.
Also if you are really unlucky then you may end up in a situation where loop length is L and you are incrementing the fast pointer by L+1. Then you will be stuck infinitely since the difference of the movement fast and slow pointer will always be L.
I hope I made myself clear.

Heap-sort time complexity deep understanding

When I studied the Data Structures course in the university, I learned the following axioms:
Insertion of a new number to the heap takes O(logn) in worst case (depending on how high in the tree it reaches when inserted as a leaf)
Building a heap of n nodes, using n insertions, starting from an empty heap, is summed to O(n) time, using amortized analysis
Removal of the minimum takes O(logn) time in worst case (depending on how low the new top node reaches, after it was swapped with the last leaf)
Removal of all the minimums one by one, until the heap is empty, takes O(nlogn) time complexity
Reminder: The steps of "heapsort" algorithm are:
Add all the array values to a heap: summed to O(n) time complexity using the amortized-analysis trick
Pop the minimum out of the heap n times and place the i-th value in the i-th index of the array : O(nlogn) time complexity, as the amortized-analysis trick does not work when popping the minimum
My question is: Why the amortized-analysis trick does not work when emptying the heap, causing heap-sort algorithm to take O(nlogn) time and not O(n) time?
Edit/Answer
When the heap is stored in an array (rather than dynamic tree nodes with pointers), then we can build the heap bottom up, i.e., starting from the leaves and up to the root, then using amortized-analysis we can get total time complexity of O(n), whereas we cannot empty the heap minima's bottom up.

Assuming you're only allowed to learn about the relative ranking of two objects by comparing them, then there's no way to dequeue all elements from a binary heap in time O(n). If you could do this, then you could sort a list in time O(n) by building a heap in time O(n) and then dequeuing everything in time O(n). However, the sorting lower bound says that comparison sorts, in order to be correct, must have a runtime of Ω(n log n) on average. In other words, you can't dequeue from a heap too quickly or you'd break the sorting barrier.
There's also the question about why dequeuing n elements from a binary heap takes time O(n log n) and not something faster. This is a bit tricky to show, but here's the basic idea. Consider the first half of the dequeues you make on the heap. Look at the values that actually got dequeued and think about where they were in the heap to begin with. Excluding the ones on the bottom row, everything else that was dequeued had to percolate up to the top of the heap one swap at a time in order to be removed. You can show that there are enough elements in the heap to guarantee that this alone takes time Ω(n log n) because roughly half of those nodes will be deep in the tree. This explains why the amortized argument doesn't work - you're constantly pulling deep nodes up the heap, so the total distance the nodes have to travel is large. Compare this to the heapify operation, where most nodes travel very little distance.

Let me show you "mathematically" how we can compute the complexity of transforming an arbitrary array into an heap (let me call this "heap build") and then sorting it with heapsort.
Heap build time analysis
In order to transform the array into an heap, we have to look at each node with children and "heapify" (sink) that node. You should ask yourself how many compares we perform; if you think about it, you see that (h = tree height):
For each node at level i, we make h-i compares: #comparesOneNode(i) = h-i
At level i, we have 2^i nodes: #nodes(i) = 2^i
So, generally T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i *(h-i), is the time spent for "compares" at level "i"
Let's make an example. Suppose to have an array of 15 elements, i.e., the height of the tree would be h = log2(15) = 3:
At level i=3, we have 2^3=8 nodes and we make 3-3 compares for each node: correct, since at level 3 we have only nodes without children, i.e., leaves. T(n, 3) = 2^3*(3-3) = 0
At level i=2, we have 2^2=4 nodes and we make 3-2 compares for each node: correct, since at level 2 we have only level 3 with which we can compare. T(n, 2) = 2^2*(3-2) = 4 * 1
At level i=1, we have 2^1=2 nodes and we make 3-1 compares for each node: T(n, 1) = 2^1*(3-1) = 2 * 2
At level i=0, we have 2^0=1 node, the root, and we make 3-0 compares: T(n, 0) = 2^0*(3-0) = 1 * 3
Ok, generally:
T(n) = sum(i=0 to h) 2^i * (h-i)
but if you remember that h = log2(n), we have
T(n) = sum(i=0 to log2(n)) 2^i * (log2(n) - i) =~ 2n
Heapsort time analysis
Now, here the analysis is really similar. Every time we "remove" the max element (root), we move to root the last leaf in the tree, heapify it and repeat till the end. So, how many compares do we perform here?
At level i, we have 2^i nodes: #nodes(i) = 2^i
For each node at level "i", heapify, in the worst case, will always do the same number of compares that is exactly equal to the level "i" (we take one node from level i, move it to root, call heapify, and heapify in worst case will bring back the node to level i, performing"i" compares): #comparesOneNode(i) = i
So, generally T(n,i) = #nodes(i) * #comparesOneNode(i) = 2^i*i, is the time spent for removing the first 2^i roots and bring back to the correct position the temporary roots.
Let's make an example. Suppose to have an array of 15 elements, i.e., the height of the tree would be h = log2(15) = 3:
At level i=3, we have 2^3=8 nodes and we need to move each one of them to the root place and then heapify each of them. Each heapify will perform in worst case "i" compares, because the root could sink down to the still existent level "i". T(n, 3) = 2^3 * 3 = 8*3
At level i=2, we have 2^2=4 nodes and we make 2 compares for each node: T(n, 2) = 2^2*2 = 4 * 2
At level i=1, we have 2^1=2 nodes and we make 1 compare for each node: T(n, 1) = 2^1*1 = 2 * 1
At level i=0, we have 2^0=1 node, the root, and we make 0 compares: T(n, 0) = 0
Ok, generally:
T(n) = sum(i=0 to h) 2^i * i
but if you remember that h = log2(n), we have
T(n) = sum(i=0 to log2(n)) 2^i * i =~ 2nlogn
Heap build VS heapsort
Intuitively, you can see that heapsort is not able to "amortise" his cost because every time we increase the number of nodes, more compares we have to do, while we have exactly the opposite in the heap build functionality! You can see here:
Heap build: T(n, i) ~ 2^i * (h-i), if i increases, #nodes increases, but #compares decreases
Heapsort: T(n, i) ~ 2^i * i, if i increases, #nodes increases and #compares increases
So:
Level i=3, #nodes(3)=8, Heap build does 0 compares, heapsort does 8*3 = 24 compares
Level i=2, #nodes(2)=4, Heap build does 4 compares, heapsort does 4*2 = 8 compares
Level i=1, #nodes(1)=2, Heap build does 4 compares, heapsort does 2*1 = 2 compares
Level i=0, #nodes(0)=1, Heap build does 3 compares, heapsort does 1*0 = 1 compares

Why increase pointer by two while finding loop in linked list, why not 3,4,5?

I had a look at question already which talk about algorithm to find loop in a linked list. I have read Floyd's cycle-finding algorithm solution, mentioned at lot of places that we have to take two pointers. One pointer( slower/tortoise ) is increased by one and other pointer( faster/hare ) is increased by 2. When they are equal we find the loop and if faster pointer reaches null there is no loop in the linked list.
Now my question is why we increase faster pointer by 2. Why not something else? Increasing by 2 is necessary or we can increase it by X to get the result. Is it necessary that we will find a loop if we increment faster pointer by 2 or there can be the case where we need to increment by 3 or 5 or x.

From a correctness perspective, there is no reason that you need to use the number two. Any choice of step size will work (except for one, of course). However, choosing a step of size two maximizes efficiency.
To see this, let's take a look at why Floyd's algorithm works in the first place. The idea is to think about the sequence x0, x1, x2, ..., xn, ... of the elements of the linked list that you'll visit if you start at the beginning of the list and then keep on walking down it until you reach the end. If the list does not contain a cycle, then all these values are distinct. If it does contain a cycle, though, then this sequence will repeat endlessly.
Here's the theorem that makes Floyd's algorithm work:
The linked list contains a cycle if and only if there is a positive integer j such that for any positive integer k, xj = xjk.
Let's go prove this; it's not that hard. For the "if" case, if such a j exists, pick k = 2. Then we have that for some positive j, xj = x2j and j ≠ 2j, and so the list contains a cycle.
For the other direction, assume that the list contains a cycle of length l starting at position s. Let j be the smallest multiple of l greater than s. Then for any k, if we consider xj and xjk, since j is a multiple of the loop length, we can think of xjk as the element formed by starting at position j in the list, then taking j steps k-1 times. But each of these times you take j steps, you end up right back where you started in the list because j is a multiple of the loop length. Consequently, xj = xjk.
This proof guarantees you that if you take any constant number of steps on each iteration, you will indeed hit the slow pointer. More precisely, if you're taking k steps on each iteration, then you will eventually find the points xj and xkj and will detect the cycle. Intuitively, people tend to pick k = 2 to minimize the runtime, since you take the fewest number of steps on each iteration.
We can analyze the runtime more formally as follows. If the list does not contain a cycle, then the fast pointer will hit the end of the list after n steps for O(n) time, where n is the number of elements in the list. Otherwise, the two pointers will meet after the slow pointer has taken j steps. Remember that j is the smallest multiple of l greater than s. If s ≤ l, then j = l; otherwise if s > l, then j will be at most 2s, and so the value of j is O(s + l). Since l and s can be no greater than the number of elements in the list, this means than j = O(n). However, after the slow pointer has taken j steps, the fast pointer will have taken k steps for each of the j steps taken by the slower pointer so it will have taken O(kj) steps. Since j = O(n), the net runtime is at most O(nk). Notice that this says that the more steps we take with the fast pointer, the longer the algorithm takes to finish (though only proportionally so). Picking k = 2 thus minimizes the overall runtime of the algorithm.
Hope this helps!

Let us suppose the length of the list which does not contain the loop be s, length of the loop be t and the ratio of fast_pointer_speed to slow_pointer_speed be k.
Let the two pointers meet at a distance j from the start of the loop.
So, the distance slow pointer travels = s + j. Distance the fast pointer travels = s + j + m * t (where m is the number of times the fast pointer has completed the loop). But, the fast pointer would also have traveled a distance k * (s + j) (k times the distance of the slow pointer).
Therefore, we get k * (s + j) = s + j + m * t.
s + j = (m / k-1)t.
Hence, from the above equation, length the slow pointer travels is an integer multiple of the loop length.
For greatest efficiency , (m / k-1) = 1 (the slow pointer shouldn't have traveled the loop more than once.)
therefore , m = k - 1 => k = m + 1
Since m is the no.of times the fast pointer has completed the loop , m >= 1 .
For greatest efficiency , m = 1.
therefore k = 2.
if we take a value of k > 2 , more the distance the two pointers would have to travel.
Hope the above explanation helps.

Consider a cycle of size L, meaning at the kth element is where the loop is: xk -> xk+1 -> ... -> xk+L-1 -> xk. Suppose one pointer is run at rate r1=1 and the other at r2. When the first pointer reaches xk the second pointer will already be in the loop at some element xk+s where 0 <= s < L. After m further pointer increments the first pointer is at xk+(m mod L) and the second pointer is at xk+((m*r2+s) mod L). Therefore the condition that the two pointers collide can be phrased as the existence of an m satisfying the congruence
m = m*r2 + s (mod L)
This can be simplified with the following steps
m(1-r2) = s (mod L)
m(L+1-r2) = s (mod L)
This is of the form of a linear congruence. It has a solution m if s is divisible by gcd(L+1-r2,L). This will certainly be the case if gcd(L+1-r2,L)=1. If r2=2 then gcd(L+1-r2,L)=gcd(L-1,L)=1 and a solution m always exists.
Thus r2=2 has the good property that for any cycle size L, it satisfies gcd(L+1-r2,L)=1 and thus guarantees that the pointers will eventually collide even if the two pointers start at different locations. Other values of r2 do not have this property.

If the fast pointer moves 3 steps and slow pointer at 1 step, it is not guaranteed for both pointers to meet in cycles containing even number of nodes. If the slow pointer moved at 2 steps, however, the meeting would be guaranteed.
In general, if the hare moves at H steps, and tortoise moves at T steps, you are guaranteed to meet in a cycle iff H = T + 1.
Consider the hare moving relative to the tortoise.
Hare's speed relative to the tortoise is H - T nodes per iteration.
Given a cycle of length N =(H - T) * k, where k is any positive
integer, the hare would skip every H - T - 1 nodes (again, relative
to the tortoise), and it would be impossible to for them to meet if
the tortoise was in any of those nodes.
The only possibility where a meeting is guaranteed is when H - T - 1 = 0.
Hence, increasing the fast pointer by x is allowed, as long as the slow pointer is increased by x - 1.

Here is a intuitive non-mathematical way to understand this:
If the fast pointer runs off the end of the list obviously there is no cycle.
Ignore the initial part where the pointers are in the initial non-cycle part of the list, we just need to get them into the cycle. It doesn't matter where in the cycle the fast pointer is when the slow pointer finally reaches the cycle.
Once they are both in the cycle, they are circling the cycle but at different points. Imagine if they were both moving by one each time. Then they would be circling the cycle but staying the same distance apart. In other words, making the same loop but out of phase. Now by moving the fast pointer by two each step they are changing their phase with each other; Decreasing their distance apart by one each step. The fast pointer will catch up to the slow pointer and we can detect the loop.
To prove this is true, that they will meet each other and the fast pointer will not somehow overtake and skip over the slow pointer just hand simulate what happens when the fast pointer is three steps behind the slow, then simulate what happens when the fast pointer is two steps behind the slow, then when the fast pointer is just one step behind the slow pointer. In every case they meet at the same node. Any larger distance will eventually become a distance of three, two or one.

If there is a loop (of n nodes), then once a pointer has entered the loop it will remain there forever; so we can move forward in time until both pointers are in the loop. From here on the pointers can be represented by integers modulo n with initial values a and b. The condition for them to meet after t steps is then
a+t≡b+2t mod n
which has solution t=a−b mod n.
This will work so long as the difference between the speeds shares no prime factors with n.
Reference
https://math.stackexchange.com/questions/412876/proof-of-the-2-pointer-method-for-finding-a-linked-list-loop
The single restriction on speeds is that their difference should be co-prime with the loop's length.

Theoretically, consider the cycle(loop) as a park(circular, rectangle whatever), First person X is moving slow and Second person Y is moving faster than X. Now, it doesn't matter if person Y is moving with speed of 2 times that of X or 3,4,5... times. There will always be a case when they meet at one point.

Say we use two references Rp and Rq which take p and q steps in each iteration; p > q. In the Floyd's algorithm, p = 2, q = 1.
We know that after certain iterations, both Rp and Rq will be at some elements of the loop. Then, say Rp is ahead of Rq by x steps. That is, starting at the element of Rq, we can take x steps to reach the element of Rp.
Say, the loop has n elements. After t further iterations, Rp will be ahead of Rq by (x + (p-q)*t) steps. So, they can meet after t iterations only if:
n divides (x + (p-q)*t)
Which can be written as:
(p−q)*t ≡ (−x) (mod n)
Due to modular arithmetic, this is possible only if: GCD(p−q, n) | x.
But we do not know x. Though, if the GCD is 1, it will divide any x. To make the GCD as 1:
if n is not known, choose any p and q such that (p-q) = 1. Floyd's algorithm does have p-q = 2-1 = 1.
if n is known, choose any p and q such that (p-q) is coprime with n.
Update: On some further analysis later, I realized that any unequal positive integers p and q will make the two references meet after some iterations. Though, the values of 1 and 2 seem to require less number of total stepping.

The reason why 2 is chosen is because lets say
slow pointer moves at 1
fast moves at 2
The loop has 5 elements.
Now for slow and fast pointer to meet ,
Lowest common multiple (LCM) of 1,2 and 5 must exist and thats where they meet. In this case its 10.
If you simulate slow and fast pointer you will see that the slow and fast pointer meet at 2 * elements in loop. When you do 2 loops , you meet at exactly same point of as starting point.
In case of non loop , it becomes LCM of 1,2 and infinity. so they never meet.

If the linked list has a loop then a fast pointer with increment of 2 will work better then say increment of 3 or 4 or more because it ensures that once we are inside the loop the pointers will surely collide and there will be no overtaking.
For example if we take increment of 3 and inside the loop lets assume
fast pointer --> i
slow --> i+1
the next iteration
fast pointer --> i+3
slow --> i+2
whereas such case will never happen with increment of 2.
Also if you are really unlucky then you may end up in a situation where loop length is L and you are incrementing the fast pointer by L+1. Then you will be stuck infinitely since the difference of the movement fast and slow pointer will always be L.
I hope I made myself clear.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio