Cycle detection in a linked list : Exhaustive theory - algorithm

This is NOT the problem about detecting cycle in a linked list using the famous Hare and Tortoise method.
In the Hare & Tortoise method we have pointers running at 1x and 2x speeds to determine whether they meet, and I am convinced that it's the most efficient way and that this type of search is O(n).
The problem is that I have to come up with a proof (or a disproof) that two pointers will always meet when they move at speeds Ax (A times x) and Bx (B times x), where A is not equal to B. Here A and B are two arbitrary integers, and the pointers operate on a linked list that contains a cycle.
This was asked in an interview I recently attended, and I was not able to prove to myself conclusively whether the above holds. Any help appreciated.

Suppose there is a loop, say of length L.
Easy case first
To make it easier, first consider the case where the two particles enter the loop at the same time. These particles are at the same position whenever n*A = n*B (mod L) for some positive integer n, which is the number of steps until they meet again. Taking n=L gives one solution (though there may be a smaller solution). So after L units of time, particle A has made A trips around the loop to be back at the beginning and particle B has made B trips around the loop to be back at the beginning, where they happily collide.
General Case
Now what happens when they do not enter the loop at the same time? Let A be the slower particle, i.e. A<B, and suppose A enters the loop at time m, and let's call the position at which A enters the loop 0 (since they're in the loop, they can never leave it, so I'm just renaming positions by subtracting A*m, the distance A has traveled after m time units). Then, at that time, B is already at position m*(B-A) (its real position after m time units is B*m and its renamed position is therefore B*m-A*m). Then we need to show that there is a time n such that n*A = n*B+m*(B-A) (mod L). That is, we need a solution to the modular equation
(n+m) * (A-B) = 0 (mod L)
Taking n = k*L-m for k large enough that k*L>m does the trick, though again, there may be a smaller solution.
Therefore, yes, they always meet.
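A quick way to sanity-check the argument is to simulate it. The sketch below is only an illustration under some assumptions of mine: the list is modelled as `tail` nodes leading into a loop of `loop` nodes, both pointers start at the head, and the names `first_meeting`, `tail` and `loop` are made up for this example. It searches for the first step at which the two pointers land on the same node, and relies on the bound from the proof (a multiple of the loop length that is at least `tail` always works).

```python
def first_meeting(tail, loop, a, b):
    """Return the first step n >= 1 at which two pointers that start at the
    head and advance a and b nodes per step stand on the same node.

    The list is modelled as `tail` nodes followed by a loop of `loop` nodes.
    """
    def node(pos):
        # Positions 0..tail-1 lie on the tail; later positions wrap around the loop.
        return pos if pos < tail else tail + (pos - tail) % loop

    # The proof gives a meeting time of k*loop for the first k with k*loop >= tail,
    # which is never larger than tail + loop.
    for n in range(1, tail + loop + 1):
        if node(n * a) == node(n * b):
            return n
    raise AssertionError("no meeting found within the expected bound")
```

For instance, `first_meeting(3, 6, 1, 2)` models the classic 1x/2x pointers on a list with a 3-node tail and a 6-node loop.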

If your two step-sizes have a common factor x: let's say the step sizes are Ax and Bx, then just consider the sequence you get from taking the original sequence and taking every x'th element. This new sequence has a cycle if and only if the original sequence does, and taking steps of size A and B on it is equivalent to taking steps of size Ax and Bx on the original sequence.
This reduction means that it's sufficient to prove that the algorithm works when A and B are coprime.

The hypothesis is false. For instance, if both pointers make leaps of an even size, the loop is also of even size, and distance between the pointers is odd, they will never meet.
UPD this is apparently an impossible situation. Because the two pointers start at the same point, the distance between them will always be even.

Related

Why does Floyd's cycle finding algorithm fail for certain pointer increment speeds?

Consider the following linked list:
1->2->3->4->5->6->7->8->9->4->...->9->4.....
The above list has a loop as follows:
[4->5->6->7->8->9->4]
Drawing the linked list on a whiteboard, I tried manually solving it for different pointer steps, to see how the pointers move around -
(slow_pointer_increment, fast_pointer_increment)
So, the pointers for different cases are as follows:
(1,2), (2,3), (1,3)
The first two pairs of increments - (1,2) and (2,3) worked fine, but when I use the pair (1,3), the algorithm does not seem to work on this pair. Is there a rule as to by how much we need to increment the steps for this algorithm to hold true?
Although I searched for various increment steps for the slower and the faster pointer, I haven't so far found a single relevant answer as to why it is not working for the increment (1,3) on this list.
The algorithm can easily be shown to be guaranteed to find a cycle starting from any position if the difference between the pointer increments and the cycle length are coprime (i.e. their greatest common divisor is 1).
For the general case, this means the difference between the increments must be 1 (because that's the only positive integer that's coprime to all other positive integers).
For any given pointer increments, if the values aren't coprime, it may still be guaranteed to find a cycle, but one would need to come up with a different way to prove that it will find a cycle.
For the example in the question, with pointer increments of (1,3), the difference is 3-1=2, and the cycle length is 6. 2 and 6 are not coprime, thus this argument does not establish whether the algorithm is guaranteed to find the cycle in general. It does seem like this might actually be guaranteed to find the cycle (including for the example in the question), even though it doesn't reach every position (which applies with coprime values, as explained below), but I don't have a proof for this at the moment.
The key to understanding this is that, at least for the purposes of checking whether the pointers ever meet, the slow and fast pointers' positions within the cycle only matter relative to each other. That is, these two can be considered equivalent: (the difference is 1 for both)
slow fast slow fast
↓ ↓ ↓ ↓
0→1→2→3→4→5→0 0→1→2→3→4→5→0
So we can think of this in terms of the position of slow remaining constant and fast moving at an increment of fastIncrement-slowIncrement, at which point the problem becomes:
Starting at any position, can we reach a specific position moving at some speed (mod cycle length)?
Or, more generally:
Can we reach every position moving at some speed (mod cycle length)?
Which will only be true if the speed and cycle length are coprime.
For example, look at a speed of 4 and a cycle of length 6 - starting at 0, we visit:
0, 4, 8%6=2, 6%6=0, 4, 2, 0, ... - GCD(4,6) = 2, and we can only visit every second element.
To see this in action, consider increments of (1,5) (difference = 4) on a cycle of length 6 with fast starting one node ahead of slow, as in the diagram above: the gap between the pointers always stays odd, so they will never meet.
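The relative-speed view is easy to check numerically. This is just a small sketch; `reachable_offsets` and `can_meet` are names I made up, and `gap` is assumed to be how many nodes fast is ahead of slow once both are inside the cycle. The pointers can meet exactly when the GCD of the relative speed and the cycle length divides that gap.

```python
from math import gcd

def reachable_offsets(speed, cycle_len):
    """List the positions visited when starting at 0 and stepping by `speed`
    around a cycle of `cycle_len` nodes, in order of first visit."""
    seen = []
    pos = 0
    while pos not in seen:
        seen.append(pos)
        pos = (pos + speed) % cycle_len
    return seen

def can_meet(gap, slow_inc, fast_inc, cycle_len):
    """True if pointers that are `gap` nodes apart inside the cycle can ever
    land on the same node, i.e. gcd(relative speed, cycle length) divides gap."""
    return gap % gcd(fast_inc - slow_inc, cycle_len) == 0

print(reachable_offsets(4, 6))   # only every second node is visited
print(can_meet(1, 1, 5, 6))      # gap of 1, relative speed 4, cycle of 6
```

Note that when both pointers start from the same node, the gap on entering the cycle is a multiple of the relative speed, so `can_meet` is always true in that case.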
I should note that, to my knowledge at least, the (1,2) increment is considered a fundamental part of the algorithm.
Using different increments (as per the above constraints) might work, but it would be a move away from the "official" algorithm and would involve more work (since a pointer to a linked-list must be incremented iteratively, you can't increment it by more than 1 in a single step) without any clear advantage for the general case.
Bernhard Barker's explanation is spot on.
I am simply adding on to it.
Why should the difference of speeds between the pointers and the cycle length be coprime numbers?
Take a scenario where the difference of speeds between the pointers (say v) and the cycle length (say L) are not coprime.
So GCD(v,L) is greater than 1 (say G).
Therefore, we have
v=difference of speeds between pointers
L=Length of the cycle(i.e. the number of nodes in the cycle)
G=GCD(v,L)
Since we are considering only relative positions, essentially the slow is stationary and the fast is moving at a relative speed v.
Let fast be at some node in the cycle.
Since G is a divisor of L, we can divide the cycle into L/G parts of G nodes each. Start dividing from where fast is located.
Now, v is a multiple of G (say v=nG).
Every time the fast pointer moves it will jump across n parts. So in each part the pointer arrives on a single node (basically the last node of a part). Each and every time, the fast pointer will land on the ending node of some part. See the image below.
[Example image]
As mentioned above by Bernhard, the question we need to answer is
Can we reach every position moving at some speed?
The answer is no if the GCD is greater than 1. As we see, the fast pointer will only cover the last nodes in every part.

Floyd's algorithm for finding a cycle in a linkedlist, how to prove that it will always work

I understand the concept of Floyd's algorithm for cycle detection. It concludes that if the Tortoise travels twice as fast as the Hare, and if the Tortoise has a head start of k meters in a loop, the Tortoise and the Hare will meet k meters before the loop.
In the case of a singly linked list, you have pointer A travelling twice as fast as pointer B. This means that if it takes pointer B k steps to reach the entry point of the loop (which we don't know the location of yet), pointer A will already have a head start of k nodes inside the loop. Therefore, the two pointers will meet k nodes before the entry point of the loop. Thus, if we move pointer B back to the head and keep pointer A at the meeting point (now both pointers are k nodes away from the entry point), and move both at the same pace, they will meet at the entry point of the loop.
How can you prove that the algorithm will work in the following boundary cases?
A linked list where the last node loops back to the head. In this case, what will the head start value, k, be?
A super long linked list, say 1000 nodes, with a small loop of 3 nodes at the end. Pointer A will have a head start of 1000, which means by the time pointer B reaches the entry point of the loop, A will already have looped many times.
What if there is a loop of 1 node?
This is not homework. I was told by an interviewer that this algorithm won't work if I have a small loop. He didn't explain why.
It is clear that both pointers will eventually reach the loop if there is one. Let's assume the loop has length N. We can do calculations of the position in the loop modulo N.
Now say pointer A is at position a and pointer B is at position b. After s steps, A will be at a+2s mod N and B will be at b+s mod N. For the two pointers to meet we must have
a+2s = b+s (mod N)
a+s = b (mod N)
s = b - a (mod N)
So after b - a (mod N) steps the two pointers will meet.
Just consider this: after n moves (n being the number of nodes) you can be sure that either both pointers are in the cycle or one of them has reached the end. Within the next n moves you can be sure that A and B will meet at some point, since the cycle's size is <= n and with every step the distance between them shrinks by 1.
Of course it works with a small loop. Consider a loop of two nodes. That is:
A => B => C => B
So the tortoise and hare start at A. The table below shows what happens:
Tortoise  Hare
A         A
C         B
C         C
When there are only two nodes, then the tortoise always ends where it started. So the tortoise will essentially remain still while the hare moves one node each time and eventually catch up.
The same thing happens, by the way, when you have a loop of only one node. That is, when a node loops back on itself.
Henry's answer gives the mathematical proof.
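To check the boundary cases from the question concretely, here is a minimal sketch of the 1x/2x method with the "move one pointer back to the head" step for finding the loop entry. The `Node` class and the helpers `build_list` and `find_loop_entry` are my own illustrative names, not something from the question, and the pointers are simply called `slow` and `fast`.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def build_list(tail_len, loop_len):
    """Build tail_len nodes followed by a loop of loop_len nodes (0 = no loop).
    Node values are 0, 1, 2, ... so the loop entry has value tail_len."""
    nodes = [Node(i) for i in range(tail_len + loop_len)]
    for cur, nxt in zip(nodes, nodes[1:]):
        cur.next = nxt
    if loop_len > 0:
        nodes[-1].next = nodes[tail_len]  # close the loop
    return nodes[0] if nodes else None

def find_loop_entry(head):
    """Return the node where the loop starts, or None if the list has no loop."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # 1 step
        fast = fast.next.next     # 2 steps
        if slow is fast:
            break
    else:
        return None               # fast ran off the end: no loop
    # Restart one pointer at the head; moving both by 1 they meet at the entry.
    slow = head
    while slow is not fast:
        slow = slow.next
        fast = fast.next
    return slow

# Boundary cases from the question: loop back to head, long tail with a
# 3-node loop, and a 1-node loop.
for tail_len, loop_len in [(0, 5), (1000, 3), (4, 1)]:
    print(tail_len, loop_len, find_loop_entry(build_list(tail_len, loop_len)).value)
```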

Getting the nth to last element in a linked list

We have a linked list of size L, and we want to retrieve the nth to the last element.
Solution 1: naive solution
make a first pass from the beginning to the end to compute L
make a second pass from the beginning to the expected position
Solution 2: use 2 pointers p1, p2
p1 starts iterating from the beginning, p2 does not move.
when there are n elements between p1 and p2, p2 starts iterating as well
when p1 arrives at the end of the list, p2 is at the expected position
Both solutions seem to have the same time complexity (i.e, 2L - n iterations over list elements)
Which one is better?
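For reference, Solution 2 could be sketched roughly like this. I'm assuming "nth to last" is 1-indexed (n = 1 means the last element) and that the list is a plain singly linked list; `Node`, `from_values` and `nth_to_last` are names invented for the example.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def from_values(values):
    """Build a singly linked list from a Python sequence and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head

def nth_to_last(head, n):
    """Return the nth-to-last node (n = 1 is the last node), or None if the
    list has fewer than n nodes."""
    p1 = head
    for _ in range(n):            # p1 runs n nodes ahead while p2 waits
        if p1 is None:
            return None
        p1 = p1.next
    p2 = head
    while p1 is not None:         # then both advance until p1 falls off the end
        p1 = p1.next
        p2 = p2.next
    return p2
```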
Both those algorithms are two-pass. The second may have better performance for reasonably small n because the second pass accesses memory that is already cached by the first pass. (The passes are interleaved.)
A one-pass solution would store the pointers in a circular buffer or queue, and return the "head" of the queue once the end of the list is reached.
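A rough sketch of that one-pass idea, using Python's `collections.deque` with `maxlen` as the circular buffer. The function name `nth_to_last_one_pass` is made up, it works on any iterable rather than a specific linked-list type, and it assumes n >= 1.

```python
from collections import deque

def nth_to_last_one_pass(items, n):
    """Return the nth-to-last item of an iterable in a single pass (n >= 1),
    or None if there are fewer than n items."""
    window = deque(maxlen=n)      # keeps only the last n items seen
    for item in items:
        window.append(item)
    return window[0] if len(window) == n else None
```

The trade-off is O(n) extra memory for the buffer in exchange for touching each element only once.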
How about using 3 pointers p, q, r and a counter.
Iterate through the list with p updating the counter.
Every n nodes assign r to q and q to p
When you hit the end of the list you can figure out how far
r is from the end of the list.
You can get the answer in no more than O(L + n)
If n << L, solution 2 is typically faster, because of caching, i.e. the memory blocks containing p1 and p2 are copied to the CPU cache once and the pointers moved for a bunch of iterations before RAM needs to be accessed again.
Would it not be much cheaper to simply store the length of the linked list in O(1) memory? The only reason you have to do a "first pass" at all is because you don't know the length of your linked list. If you store the length, you could iterate over (|L|-n) elements every time and retrieve the element easily. For higher values of n in comparison to L, this way would save you substantial amounts of time. For example if n was equal to |L|, you could simply return the head of the list with no iteration at all.
This method uses slightly more memory than your first algorithm since it stores the length in memory, but your second algorithm uses two pointers, whereas this method only uses 1 pointer. If you have the memory for a second pointer, you probably have the memory to store the length of your linked list.
Granted O(|L|-n) is equivalent to O(n) in pure theory, but there are "fast" linear algorithms and then there are "slow" ones. Two-pass algorithms for this kind of problem are slow.
As @HotLicks pointed out in the comments, "One needs to understand that 'big O' complexity is only loosely related to actual performance in many cases, since it ignores additive factors and constant multipliers." IMO just go for the laziest method in this case and don't overthink it.

Algorithm for finding path combinations?

Imagine you have a dancing robot in n-dimensional euclidean space starting at origin P_0 = (0,0,...,0).
The robot can make m types of dance moves D_1, D_2, ..., D_m
D_i is an n-vector of integers (D_i_1, D_i_2, ..., D_i_n)
If the robot makes dance move i then its position changes by D_i:
P_{t+1} = P_t + D_i
The robot can make any of the dance moves as many times as he wants and in any order.
Let a k-dance be defined as a sequence of k dance moves.
Clearly there are m^k possible k-dances.
We are interested to know the set of possible end positions of a k-dance, and for each end position, how many k-dances end at that location.
One way to do this is as follows:
P0 = (0, 0, ..., 0);
S[0][P0] = 1
for I in 1 to k
    for J in 1 to m
        for P in S[I-1]
            S[I][P + D_J] += S[I-1][P]
Now S[k][Q] will tell you how many k-dances end at position Q
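As a runnable version of the same idea, here is a small sketch that keeps only the previous layer of the table in a dictionary keyed by position. The function name `count_k_dances` and the representation of moves as tuples are assumptions made for the example.

```python
from collections import defaultdict

def count_k_dances(moves, k):
    """Return {end_position: number of k-dances ending there}.

    `moves` is a list of equal-length integer tuples (the D_i vectors)."""
    n = len(moves[0])
    current = {tuple([0] * n): 1}          # S[0]: only the origin, reached one way
    for _ in range(k):
        nxt = defaultdict(int)
        for pos, ways in current.items():
            for move in moves:
                new_pos = tuple(p + d for p, d in zip(pos, move))
                nxt[new_pos] += ways       # extend every shorter dance by one move
        current = dict(nxt)
    return current

counts = count_k_dances([(1, 0), (0, 1)], 2)
print(counts)
```

The counts over all end positions should add up to m^k, which is a handy check.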
Assume that n, m, |D_i| are small (less than 5) and k is less than 40.
Is there a faster way? Can we calculate S[k][Q] "directly" somehow with some sort of linear algebra related trick? or some other approach?
You could create an adjacency matrix that would contain dance-move transitions in your space (the part of it that's reachable in k moves, otherwise it would be infinite). Then, the P_0 row of the k-th power of this matrix contains the S[k] values.
The matrix in question quickly gets enormous, something like (k*(max(D_i_j)-min(D_i_j)))^n (every dimension can be halved if Q is close to origin), but that's true for your S matrix as well
Since dance moves are interchangeable, you can assume that for i < j the robot first makes all the D_i moves before the D_j moves, thus reducing the number of combinations to actually calculate.
If you keep track of the number of times each dance move was made calculating the total number of combinations should be easy.
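One way to read this suggestion: enumerate how many times each move is used (c_1 + ... + c_m = k) and weight each choice by the multinomial coefficient k!/(c_1!...c_m!), which counts the orderings. The sketch below does that; `compositions` and `count_k_dances_by_counts` are hypothetical names, and this is my interpretation of the idea rather than the answerer's exact method.

```python
from collections import defaultdict
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def count_k_dances_by_counts(moves, k):
    """Return {end_position: number of k-dances ending there}, grouping
    dances by how often each move is used."""
    n = len(moves[0])
    result = defaultdict(int)
    for counts in compositions(k, len(moves)):
        ways = factorial(k)
        for c in counts:
            ways //= factorial(c)          # multinomial coefficient, built up exactly
        end = tuple(sum(c * move[d] for c, move in zip(counts, moves))
                    for d in range(n))
        result[end] += ways
    return dict(result)
```

This visits C(k+m-1, m-1) count vectors instead of m^k sequences, which is far fewer when m is small.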
Since the 1-dimensional problem is closely related to the subset sum problem, you could probably take a similar approach - find all of the combinations of dance vectors that add together to have the correct first coordinate with exactly k moves; then take that subset of combinations and check to see which of those have the right sum for the second, and take the subset which matches both and check it for the third, and so on.
In this way, you get to at least only have to perform a very simple addition for the extremely painful O(n^k) step. It will indeed find all of the vectors which will hit a given value.

Sub O(n^2) algorithm for counting nested intervals?

We have a list of intervals of the form [ai, bi]. For each interval, we want to count the number of other intervals that are nested within it.
For example, if we had two intervals, A = [1,4] and B = [2,3]. Then the count for B would be 0 as there are no nested intervals for B; and the count for A would be 1 as B fits within A.
My question is, does there exist a sub-O(n^2) algorithm for this problem where n is the number of intervals?
EDIT: Here are the conditions the intervals meet. The end points of the intervals are floating point numbers. The lower limit for the ai's/bi's is 0 and the upper limit is whatever max float is. Also, there is the condition that ai < bi, so no intervals of length 0.
Yes, it is possible.
We will borrow the typical computational geometry "scan line" trick.
First, let's answer an easier (but closely related) question. Instead of reporting how many other intervals each interval contains, let's report how many intervals each is contained in. So for your example with only two intervals, interval I0 = [1,4] has value zero because it is contained in zero intervals, while I1 = [2,3] has value one because it is contained in one interval.
You will see in a minute (a) why this question is easier and (b) how it leads to the answer for the original question.
To solve this easier question: Take all starting and ending points -- all of the ai and bi -- and put them into a master list. Call each element of this list an "event". So an event would be something like "interval I37 started" or "interval I23 ended".
Sort this list of events and process it in order.
As you process the list of events, maintain a set S of "active intervals". An interval is "active" if we have encountered its start event but not its ending event; that is, if we are within that interval.
Now, whenever we see an ending event bj, we are ready to compute how many intervals contain Ij (= [aj, bj]). All we need to do is examine the set S of active intervals and determine how many of them started before aj. That is our answer for how many intervals contain interval Ij.
To do this efficiently, keep S itself sorted by starting point; e.g., by using a self-balancing binary tree.
Sorting the list of events is O(2n log 2n) = O(n log n). Adding or removing an element from a self-balancing binary tree is O(log n). Asking "how many elements of the self-balancing binary tree are less than x?" is also O(log n). Therefore this entire algorithm is O(n log n).
So, that solves the easy question. Call that the "easy algorithm". Now for what you actually asked.
Think of the number line as extending to infinity and wrapping around to -infinity, and define an interval with bi < ai to start at ai, stretch to infinity, wrap to minus infinity, and end at bi.
For any interval Ij = [aj, bj], define Complement(Ij) as the interval [bj, aj]. (For example, the interval [2, 3] starts at 2 and ends at 3; so Complement([2,3]) = [3,2] starts at 3, stretches to infinity, wraps to -infinity, and ends at 2.)
Observe that interval I contains interval J if and only if Complement(J) contains Complement(I). (Prove this.)
So, we can answer your original question simply by running the "easy algorithm" on the set of complements of all of the intervals. That is, start your scan at -infinity with the set S of "active intervals" containing all intervals (because all complements contain infinity/-infinity). Keep S sorted by end point (i.e. start point of complement).
Sort all start points and end points and process them in order. When you encounter a starting point for interval Ij (= [aj, bj]), you are actually hitting the end point of its complement... So remove Ij from S, query S to see how many of its endpoints (i.e. complement start points) come before bj, and report that as the answer for Ij. If you later encounter the end point of Ij, you are encountering the start point of its complement, so you need to add it back into the set S of active intervals.
This final algorithm is O(n log n) for the same reasons the "easy algorithm" was.
[Update]
One clarification, one correction, one comment...
Clarification: Of course, the "self-balancing binary tree" has to be augmented such that each sub-tree knows how many elements it contains. Otherwise, you cannot answer "how many elements are less than x?" This augmentation is straightforward to maintain, but it is not something that every implementation provides; e.g. the C++ std::set does not, to my knowledge.
Correction: You do not want to add any elements back in to the set S of active intervals; in fact, doing so can result in the wrong answer. For example, if the intervals are just [1,2] and [3,4], you would hit 1 (and remove [1,2] from the set), then 2 (and add it back in again), then 3... And since 2<4, you would conclude that [3,4] contains [1,2]. Which is wrong.
Conceptually, you already processed all of the "start events" for the complement intervals; that is why S begins with all intervals inside of it. So all you need to worry about are the ending points; you do not want to add any elements to S, ever.
Put another way, instead of having the intervals wrap around, you can think of [bi,ai] (where bi > ai) as meaning [bi - infinity, ai] with no wrap-around. The logic still works, but the processing is more clear: First you process all of the "whatever - infinity" terms (i.e. the end points), then you process the others (i.e. the start points).
With this correction, I am pretty sure my solution actually works. This formulation also extends -- I think -- to the case where you have both normal and "backward" intervals together in one input.
Comment: This problem is tricky because if you have to enumerate the set of all intervals contained within every interval, the output itself can be O(n^2). So any working approach has to somehow count the intervals without even being able to identify them :-).
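The corrected sweep can be sketched as follows. Two things here are my own assumptions rather than part of the answer: a Fenwick (binary indexed) tree over end-point ranks stands in for the augmented self-balancing tree, and all end points are assumed distinct so ties need no special handling. The name `count_nested` is made up for the example.

```python
def count_nested(intervals):
    """For each interval (a, b), count the other intervals nested inside it.
    Assumes a < b and distinct end points. Runs in O(n log n)."""
    n = len(intervals)
    # Rank every interval by its end point (1-based, for the Fenwick tree).
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: intervals[i][1]), start=1):
        rank[i] = r

    tree = [0] * (n + 1)

    def update(pos, delta):
        while pos <= n:
            tree[pos] += delta
            pos += pos & -pos

    def prefix(pos):
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    for r in range(1, n + 1):     # S starts out containing every interval
        update(r, 1)

    counts = [0] * n
    # Sweep start points in increasing order; intervals are only ever removed.
    for i in sorted(range(n), key=lambda i: intervals[i][0]):
        update(rank[i], -1)                 # remove I_i from S
        counts[i] = prefix(rank[i] - 1)     # remaining intervals ending before b_i
    return counts

print(count_nested([(1, 4), (2, 3)]))
```

Everything still in S when interval i is removed starts after a_i, so those with a smaller end point are exactly the ones nested in it.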
Here is an O(N*log(N)) approach:
let Ii = Interval i = (ai, bi)
let L = list of intervals I
sort L by ai
divide L in half into L1a and L2a.
sort L1a and L2a by bi to get L1b and L2b
merge sort L1b and L2b keeping track of the count of nestings (e.g. because all intervals in L1b start before intervals in L2b, when we find an endpoint in L1b that is higher than an endpoint in L2b, we know everything between them is nested inside - think about it).
Now you have updated the counts on how often an interval in L2 is nested inside an interval in L1.
after merging L1 and L2, we repeat the process (recursion) by dividing L1 into L11a and L12a, also dividing L2 into L21a and L22a.

Resources