Floyd's loop detection algorithm with different step sizes

In Floyd's loop detection algorithm on a linked list, we generally increment the slow pointer by 1 unit and the fast pointer by 2 units. What other values can we use to increment the slow and fast pointers, and how do they change the complexity of the algorithm?

The two pointers will always meet, regardless of speeds or loop size.
Using the following values:
a and b: The number of steps taken by each pointer for each iteration.
m: The number of nodes in the loop.
After i iterations, the two pointers will have taken a*i and b*i steps. They will be at the same node if i is large enough that both pointers are inside the loop, and:
a*i = b*i (mod m)
which is the same as:
(a-b)*i = 0 (mod m)
This will be true for a value of i which is a multiple of m, and is large enough. Such a value will always exist so the pointers will always meet.
Larger values of a and b will increase the number of steps taken per iteration, but if they are both constants then the complexity will still be linear.
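As a concrete illustration, here is a minimal Python sketch (my own, assuming a simple singly linked Node class with a next attribute) of the two-pointer loop with arbitrary per-iteration step counts a and b. Note that, as discussed in the answers below, some choices of a and b can fail to meet on some cycle lengths, in which case a loop like this would never terminate.
def has_cycle(head, a=1, b=2):
    # Sketch only: `a` and `b` are the per-iteration step counts for the slow
    # and fast pointers; a=1, b=2 gives the classic Floyd algorithm.
    slow = fast = head
    while True:
        for _ in range(a):          # advance the slow pointer a nodes
            if slow is None:
                return False        # reached the end of the list: no cycle
            slow = slow.next
        for _ in range(b):          # advance the fast pointer b nodes
            if fast is None:
                return False
            fast = fast.next
        if slow is None or fast is None:
            return False
        if slow is fast:            # the pointers met inside the loop
            return True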

I think the step size does not matter. As long as slow < fast, the two will meet if there is a cycle in the list.
The only difference is that the number of steps taken by each pointer in each iteration would vary.

Well, I understood it in an intuitive way using some basic maths. Imagine a linked list with a loop; both the slow pointer and the fast pointer start moving.
Let T be the point where the loop starts, i.e. the node where the list connects back onto itself.
When the slow pointer reaches this node, the fast pointer is already inside the loop. Now imagine this loop like a clock with an hour hand and a minute hand: the two pointers will meet, irrespective of their speeds, at common multiples of their speeds.

Related

Can the efficiency of an algorithm be modelled as a function between input size and time?

Consider the following algorithm (just as an example as the implementation is obviously inefficient):
def add(n):
    for i in range(n):
        n += 1
    return n
The program adds a number to itself and returns it. Now, the efficiency of an algorithm is sometimes modelled as a function between the size of the input and the number of primitive steps the algorithm has to compute. In this case, the input is an integer, n, and as n increases, the number of steps necessary to complete the algorithm also increases (in this case linearly). But is it true that the size of the input increases? Let's assume that the machine where the program is running represents integers in 8 bits. So if I increase the hypothetical input 3 to, for example, 7, the number of bits involved remains the same: 00000011 -> 00000111. However, the steps necessary to compute the algorithm increase. So it seems that it's not always true that algorithmic efficiency can be modelled as a relation between input size and steps to compute. Could somebody explain to me where I go wrong, or if I don't go wrong, why it still makes sense to model the efficiency of an algorithm as a function between the size of the input and the number of primitive steps to be computed?
Let S be the size of the input n. (Normally we'd use n for this size, but since the argument is also called n, that's confusing.) For positive n, there's a relation between S and n, namely S = floor(log2(n)) + 1, the number of bits needed to write n in binary. The program loops n times, and since n < 2^S, it loops at most 2^S times. You can also show it loops at least 1/2 * 2^S times, so the runtime (measured in loop iterations) is Theta(2^S).
This shows there's a way to model the runtime as a function of the size, even if it's not exact.
Does it make sense? In your example it doesn't much, but if your input is an array for sorting, taking size as the number of elements in the array does make sense. (And it's typically what's used, for example, to model the number of comparisons done by different sort algorithms.)
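To make the relation concrete, here is a small sketch (my own, not from the answer) that counts the loop iterations of the add function above against the bit length of n:
def add_with_count(n):
    # Same loop as add(n), but also counts iterations; the count grows roughly
    # like 2**S, where S = n.bit_length() is the size of the input in bits.
    steps = 0
    for i in range(n):
        n += 1
        steps += 1
    return n, steps

for n in (3, 7, 255, 256):
    _, steps = add_with_count(n)
    print(f"n={n}, bits={n.bit_length()}, loop iterations={steps}")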

Why does Floyd's cycle finding algorithm fail for certain pointer increment speeds?

Consider the following linked list:
1->2->3->4->5->6->7->8->9->4->...->9->4.....
The above list has a loop as follows:
[4->5->6->7->8->9->4]
Drawing the linked list on a whiteboard, I tried manually solving it for different pointer steps, to see how the pointers move around -
(slow_pointer_increment, fast_pointer_increment)
So, the pointers for different cases are as follows:
(1,2), (2,3), (1,3)
The first two pairs of increments, (1,2) and (2,3), worked fine, but when I use the pair (1,3), the algorithm does not seem to work on this list. Is there a rule for how much we can increment the pointers and still have the algorithm hold true?
Although I searched for various increment steps for the slower and the faster pointer, I haven't so far found a single relevant answer as to why it is not working for the increment (1,3) on this list.
The algorithm can easily be shown to be guaranteed to find a cycle, starting from any position, if the difference between the pointer increments and the cycle length are coprime (i.e. their greatest common divisor is 1).
For the general case, this means the difference between the increments must be 1 (because that's the only positive integer that's coprime to all other positive integers).
For any given pointer increments, if the values aren't coprime, it may still be guaranteed to find a cycle, but one would need to come up with a different way to prove that it will find a cycle.
For the example in the question, with pointer increments of (1,3), the difference is 3-1=2, and the cycle length is 6. 2 and 6 are not coprimes, thus it's not known whether the algorithm is guaranteed to find the cycle in general. It does seem like this might actually be guaranteed to find the cycle (including for the example in the question), even though it doesn't reach every position (which applies with coprime, as explained below), but I don't have a proof for this at the moment.
The key to understanding this is that, at least for the purposes of checking whether the pointers ever meet, the slow and fast pointers' positions within the cycle only matter relative to each other. That is, these two configurations can be considered equivalent (the difference is 1 for both):
slow fast
  ↓   ↓
  0 → 1 → 2 → 3 → 4 → 5 → 0

            slow fast
              ↓   ↓
  0 → 1 → 2 → 3 → 4 → 5 → 0
So we can think of this in terms of the position of slow remaining constant and fast moving at an increment of fastIncrement-slowIncrement, at which point the problem becomes:
Starting at any position, can we reach a specific position moving at some speed (mod cycle length)?
Or, more generally:
Can we reach every position moving at some speed (mod cycle length)?
Which will only be true if the speed and cycle length are coprime.
For example, look at a speed of 4 and a cycle of length 6 - starting at 0, we visit:
0, 4, 8%6=2, 6%6=0, 4, 2, 0, ... - GCD(4,6) = 2, and we can only visit every second element.
To see this in action, consider increments of (1,5) (difference = 4) for the example given above and see that the pointers will never meet.
I should note that, to my knowledge at least, the (1,2) increment is considered a fundamental part of the algorithm.
Using different increments (as per the above constraints) might work, but it would be a move away from the "official" algorithm and would involve more work (since a pointer to a linked-list must be incremented iteratively, you can't increment it by more than 1 in a single step) without any clear advantage for the general case.
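A small illustration of the coprimality condition (my own sketch, not part of the original answer): which relative positions are reachable on a cycle of a given length when the relative speed is fixed?
def reachable_offsets(step, cycle_len):
    # Repeatedly move `step` positions around a cycle of length `cycle_len`,
    # starting from 0, and collect every position visited.  Only the multiples
    # of gcd(step, cycle_len) are ever reached.
    seen, pos = set(), 0
    while pos not in seen:
        seen.add(pos)
        pos = (pos + step) % cycle_len
    return sorted(seen)

print(reachable_offsets(1, 6))  # [0, 1, 2, 3, 4, 5] - gcd(1, 6) = 1
print(reachable_offsets(2, 6))  # [0, 2, 4]          - gcd(2, 6) = 2
print(reachable_offsets(4, 6))  # [0, 2, 4]          - gcd(4, 6) = 2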
Bernhard Barker's explanation is spot on.
I am simply adding to it.
Why should the difference of speeds between the pointers and the cycle length be coprime numbers?
Take a scenario where the difference of speeds between the pointers (say v) and the cycle length (say L) are not coprime.
So there exists a GCD(v,L) greater than 1 (say G).
Therefore, we have
v=difference of speeds between pointers
L=Length of the cycle(i.e. the number of nodes in the cycle)
G=GCD(v,L)
Since we are considering only relative positions, essentially the slow is stationary and the fast is moving at a relative speed v.
Let fast be at some node in the cycle.
Since G is a divisor of L, we can divide the cycle into L/G parts of G nodes each. Start dividing from where fast is located.
Now, v is a multiple of G (say v=nG).
Every time the fast pointer moves it will jump across n parts. So in each part the pointer arrives on a single node (basically the last node of a part). Each and every time, the fast pointer will land on the ending node of a part. Refer to the image below.
[Example image]
As mentioned above by Bernhard, the question we need to answer is
Can we reach every position moving at some speed?
The answer is no if the GCD is greater than 1. As we have seen, the fast pointer will only ever cover the last node in every part.

Getting the nth to last element in a linked list

We have a linked list of size L, and we want to retrieve the nth to the last element.
Solution 1: naive solution
make a first pass from the beginning to the end to compute L
make a second pass from the beginning to the expected position
Solution 2: use 2 pointers p1, p2
p1 starts iterating from the beginning, p2 does not move.
when there are n elements between p1 and p2, p2 starts iterating as well
when p1 arrives at the end of the list, p2 is at the expected position
Both solutions seem to have the same time complexity (i.e., 2L - n iterations over list elements).
Which one is better?
Both those algorithms are two-pass. The second may have better performance for reasonably small n because the second pass accesses memory that is already cached by the first pass. (The passes are interleaved.)
A one-pass solution would store the pointers in a circular buffer or queue, and return the "head" of the queue once the end of the list is reached.
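A minimal sketch of that one-pass idea (my own, assuming a simple Node class with a next attribute and the convention that n = 1 means the last element), using a bounded deque as the circular buffer:
from collections import deque

def nth_to_last(head, n):
    # Keep only the last n visited nodes; once the traversal ends, the oldest
    # entry in the buffer is the nth-to-last node.  Assumes n >= 1.
    window = deque(maxlen=n)
    node = head
    while node is not None:
        window.append(node)
        node = node.next
    if len(window) < n:
        return None  # the list has fewer than n elements
    return window[0]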
How about using 3 pointers p, q, r and a counter?
Iterate through the list with p, updating the counter.
Every n nodes, assign r to q and q to p.
When you hit the end of the list, you can figure out how far r is from the end of the list.
You can get the answer in no more than O(L + n) steps.
If n << L, solution 2 is typically faster, because of caching, i.e. the memory blocks containing p1 and p2 are copied to the CPU cache once and the pointers moved for a bunch of iterations before RAM needs to be accessed again.
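For reference, a minimal sketch of solution 2's two-pointer walk (my own, under the same assumed Node class and the n = 1 = last-element convention):
def nth_to_last_two_pointers(head, n):
    # Advance p1 by n nodes first, then move p1 and p2 together; when p1 runs
    # off the end of the list, p2 is left at the nth-to-last node.
    p1 = p2 = head
    for _ in range(n):
        if p1 is None:
            return None  # the list has fewer than n elements
        p1 = p1.next
    while p1 is not None:
        p1, p2 = p1.next, p2.next
    return p2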
Would it not be much cheaper to simply store the length of the linked list in O(1) memory? The only reason you have to do a "first pass" at all is because you don't know the length of your linked list. If you store the length, you could iterate over (|L|-n) elements every time and retrieve the element easily. For higher values of n in comparison to L, this way would save you a substantial amount of time. For example, if n were equal to |L|, you could simply return the head of the list with no iteration at all.
This method uses slightly more memory than your first algorithm since it stores the length in memory, but your second algorithm uses two pointers, whereas this method only uses 1 pointer. If you have the memory for a second pointer, you probably have the memory to store the length of your linked list.
Granted O(|L|-n) is equivalent to O(n) in pure theory, but there are "fast" linear algorithms and then there are "slow" ones. Two-pass algorithms for this kind of problem are slow.
As @HotLicks pointed out in the comments, "One needs to understand that "big O" complexity is only loosely related to actual performance in many cases, since it ignores additive factors and constant multipliers." IMO just go for the laziest method in this case and don't overthink it.

Cycle detection in a linked list : Exhaustive theory

This is NOT the problem about detecting cycle in a linked list using the famous Hare and Tortoise method.
In the Hare & Tortoise method we have pointers running at 1x and 2x speeds to determine that they meet, and I am convinced that it's the most efficient way and that the order of that type of search is O(n).
The problem is that I have to come up with a proof (proving or disproving) that the two pointers will always meet when the moving speeds are Ax (A times x) and Bx (B times x) and A is not equal to B, where A and B are two random integers, operating on a linked list with a cycle present.
This was asked in one of interviews I recently attended and I was not able to prove it comprehensively to myself that whether the above is possible. Any help appreciated.
Suppose there is a loop, say of length L.
Easy case first
To make it easier, first consider the case where the two particles enter the loop at the same time. These particles are at the same position whenever n*A = n*B (mod L) for some positive integer n, which is the number of steps until they meet again. Taking n=L gives one solution (though there may be a smaller solution). So after L units of time, particle A has made A trips around the loop to be back at the beginning and particle B has made B trips around the loop to be back at the beginning, where they happily collide.
General Case
Now what happens when they do not enter the loop at the same time? Let A be the slower particle, i.e. A<B, and suppose A enters the loop at time m, and let's call the position at which A enters the loop 0 (since they're in the loop, they can never leave it, so I'm just renaming positions by subtracting A*m, the distance A has traveled after m time units). Then, at that time, B is already at position m*(B-A) (its real position after m time units is B*m and its renamed position is therefore B*m-A*m). Then we need to show that there is a time n such that n*A = n*B+m*(B-A) (mod L). That is, we need a solution to the modular equation
(n+m) * (A-B) = 0 (mod L)
Taking n = k*L-m for k large enough that k*L>m does the trick, though again, there may be a smaller solution.
Therefore, yes, they always meet.
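A rough simulation sketch (my own helper, not from the answer) to play with this on a concrete list: it builds a tail of `tail` nodes followed by a cycle of `loop_len` nodes, moves two cursors by a and b positions per tick, and reports when (or whether, within a cap) they first meet:
def meet_time(tail, loop_len, a, b, max_ticks=10_000):
    # Positions 0..tail-1 form the tail; positions tail..tail+loop_len-1 form
    # the cycle, with the last cycle node pointing back to position `tail`.
    size = tail + loop_len

    def step(pos, k):
        pos += k
        return pos if pos < size else tail + (pos - tail) % loop_len

    pa = pb = 0
    for t in range(1, max_ticks + 1):
        pa, pb = step(pa, a), step(pb, b)
        if pa == pb:
            return t
    return None  # did not meet within max_ticks

print(meet_time(3, 6, 1, 2))  # the classic (1,2) increments
print(meet_time(3, 6, 1, 3))  # the (1,3) increments discussed above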
If your two step-sizes have a common factor x: let's say the step sizes are Ax and Bx, then just consider the sequence you get from taking the original sequence and taking every x'th element. This new sequence has a cycle if and only if the original sequence does, and taking steps of size A and B on it is equivalent to taking steps of size Ax and Bx on the original sequence.
This reduction means that it's sufficient to prove that the algorithm works when A and B are coprime.
The hypothesis is false. For instance, if both pointers make leaps of an even size, the loop is also of even size, and the distance between the pointers is odd, they will never meet.
UPD: this is apparently an impossible situation. Because the two pointers start at the same point, the distance between them will always be even.

Finding the repeated element

In an array with integers between 1 and 1,000,000 (or some much larger value), a single value occurs twice. How do you determine which one?
I think we can use a bitmap to mark the elements, and then traverse all over again to find out the repeated element. But I think it is a process with high complexity. Is there any better way?
This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.
What calculations can you do on a range of integers whose answer you can determine ahead of time?
Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)
EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.
We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:
Sum(1..N) = 1/2 * (1 + N) * Count(1..N).
Since we know the expected sum, all we have to do is loop through all the values and sum them. The difference between this sum and the expected sum is the duplicate value.
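A minimal sketch of that sum trick (my own, assuming the array holds every integer from 1 to N exactly once plus one duplicated value):
def find_duplicate_by_sum(arr, n):
    # Assumes arr contains 1..n exactly once each, plus one duplicated value,
    # so len(arr) == n + 1; the surplus over Gauss's sum is the duplicate.
    expected = n * (n + 1) // 2
    return sum(arr) - expected

print(find_duplicate_by_sum([1, 2, 3, 4, 3, 5], 5))  # -> 3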
EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.
If you want to perform the operation using O(1) storage, you can perform an in-place sort of the list. As you're sorting you have to check adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average - which establishes an upper bound for finding the duplicate in this manner.
If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you determine you are inserting a duplicate into the HashSet. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
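A short sketch of that approach (my own, using a Python set in place of a HashSet, with made-up example values):
def find_duplicate_with_set(values):
    # Insert values one by one and stop at the first value already seen.
    seen = set()
    for v in values:
        if v in seen:
            return v
        seen.add(v)
    return None  # no duplicate found

print(find_duplicate_with_set([5, 1, 4, 2, 4, 3]))  # -> 4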
You may try to use bits as a hashmap:
1 at position k means that number k occurred before
0 at position k means that number k did not occur before
pseudocode:
0. assume that your array is A
1. initialize bitarray (there is a nice class for this in C#) of length 1,000,000, filled with zeros
2. for each num in A:
       if bitarray[num]:
           return num
       else:
           bitarray[num] = 1
The time complexity of the bitmap solution is O(n) and it doesn't seem like you could do better than that. However it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates and doesn't require extra space if you don't mind the current order changing.
Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory using an algorithm that just takes a single pass through the array. This is by reduction to the set disjointness problem.
Suppose I made the problem easier, and I promised you that the duplicate elements were in the array such that the first one was in the first n/2 elements, and the second one was in the last n/2 elements. Now we can think of playing a game in which two people each hold a string of n/2 elements, and want to know how many messages they have to send to be sure that none of their elements are the same. Since the first player could simulate the run of any algorithm that takes a pass through the array, and send the contents of its memory to the second player, a lower bound on the number of messages they need to send implies a lower bound on the memory requirements of any algorithm.
But it's easy to see in this simple game that they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.
Edit: This generalizes to show that for algorithms that make k passes through the array and use memory m, that m*k = Omega(n). And it is easy to see that you can in fact trade off memory for time in this way.
Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take 1 pass through. This takes time O(nlogn) and space O(1). But note curiously that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.
