Follow up on detecting loop in linked list - algorithm

So there are several questions on how to detect a loop in a linked list. Here is one example. My question is, why do all these algorithms use two pointers? Couldn't you just loop through with one pointer and mark the nodes as visited, and when you come to a node you've already visited or reach the end of the linked list (next = null), then you know there is no loop?

It's because to
mark the nodes as visited
you need either extra space in the nodes themselves to do it, or an auxiliary data structure whose size will increase with that of the list, whereas the two-pointer solutions require only enough extra space for one more pointer.
[EDITED to add:] ... And, perhaps, also because the two-pointer solutions are clever and people like clever solutions to things.

Related

What are some true Iterative Depth First Search implementation suggestions?

So from what I am seeing I have been thaught like most people that the iterative version of DFS is just like iterative BFS besides two differences: replace queue with stack and mark a node as discovered after POP not after PUSH.
Two questions have really been puzzling me recently:
For certain cases it will result in different output, but is it really necessary to mark a node as visited after we POP? Why is it wrong to do it after we PUSH? As far as I can see this will result in the same consequence as to why we do it for BFS...to not have duplicates in our queue/stack.
Now the big one: My impression is that this kind of implementation of iterative DFS is not true DFS at all: If we think about the recursive version it is quite space efficient since it doesnt store all the possible neighbours at one level(as we would do in the iterative version), it only selects one and goes with it and then it backtracks and goes for a second one. As an extreme example think of a graph with one node in the center and conected to 100 leaf nodes. In the recursive implementation if we start from the middle node the underlying stack will grow to maximum 2 ... one for the middle node and one for every leaf it visits. If we do it as we have been thaught with the iterative version the stack will grow to 1000 elements. That doesnt seem right.
So with all those previous details in mind, my question is what would the approach be to have a true iterative DFS implementation?
Question 1: If you check+mark before the push, it uses less space but changes the order in which nodes are visited. You will visit everything, though.
Question 2: You are correct that an iterative DFS will usually put all the children of a node onto the stack at the same time. This increases the space used for some graphs, but it doesn't change the worst case space usage, and it's the easiest way so there's usually no reason to change that.
Occasionally you know that it will save a lot of space if you don't do this, and then you can write an iterative DFS that works more like the recursive one. Instead of pushing the next nodes to visit on the stack, you push a parent and a position in its list of children, or equivalent, which is pretty much what the recursive version has to remember when it recurses. In pseudo-code, it looks like this:
func DFS(start):
let visited = EMPTY_SET
let stack = EMPTY_STACK
visited.add(start)
visit(start)
stack.push( (start,0) )
while(!stack.isEmpty()):
let (parent, pos) = stack.pop()
if (pos < parent.numChildren()):
let child = parent.child[pos]
stack.push(parent,pos+1)
if (!visited.contains(child)):
visited.add(child)
visit(child)
stack.push( (child,0) )
You can see that it's a little more complicated, and the records you push on the stack are tuples, which is annoying in some cases. Often we'll use two stacks in parallel instead of creating tuples to push, or we'll push/pop two records at a time, depending on how nodes and child list positions have to be represented.

How to implement linked list with 1 million nodes?

I recently attended Microsoft Interview.
I was asked to implement linked list with 1 million nodes? How will you access 999999th node?
What is the optimal design strategy and implementation for such a question?
A linked list has fairly few variations, because much variation means it would be something other than a linked list.
You can vary it by having single or double linking. Single linking is where you have a pointer to the head (the first node, A say) which points to B which points to C, etc. To turn that into a double linked list you would also add a link from C to B and B to A.
If you have a double linked list then it's meaningful to retain a pointer to the list tail (the last node) as well as the head, which means accessing the last element is cheap, and elements near the end are cheaper, because you can work backwards or forwards... BUT... you would need to know what you want is at the end of the list... AND at the end of the day a linked list is still just that, and if it is going to get very large and that is a problem because of the nature of its use case, then a storage structure other than a linked list should probably be chosen.
You could hybridise your linked list of course, so you could index it or something for example, and there's nothing wrong with that in theory, but if you index ALL the nodes then the linked list nature is no longer of much value, and if you index only some, then the nodes in between indexed nodes have to be sorted or something so you can find a close node and work towards a target node... probably this would never be optimal and a better data structure should be chosen.
Really a linked list should be used when you don't want to do things like get a specific node, but want to iterate nodes regardless.
I have no idea about what I'm going to say, but, here goes:
You could conceptually split the list in sqrt(1000000) blocks, in such a way that you would have "reference pointers" every 1000 elements.
Think of it as having 1000 linked lists each with 1000 elements representing your list with 1000000 elements.
This is what comes to mind!
As Michael said you should present first the two classic variations of linked list. The next thing you should do is ask about insertion, search and deletion patterns.
These patterns will guide you towards a better fit data structure, because nobody wants a simple or double linked list with a million nodes.
A doubly circular Linked List with a static counter to point the index could be quite helpful in this case.
What I am suggesting is creating a Circular doubly Linked List having a counter variable which keeps track of the index of each node and a static variable which will hold overall number of nodes in the list.
Now when you have a search item for the index which is greater than 50% of the total nodes count i.e. searching elements at lower half you can start traversing the list from reverse direction.
Let say you have 10 nodes in your circular linked list and you want to search 8th node so you can quickly start traversing the list in opposite direction 2 times.
This approach reduces the iterations to search list item indexed at extremes but still in worst case you have to traverse half way through list for items in middle.
The only downfall in this approach is memory constraints which I am assuming is not an design concern here.

BTree- predetermined size?

I read this on wikipedia:
In B-trees, internal (non-leaf) nodes can have a variable number of
child nodes within some pre-defined range. When data is inserted or
removed from a node, its number of child nodes changes. In order to
maintain the pre-defined range, internal nodes may be joined or split.
Because a range of child nodes is permitted, B-trees do not need
re-balancing as frequently as other self-balancing search trees, but
may waste some space, since nodes are not entirely full.
We have to specify this range for B trees. Even when I looked up CLRS (Intro to Algorithms), it seemed to make to use of arrays for keys and children. My question is- is there any way to reduce this wastage in space by defining the keys and children as lists instead of predetermined arrays? Is this too much of a hassle?
Also, for the life of me I'm not able to get a decent psedocode on btreeDeleteNode. Any help here is appreciated too.
When you say "lists", do you mean linked lists?
An array of some kind of element takes up one element's worth of memory per slot, whether that slot is filled or not. A linked list only takes up memory for elements it actually contains, but for each one, it takes up one element's worth of memory, plus the size of one pointer (two if it's a doubly-linked list, unless you can use the xor trick to overlap them).
If you are storing pointers, and using a singly-linked list, then each list link is twice the size of each array slot. That means that unless the list is less than half full, a linked list will use more memory, not less.
If you're using a language whose runtime has per-object overhead (like Java, and like C unless you are handling memory allocation yourself), then you will also have to pay for that overhead on each list link, but only once on an array, and the ratio is even worse.
I would suggest that your balancing algorithm should keep tree nodes at least half full. If you split a node when it is full, you will create two half-full nodes. You then need to merge adjacent nodes when they are less than half full. You can then use an array, safe in the knowledge that it is more efficient than a linked list.
No idea about the details of deletion, sorry!
B-Tree node has an important characteristic, all keys in the node is sorted. When finding a specific key, binary search is used to find the right position. Using binary search keeps the complexity of search algorithm in B-Tree O(logn).
If you replace the preallocated array with some kind of linked list, you lost the ordering. Unless you use some complex data structures, like skip list, to keep the search algorithm with O(logn). But it's totally unnecessary, skip list itself is better.

Linked list loop detection algorithm

I read some interview question online about how would you find if there's a loop in a linked list, and solution (Floyd's cycle-finding algorithm) is to have two pointers, one is 2x faster than the other, and check if they meet again.
My question is: Why can't I just keep one pointer fixed, just move the other pointer forward by 1 step each time?
Because maybe not the complete linkedList is within the loop.
For a linked list lasso detection algorithm, you need two pointers:
As long as the first pointer is where the cowboy is, no loop is detected. But if you move it stepwise forward, it will eventually enter the loop.
BTW, lasso is the terminus technicus from graph theory, cowboy isn't.
Because the first (non-moving) pointer might not lie within the loop, so the pointers would never meet. (Remember that a loop may consist of only part of the list.)
Because the loop may not contain the element pointed to by the first pointer.
For example, if the first pointer points to element 1 and the linked list has a loop later on (1->2->3->4->2), your algorithm won't detect it.

Data Structure Creation (PQ linked list merge?)

So I need to find a data structure for this situation that I'll describe:
This is not my problem but explains the data structure aspect i need more succinctly:
I have an army made up of platoons. Every platoon has a number of men and a rank number(highest being better). If an enemy were to attack my army, they would kill some POWER of my army, starting from the weakest platoon and working up, where it takes (platoon rank) amount of power to kill every soldier from a platoon.
I could easily simulate enemies attacking me by peeking and popping elements from my priority queue of platoons, ordered by rank number, but that is not what I need to do. What i need is to be able to allow enemies to view all the soldiers they would kill if they attacked me, without actually attacking, so without actually deleting elements from my priorityqueue(if i implemented it as a pq).
Sidenote: Java's PriorityQueue.Iterator() prints elements in a random order, I know an iterator is all I need, just fyi.
The problem is, if I implemented this as a pq, I can only see the top element, so I would have to pop platoons off as if they were dying and then push them back on when the thought of the attack has been calculated. I could also implement this as a linked list or array, but insertion takes too long. Ultimately I would love to use a priority queue I just need the ability to view either the (pick an index)'th element from the pq, or to have every object in the pq have a pointer to the next object in the pq, like a linked list.
Is this thought about maintaining pointers with a pq like a linked list possible within java's PriorityQueue? Is it implemented for me somewhere in PriorityQueue that I dont know about? is the index thing implemented? is there another data structure I can use that can better serve my purpose? Is it realistic for me to find the source code from Java's PriorityQueue and rewrite it on my machine to maintain these pointers like a linked list?
Any ideas are very welcome, not really sure which path I want to take on this one.
One thing you could do is an augmented binary search tree. That would allow efficient access to the nth smallest element while still keeping the elements ordered. You could also use a threaded binary search tree. That would allow you to step from one element to the next larger one in constant time, which is faster than in a normal binary tree. Both of these data structures are slower than a heap, though.

Resources