Printing out nodes in a disjoint-set data structure in linear time - algorithm

I'm trying to do this exercise in Introduction to Algorithms by Cormen et al. that has to do with the disjoint-set data structure:
Suppose that we wish to add the operation PRINT-SET(x), which is given
a node x and prints all the members of x's set, in any order. Show how
we can add just a single attribute to each node in a disjoint-set
forest so that PRINT-SET(x) takes time linear in the number of members
of x's set, and the asymptotic running times of the other operations
are unchanged. Assume that we can print each member of the set in O(1)
time.
Now, I'm quite sure that the attribute needed is a tail pointer, so it can keep track of the children.
Since the disjoint-set structure already has a parent attribute, find-set(x) can easily print out nodes going in one direction. Having a tail pointer lets us go in the other direction as well.
However, I'm not sure how I would write the algorithm to do this. If anyone could help me out in pseudocode, that would be much appreciated.

Each node should have a next pointer to the next node in the set it is in. The nodes in a set should form a circular linked list.
When a singleton set is first created, the node's next pointer points to itself.
When you merge the set containing node X with the set containing node Y (and you've already checked that those sets are different by normalizing to set representatives), you merge the circular linked lists, which you can do by simply swapping X.next and Y.next; so this is an O(1) operation.
To list all the elements in the set containing node X, traverse the circular linked list starting from X.
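A minimal Python sketch of this idea, assuming a standard disjoint-set forest with union by rank and path compression. The names Node, find_set, union and members are illustrative and not taken from the book:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = self  # disjoint-set forest parent pointer
        self.rank = 0
        self.next = self    # the single extra attribute: circular list of members


def find_set(x):
    # Locate the root, then compress the path from x to it.
    root = x
    while root.parent is not root:
        root = root.parent
    while x.parent is not root:
        x.parent, x = root, x.parent
    return root


def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx is ry:
        return  # same set already; swapping next pointers here would split the list
    if rx.rank < ry.rank:
        rx, ry = ry, rx
    ry.parent = rx
    if rx.rank == ry.rank:
        rx.rank += 1
    # Splice the two circular lists together in O(1).
    x.next, y.next = y.next, x.next


def members(x):
    # Walk the circular list once: linear in the size of x's set.
    out = [x.value]
    node = x.next
    while node is not x:
        out.append(node.value)
        node = node.next
    return out
```

PRINT-SET(x) then amounts to printing each value returned by members(x). The swap only merges the lists when the two nodes are in different sets, which is why union returns early when the representatives are equal.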

Related

Is there a heap or heap-like structure that works with pointers, in other words nodes not in an array?

I currently have a double-linked list of objects in descending sorted order. (The list is intrusive--pointers in the objects.) I have a very limited set of operations:
1. add a node with the highest possible key
2. remove a node with the highest possible key (doesn't matter which one)
3. remove a node with key 0 (doesn't matter which one)
4. increment key of a node with highest current key (doesn't matter which one)
5. decrement key of any given node whose key is above 0
Operations 1-4 will be constant time, but operation 5 is O(n), where n is the number of nodes with the same key value. This is because such nodes, when decremented, have to be moved past their siblings with the same key value and placed after that range. Finding that re-insert place is O(n).
I thought of the heap (heapsort heap, not malloc heap) as a solution where worst-case would be O(log n) (where n=number of nodes). However, based on my recollection and what Google is finding me, it seems invariably implemented in an array, as opposed to a binary tree. So:
Question: is there an implementation of a heap that uses pointers in the manner of a binary tree, as opposed to an array, and that keeps the asymptotic complexity of the typical array implementation?
One common way to do this is to use an array-based heap, but:
In the heap you store pointers to nodes;
In each node you store its index in the heap; and
Whenever you swap elements in the heap, you update the indexes in the corresponding nodes;
This preserves the complexity of all the heap operations, and costs around 1.5 pointers and 1 integer per node. (The extra .5 is because of the way growable arrays are implemented.)
Alternatively, you can just link the nodes together into a tree with pointers. To support the operations you want, though, this requires 3 pointers per node (parent, left, right).
Both ways work fine, but the array implementation is simpler, faster, and uses a bit less memory.
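A rough Python sketch of the array-based variant, assuming a max-heap (the question wants the highest key on top). The class and method names (Item, IndexedMaxHeap, update_key, remove) are made up for illustration:

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.index = -1  # position in the heap array; -1 when not in the heap


class IndexedMaxHeap:
    def __init__(self):
        self.a = []  # array of references to Item objects

    def _place(self, i, item):
        # Every move inside the array also updates the index stored in the node.
        self.a[i] = item
        item.index = i

    def _sift_up(self, i):
        item = self.a[i]
        while i > 0:
            p = (i - 1) // 2
            if self.a[p].key >= item.key:
                break
            self._place(i, self.a[p])
            i = p
        self._place(i, item)

    def _sift_down(self, i):
        item = self.a[i]
        n = len(self.a)
        while True:
            c = 2 * i + 1
            if c >= n:
                break
            if c + 1 < n and self.a[c + 1].key > self.a[c].key:
                c += 1  # pick the larger child
            if self.a[c].key <= item.key:
                break
            self._place(i, self.a[c])
            i = c
        self._place(i, item)

    def push(self, item):
        self.a.append(item)
        item.index = len(self.a) - 1
        self._sift_up(item.index)

    def update_key(self, item, new_key):
        # O(log n): the stored index tells us where the item sits.
        old = item.key
        item.key = new_key
        if new_key > old:
            self._sift_up(item.index)
        else:
            self._sift_down(item.index)

    def remove(self, item):
        # O(log n) removal of an arbitrary item, given a reference to it.
        i = item.index
        last = self.a.pop()
        item.index = -1
        if last is not item:
            self._place(i, last)
            self._sift_up(i)
            self._sift_down(last.index)
```

With this layout, decrementing the key of a given node is update_key(node, node.key - 1), and removing a node with key 0 is remove(node) once the caller has a reference to such a node.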
ETA:
I should point out, though, that if you use pointers then you can use different kinds of heaps. A Fibonacci heap will let you decrement the value of a node in amortized constant time. It's kinda complicated, though, and slow in practice: https://en.wikipedia.org/wiki/Fibonacci_heap
Unfortunately the answer to the written problem isn't an answer to the headline title of the written problem.
Solution 1: amortized O(1) data structure
A solution was found with amortized O(1) implementations of all required operations.
It is simply a double-linked list of double-linked lists. The "main" double-linked list nodes are called parents, and we have at most one parent per key value. The parent nodes keep a double-linked list of child nodes with the same key value. Each child additionally points to its parent.
add a node with the highest possible value: If there is no list head or its value is not max, add the new node to the head of the main linked list. Otherwise, add it to the tail of the head node's child list.
remove a (any) node with the highest possible value: In the case of multiple items with highest value, it doesn't matter which we remove. So, if head parent has children, remove the tail child from the child list. Otherwise, remove the parent from the main list.
remove a (any) node with value 0: Same operations.
increment value of a (any) node with the highest current value: In case of multiple nodes with same key value, we can choose any, so choose the head parent's tail child. Remove it from the child list. If incrementing its value exceeds max value then you're done. Otherwise it's a new head node. If instead there are no children, then increment the head parent in place, and if it exceeds maximum value remove it.
decrement value of any node above 0: If the node is a child, remove from child list, then either add to parent's successor's child list or as a new node after the parent. A parent with no children: if the successor in the main list still has a smaller key, you're done. Otherwise remove it and add as successor's tail child. A parent with children: same but promote the head child to take its place. This is O(n), where n=number of nodes of given size, because you must change the parent pointer for all children. However, if the odds of the node selected for decrement being the parent node of all nodes of given size are 1/n, this amortizes to O(1).
The main downside is that we logically have 7 different pointers for each node. If it's in the parent role we need previous and next parent, and head and tail child. If it's in the child role we need previous and next child, and parent. These can be overlaid in a union of two alternate substructures of 4 and 3 pointers, which saves storage, but not CPU time (excepting perhaps the need to zero out unused pointers for cleanliness). Updating them all won't be fast.
Solution 2: Sloppy is Good Enough
Another approach is simply to be sloppy. The application benefits from finding nodes with higher scores but it's not critical that they be absolutely in perfect order. So rather than an O(n) operation to move nodes potentially from one end of the chain to the other, we could accept a solution that does an O(1) albeit at times imperfect job.
This could be the current implementation of a double-linked list. It can support all operations except decrement in O(1). It can handle decrement of a unique key value in O(1). Only decrement of a non-unique key value would go O(n), as we need to skip the remaining nodes with the previous key value to find the first with the same or higher key. In the worst case, we could simply cap that search at, say, 5 or 10 links. This too would provide a nominally O(1) solution. However, some pernicious usage patterns may slowly cause the entire list to become quite unordered.

A*, what's the best data structure for the open list?

Disclaimer: I really believe that this is not a duplicate of similar questions. I've read those, and they (mostly) recommend using a heap or a priority queue. My question is more of the "I don't understand how those would work in this case" kind.
In short:
I'm referring to the typical A* (A-star) pathfinding algorithm, as described (for example) on Wikipedia:
https://en.wikipedia.org/wiki/A*_search_algorithm
More specifically, I'm wondering about what's the best data structure (which can be a single well known data structure, or a combination of those) to use so that you never have O(n) performance on any of the operations that the algorithm requires to do on the open list.
As far as I understand (mostly from the Wikipedia article), the operations needed to be done on the open list are as follows:
The elements in this list need to be Node instances with the following properties:
position (or coordinates). For the sake of argument, let's say this is a positive integer ranging in value from 0 to 64516 (I'm limiting my A* area size to 254x254, which means that any set of coordinates can be bit-encoded on 16 bits)
F score. This is a positive floating-point value.
Given these, the operations are:
Add a node to the open list: if a node with the same position (coordinates) exists (but, potentially, with a different F score), replace it.
Retrieve (and remove) from the open list the node with the lowest F score
(Check if exists and) retrieve from the list a node for a given position (coordinates)
As far as I can see, the problems with using a Heap or Priority Queue for the open list are:
These data structures will use the F score as the sorting criterion.
As such, adding a node to this kind of data structure is problematic: how do you check optimally that a node with the same set of coordinates (but a different F score) doesn't already exist? Furthermore, even if you are somehow able to do this check, if you actually find such a node, but it is not on the top of the Heap/Queue, how do you optimally remove it such that the Heap/Queue keeps its correct order?
Also, checking for existence and removing a node based on its position is not optimal or even possible: if we use a Priority Queue, we have to check every node in it, and remove the corresponding one if found. For a heap, if such a removal is necessary, I imagine that all remaining elements need to be extracted and re-inserted, so that the heap still remains a heap.
The only remaining operation where such a data structure would be good is when we want to remove the node with the lowest F score. In this case the operation would be O(log n).
Also, if we make a custom data structure, such as one that uses a Hashtable (with position as key) and a Priority Queue, we would still have some operations that require suboptimal processing on either of these. In order to keep them in sync (both should have the same nodes in them), any given operation will always be suboptimal on one of the data structures: adding or removing a node by position would be fast on the Hashtable but slow on the Priority Queue. Removing the node with the lowest F score would be fast on the Priority Queue but slow on the Hashtable.
What I've done is make a custom Hashtable for the nodes that uses their position as key, and that also keeps track of the current node with the lowest F score. When adding a new node, it checks if its F score is lower than that of the currently stored lowest F score node, and if so, it replaces it. The problem with this data structure comes when you want to remove a node (whether by position or the lowest F scored one). In this case, in order to update the field holding the current lowest F score node, I need to iterate through all the remaining nodes in order to find which one has the lowest F score now.
So my question is: is there a better way to store these?
You can combine the hash table and the heap without slow operations showing up.
Have the hash table map position to index in the heap instead of node.
Any update to the heap can sync itself to the hash table. This requires the heap to know about the hash table, so it is invasive and not just a wrapper around two off-the-shelf implementations. Each sync costs as many O(1) hash-table updates as there are items that move in the heap, and only O(log n) items can move for an insertion, remove-min or update-key. The hash table also finds the node (in the heap) whose key must be updated in the parent-updating/G-changing step of A*, so that's fast too.
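A possible sketch of such a combined structure in Python, assuming a binary min-heap ordered by F score and a dict from position to heap index. The class name OpenList and its methods put, get and pop_min are hypothetical:

```python
class OpenList:
    def __init__(self):
        self.heap = []   # entries are [f, position], min-heap ordered by f
        self.index = {}  # hash table: position -> index of its entry in self.heap

    def _swap(self, i, j):
        # Every swap in the heap is mirrored by two O(1) hash-table updates.
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.index[h[i][1]] = i
        self.index[h[j][1]] = j

    def _up(self, i):
        while i > 0:
            p = (i - 1) // 2
            if self.heap[p][0] <= self.heap[i][0]:
                break
            self._swap(i, p)
            i = p

    def _down(self, i):
        n = len(self.heap)
        while True:
            c = 2 * i + 1
            if c >= n:
                return
            if c + 1 < n and self.heap[c + 1][0] < self.heap[c][0]:
                c += 1  # pick the smaller child
            if self.heap[i][0] <= self.heap[c][0]:
                return
            self._swap(i, c)
            i = c

    def put(self, position, f):
        # Insert a new node, or replace the F score of an existing position.
        i = self.index.get(position)
        if i is None:
            self.heap.append([f, position])
            i = len(self.heap) - 1
            self.index[position] = i
        else:
            self.heap[i][0] = f
        self._up(i)
        self._down(self.index[position])

    def get(self, position):
        # O(1) expected lookup by position; None if the position is absent.
        i = self.index.get(position)
        return None if i is None else self.heap[i][0]

    def pop_min(self):
        # Remove and return the (position, f) pair with the lowest f.
        f, position = self.heap[0]
        last = self.heap.pop()
        del self.index[position]
        if self.heap:
            self.heap[0] = last
            self.index[last[1]] = 0
            self._down(0)
        return position, f
```

All three operations from the question are O(log n) or better here: put and pop_min move at most O(log n) entries, and get is a single dict lookup.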

Find number of leaves under each node of a tree

I have a tree which is represented in the following format:
nodes is a list of nodes in the tree in the order of their height from top. Node at height 0 is the first element of nodes. Nodes at height 1 (read from left to right) are the next elements of nodes and so on.
n_children is a list of integers such that n_children[i] = num children of nodes[i]
For example given a tree like {1: {2, 3:{4,5,2}}}, nodes=[1,2,3,4,5,2], n_children = [2,0,3,0,0,0].
Given a Tree, is it possible to generate nodes and n_children and the number of leaves corresponding to each node in nodes by traversing the tree only once?
Is such a representation unique? Or is it possible for two different trees to have the same representation?
For the first question - creating the representation given a tree:
I am assuming by "a given tree" we mean a tree that is given in the form of node-objects, each holding its value and a list of references to its children-node-objects.
I propose this algorithm:
1. Start at node=root.
2. If node.children is empty, return {values_list:[[node.value]], children_list:[[0]]}
3. Otherwise:
3.1. construct two lists. One will be called values_list and each element there shall be a list of values. The other will be called children_list and each element there shall be a list of integers. Each element in these two lists will represent a level in the sub-tree beginning with node, including node itself (will be added at step 3.3).
So values_list[1] will become the list of values of the children-nodes of node, and values_list[2] will become the list of values of the grandchildren-nodes of node. values_list[1][0] will be the value of the leftmost child-node of node. And values_list[0] will be a list with one element alone, values_list[0][0], which will be the value of node.
3.2. for each child-node of node (for which we have references through node.children):
3.2.1. start over at (2.) with the child-node set to node, and the returned results will be assigned back (when the function returns) to child_values_list and child_children_list accordingly.
3.2.2. for each index i in the returned lists (they are of the same length): if there is already a list in values_list[i], concatenate child_values_list[i] to values_list[i] and concatenate child_children_list[i] to children_list[i]. Otherwise assign values_list[i]=child_values_list[i] and children_list[i]=child_children_list[i] (that would be a push - adding to the end of the list).
3.3. Make node.value the sole element of a new list and add that list to the beginning of values_list. Make node.children.length the sole element of a new list and add that list to the beginning of children_list.
3.4. return values_list and children_list
when the above returns with values_list and children_list for node=root (from step (1)), all we need to do is concatenate the elements of the lists (because they are lists, each for one specific level of the tree). After concatenating the list-elements, the resulting values_list_concatenated and children_list_concatenated will be the wanted representation.
In the algorithm above we visit a node only by starting step (2) with it set as node and we do that only once for each child of a node we visit. We start at the root-node and each node has only one parent => every node is visited exactly once.
For the number of leaves associated with each node (if I understand correctly, the number of leaves in the sub-tree of which the node is the root), we can add another list that will be generated and returned: leaves_list.
In the stop-case (no children to node - step (2)) we will return leaves_list:[[1]]. In step (3.2.2) we will concatenate the list-elements like the other two lists' list-elements. And in step (3.3) we will sum the first list-element leaves_list[0] and will make that sum the sole element in a new list that we will add to the beginning of leaves_list. (something like leaves_list.add_to_beginning([leaves_list[0].sum()]))
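The steps above could be sketched in Python roughly as follows. TreeNode, levels and represent are illustrative names, and the tree is assumed to be given as node objects holding a value and a list of children:

```python
class TreeNode:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def levels(node):
    """Return per-level lists (values, child counts, leaf counts) for node's subtree."""
    if not node.children:
        # Stop-case: a leaf is one level with one node, 0 children and 1 leaf.
        return [[node.value]], [[0]], [[1]]
    values, counts, leaves = [], [], []
    for child in node.children:
        cv, cc, cl = levels(child)
        for i in range(len(cv)):
            if i == len(values):
                # First child subtree to reach this depth: open a new level.
                values.append([])
                counts.append([])
                leaves.append([])
            values[i].extend(cv[i])
            counts[i].extend(cc[i])
            leaves[i].extend(cl[i])
    # leaves[0] currently holds the leaf counts of node's direct children.
    leaves.insert(0, [sum(leaves[0])])
    values.insert(0, [node.value])
    counts.insert(0, [len(node.children)])
    return values, counts, leaves


def represent(root):
    # Concatenate the per-level lists into the flat representation.
    values, counts, leaves = levels(root)

    def flatten(per_level):
        return [x for level in per_level for x in level]

    return flatten(values), flatten(counts), flatten(leaves)
```

Each node is visited by exactly one call to levels. Copying the per-level lists upward adds work on deep trees, and very deep trees may hit Python's recursion limit, so this is a sketch rather than a tuned implementation.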
For the second question - is this representation unique:
To prove uniqueness we actually want to show that the function (let's call it rep for "representation") preserves distinctness over the space of trees, i.e. that it is an injection. As you can see in the wiki linked, for that it suffices to show that there exists a function (let's call it tre for "tree") that given a representation gives a tree back, and that for every tree t it holds that tre(rep(t))=t. In simple words: we can make a method that takes a representation and builds a tree out of it, and for every tree, if we make its representation and pass that representation through that method, we'll get the exact same tree back.
So let's get cracking!
Actually the first job - creating that method (the function tre) is already done by you - by the way you explained what the representation is. But let's make it explicit:
1. if the lists are empty return the empty tree. Otherwise continue
2. make the root node with values[0] as its value and n_children[0] as its number of children (without making the children nodes yet).
3. initiate a list-index i=1 and a level index li=1 and level-elements index lei=root.children.length and a next-level-elements accumulator nle_acc=0
4. while lei>0:
4.1. for lei times:
4.1.1. make a node with values[i] as value and n_children[i] as the number of children.
4.1.2. add the new node as the leftmost child in level li that has not been filled yet (traverse the tree to the li level from the leftmost in right direction and assign the new node to the first reference that is not assigned yet. We know the previous level is done, so each node in the li-1 level has a children.length property we can check and see if each has filled the number of children they should have)
4.1.3. add nle_acc+=n_children[i]
4.1.4. increment ++i
4.2. assign lei=nle_acc (level-elements can take what the accumulator gathered for it)
4.3. clear nle_acc=0 (next-level-elements accumulator needs to accumulate from the start for the next round)
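A compact Python sketch of the reconstruction. Instead of re-traversing the tree to find the leftmost unfilled slot as in step 4.1.2, it keeps a FIFO queue of parents still waiting for children, which is a simplification and not the exact procedure above. TreeNode and build_tree are illustrative names:

```python
from collections import deque


class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []


def build_tree(nodes, n_children):
    """Rebuild a tree from its level-order values and child counts."""
    if not nodes:
        return None  # the empty tree
    root = TreeNode(nodes[0])
    # Parents in level order, each paired with how many children it expects.
    pending = deque([(root, n_children[0])])
    i = 1  # index of the next unused entry in nodes / n_children
    while pending:
        parent, k = pending.popleft()
        for _ in range(k):
            child = TreeNode(nodes[i])
            parent.children.append(child)
            pending.append((child, n_children[i]))
            i += 1
    return root
```

For the example from the question, build_tree([1, 2, 3, 4, 5, 2], [2, 0, 3, 0, 0, 0]) gives a root 1 with children 2 and 3, where 3 has children 4, 5 and 2.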
Now we need to prove that an arbitrary tree that is passed through the first algorithm and then through the second algorithm (this one here) will get out of all of that the same as it was originally.
As I'm not trying to prove the correctness of the algorithms (although I should), let's assume they do what I intended them to do, i.e. the first one writes the representation as you described it, and the second one makes a tree level-by-level, left-to-right, assigning a value and the number of children from the representation, and fills the children references according to those numbers when it comes to the next level.
So each node has the right amount of children according to the representation (that's how the children were filled), and that number was written from the tree (when generating the representation). And the same is true for the values and thus it is the same tree as the original.
The proof actually should be much more elaborate and detailed - but I think I'll leave it at that now. If there will be a demand for elaboration maybe I'll make it an actual proof.

Linked Lists and Sentinel node

So I have been asked in my homework to merge-sort 2 different sorted circular linked lists without using a sentinel node; also, the lists can be empty. My question is: what is a sentinel node in the first place?
A sentinel node is a node that contains no real data - it's just there for the convenience of the implementation.
Thus a list with 4 real elements might have one or more extra nodes, making a total of 5 or 6 nodes.
Those extra nodes might be place holders (e.g. marking where you started the merge), pseudo-nodes indicating the head of the list, or anything else the algorithm designer can think up.
A sentinel node is a node that you add to your code to avoid handling degenerate cases with special code. For merge sort, for example, you can add a node with value = INFINITY to the end of both lists that you want to merge. This guarantees that once you hit the end of a list you can't go beyond it, because the sentinel value is always greater than (or equal to) the values in the other list.
So if you are not using a sentinel, you have to write code to handle this. In your merge routine, you should check whether you've reached the end of either list.
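To make the difference concrete, here is a small Python sketch that merges two sorted sequences both ways. It uses plain Python lists rather than circular linked lists to keep the example short, and it assumes all real values are finite so that math.inf can act as the sentinel. The function names are made up for illustration:

```python
import math


def merge_with_sentinel(a, b):
    # Append an INFINITY sentinel to a copy of each input, so the loop
    # never has to test whether one of the inputs has run out.
    a = a + [math.inf]
    b = b + [math.inf]
    i = j = 0
    out = []
    for _ in range(len(a) + len(b) - 2):  # number of real elements
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out


def merge_without_sentinel(a, b):
    # Without a sentinel, the end of each input is checked explicitly.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two is non-empty
    out.extend(b[j:])
    return out
```

Both versions also handle empty inputs. In the sentinel-free version the explicit bounds checks and the two trailing extend calls do the work that the INFINITY values do in the first one.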
A sentinel node is a traversal path terminator in linked lists and trees. It doesn't hold or reference any data managed by the data structure. One of its benefits is to reduce algorithmic complexity and code size. In your case, without a sentinel, the complexity and code size will increase and the speed of operation will decrease.

Disjoint Set Operation Find_Set(x) using linked list

It's about the naive Union-Find algorithm using the linked-list representation of disjoint sets:
The Find_Set(x) operation returns a pointer to the representative of the set containing element x. This is said to require O(1) time, since the node containing x has a pointer pointing directly to the representative of x. But before that, we first need to find the particular node containing element x among all the disjoint sets, and this search is not O(1). I don't understand how Find_Set(x) is O(1) (as given in books) when we don't know which disjoint set the node containing x belongs to.
Each element is assumed to contain some pointer/reference to the set it belongs to (the set can actually be represented by one of its member elements). So when querying Find_Set(x), since you already have the element x, you simply have to consult this pointer/reference and the operation is O(1). With a linked-list implementation, where each set is stored as a linked list of elements, each element holds a pointer to the head of the linked list, which is chosen as the representative element of the set.
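A minimal Python sketch of the linked-list representation, assuming each element points to a small set object that records the head, tail and size of its list (one common layout, chosen here for illustration). The names SetList, Element, make_set, find_set and union are hypothetical:

```python
class SetList:
    def __init__(self, first):
        self.head = first  # the representative of the set
        self.tail = first
        self.size = 1


class Element:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.set = None  # back-pointer to the set this element belongs to


def make_set(value):
    e = Element(value)
    e.set = SetList(e)
    return e


def find_set(x):
    # O(1): x is already the node, so just follow its back-pointer.
    return x.set.head


def union(x, y):
    a, b = x.set, y.set
    if a is b:
        return
    if a.size < b.size:
        a, b = b, a  # append the shorter list to the longer one
    node = b.head
    while node is not None:
        node.set = a  # re-point every moved element at its new set
        node = node.next
    a.tail.next = b.head
    a.tail = b.tail
    a.size += b.size
```

The caller is expected to hold on to the Element returned by make_set, for example in an array or dict keyed by value. That is why no search over the sets is needed before calling find_set.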

Resources