set::insert results in no changes to iterator validity [cplusplus.com].
The common implementation of std::set is a red-black tree. Why are there no changes to iterator validity with reference to RB-tree insertion?
The way I understand RB-tree insertion, it first converts the tree to a 2-4 tree, does the insert, and then converts back. However, from a previous question:
With the B-tree based implementation, due to node splits and consolidations, the erase member functions on these new structures may invalidate iterators to other elements in the tree
Why are there no changes to iterator validity with reference to RB-tree insertion?
Because I think set iterators are implemented as just a pointer to a node. As you say, sets are commonly implemented as an RB-tree.
Now while the parent, left and right pointers and the colour may change, the content of the node stays the same. For std::set and std::map, an iterator to a node remains valid until that node is deleted.
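To make this concrete, here is a minimal sketch (not the actual libstdc++ or libc++ layout, just an illustration of the principle): the iterator holds nothing but a pointer to a node, so relinking and recolouring during rebalancing leave it pointing at the same element.

#include <iostream>

struct Node {
    int value;                 // the element itself; never moves after insertion
    Node* parent = nullptr;
    Node* left   = nullptr;
    Node* right  = nullptr;
    bool  red    = true;       // colour may flip during rebalancing
};

struct SetIterator {
    Node* node;                              // all the iterator stores
    int& operator*() const { return node->value; }
};

int main() {
    Node a{10}, b{20};
    a.right = &b; b.parent = &a;

    SetIterator it{&b};                      // "iterator" to the node holding 20

    // Simulate a rotation: pointers and colours change, nodes do not move.
    a.parent = &b; b.left = &a; a.right = nullptr; b.red = false;

    std::cout << *it << '\n';                // still prints 20; the iterator is intact
}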
RB-tree insertion
RB-tree insertion can be done in-situ, without any conversion.
Summary
I have a problem that seems as if it is likely to have many good known solutions with different performance characteristics. Assuming this is the case, a great answer could include code (or pseudocode), but ideally it would reference the literature and give the name of this problem so I can explore it in greater detail and understand the space of solutions.
The problem
I have an initially empty array that can hold identifiers. Any identifiers in the array are unique (they're only ever in the array once).
var identifiers: [Identifier] = []
Over time a process will insert identifiers into the array. It won't just append them to the end; they can be inserted anywhere in the array.
Identifiers will never be removed from the array – only inserted.
Insertion needs to be quick, so the structure probably won't literally be an array but something supporting better than linear time insertion (such as a B-tree).
After some identifiers have been added to identifiers, I want to be able to query the current position of any given identifier. So I need a function to look this up.
A linear time solution to this is simply to scan through identifiers from the start until an identifier is found, and the answer is the index that was reached.
func find(identifier: Identifier) -> Int? {
    for index in identifiers.indices {
        if identifiers[index] == identifier {
            return index
        }
    }
    return nil
}
But this linear scaling with the size of the array is problematic if the array is very large (perhaps 100s of millions of elements).
A hash map doesn't work
We can't put the positions of the identifiers into a hash map. This doesn't work because identifier positions are not fixed after insertion into the array. If other identifiers are inserted before them, they will drift to higher indexes.
However, a possible acceleration for the linear time algorithm would be to cache the initial insertion position of an identifier and begin a linear scan from there. Because identifiers are only ever inserted, it must be at that index or an index after it (or not in identifiers at all). Once the identifier is found, the cache can be updated.
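As a rough illustration of that caching idea (sketched in C++ rather than Swift, with hypothetical names), a lookup can start from the last index at which the identifier was seen; the worst case is still linear, but repeated queries get cheaper:

#include <optional>
#include <unordered_map>
#include <vector>

using Identifier = int;

std::vector<Identifier> identifiers;
std::unordered_map<Identifier, std::size_t> lastKnownIndex;   // the cache

std::optional<std::size_t> find(Identifier id) {
    std::size_t start = 0;
    if (auto it = lastKnownIndex.find(id); it != lastKnownIndex.end())
        start = it->second;                  // identifiers only drift to higher indexes
    for (std::size_t i = start; i < identifiers.size(); ++i) {
        if (identifiers[i] == id) {
            lastKnownIndex[id] = i;          // refresh the cache
            return i;
        }
    }
    return std::nullopt;                     // not present
}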
Another option could be to update the hash map after any insertions to correct the positions. However, this would slow down insertion, potentially making it a linear time operation (as previously mentioned, identifiers is probably not literally an array but some other structure allowing better than linear time insertion).
Summary
There's a linear time solution, and there's an optimisation using a hash map (at the cost of roughly doubling storage). Is there a much better solution for looking up the current index of an identifier, perhaps in log time?
You can use an order-statistic tree for this, based on a red-black tree or other self-balancing binary tree. Essentially, each node will have an extra integer field, storing the number of descendants it currently has. (Insertion and deletion operations, and their resultant rotations, will only result in updating O(log n) nodes so this can be done efficiently). To query the position of an identifier, you examine the descendant count of its left subtree and the descendant counts of the siblings of each of its right-side ancestors.
Note that while the classic use of an order-statistic tree is for a sorted list, nothing in the logic requires this; "node A is before node B" is all you need for insertion, tree rotations don't require re-evaluating the ordering, and having a pointer to the node (and having nodes store parent pointers) is all you need to query the index. All operations are in O(log n).
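As a minimal sketch of the rank query (field and function names are illustrative; the balancing and size-maintenance code is omitted), assuming each node stores the size of its own subtree:

#include <cstddef>

struct OSNode {
    OSNode* parent = nullptr;
    OSNode* left   = nullptr;
    OSNode* right  = nullptr;
    std::size_t subtreeSize = 1;   // 1 + size(left) + size(right), maintained on every update
};

std::size_t sizeOf(const OSNode* n) { return n ? n->subtreeSize : 0; }

// Current zero-based index of `node` in the sequence; O(log n) in a balanced tree.
std::size_t indexOf(const OSNode* node) {
    std::size_t rank = sizeOf(node->left);            // elements before it within its own subtree
    for (const OSNode* cur = node; cur->parent; cur = cur->parent) {
        if (cur == cur->parent->right)                // everything on the parent's left side,
            rank += sizeOf(cur->parent->left) + 1;    // plus the parent itself, comes first
    }
    return rank;
}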
I am studying from my course book on Data Structures by Seymour Lipschutz and I have come across a point I don’t fully understand.
Binary Search Algorithm assumes that one has direct access to the middle element in the list. This means that the list must be stored in some type of linear array.
I read this and also recognised that in Python you can have access to the middle element at all times. Then the book goes on to say:
Unfortunately, inserting an element in an array requires elements to be moved down the list, and deleting an element from an array requires elements to be moved up the list.
How is this a drawback?
Won’t we still be able to access the middle element by dividing the length of the array by 2?
In the case where the array will not be modified, the costs of insertion and deletion are not relevant.
However, if an array is to be used to maintain a sorted set of non-fixed items, then insertion and deletion costs are relevant. In this case, binary search can be used to find items (possibly for deletion) and/or find where new items should be inserted. The drawback is that insertion and deletion require movement of other elements.
Python's bisect module provides binary search functionality that can be used for locating insertion points for maintaining sorted order. The drawback mentioned applies.
In some cases, a binary search tree may be a preferable alternative to a sorted array for maintaining a sorted set of non-fixed items.
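For illustration, here is a minimal C++ sketch of the same pattern the bisect module supports in Python: the binary search for the insertion point is O(log n), but the insertion itself still shifts every later element, which is exactly the drawback the book describes.

#include <algorithm>
#include <vector>

// Keep `sorted` in ascending order while inserting `value`.
void insertSorted(std::vector<int>& sorted, int value) {
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);  // O(log n) search
    sorted.insert(pos, value);                                         // O(n) shift
}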
It seems that the author is comparing array-like structures and linked lists.
The first (array, Python or Java list, C++ vector) allows fast and simple access to any element by index, but appending, inserting or deleting might require shifting elements or reallocating memory.
For the second, we cannot address the i-th element directly; we need to traverse the list from the beginning. But once we have the element, we can insert or delete quickly.
I've been playing with Binomial Heaps, and I encountered a problem I'd like to discuss here, because I think it may concern anyone implementing a Binomial Heap data structure.
My aim is to obtain, when I insert into a Binomial Heap, pointers to nodes, or handles, which point directly to the heap's internal nodes, which in turn contain my priority value (usually an integer).
This way I keep a pointer/handle to what I have inserted, and if need be I can delete the value directly using binomial_heap_delete_node(node), just like an iterator works.
As we'll see, having this is NOT possible with Binomial Heaps, and that's because of the architecture of this data structure.
The main problem with Binomial Heaps is that at some point you'll need an operation binomial_heap_swap_parent_and_child(parent, child), and you'll need it in both binomial_heap_decrease_key(node, key) and binomial_heap_delete_node(node). The purpose of these operations is quite clear from their names.
So, the problem is how binomial_heap_swap_parent_and_child(parent, child) works: in all the implementations I saw, it swaps the priority value between nodes, and NOT the nodes themselves.
This will invalidate all of your pointers/handles/iterators to nodes: they will still point to valid nodes, but those nodes will no longer hold the priority value you inserted; they will hold another one.
And this is quite logical if we look at how Binomial Heaps (or Binomial Trees in general) are structured: a parent node is treated by many children as "the parent", so many children point to it, but the parent doesn't know how many children (or, more importantly, which children) are pointing to it, so swapping the positions of the nodes themselves is impossible. Your only choice is to swap the integer priority keys, but that invalidates all pointers/handles/iterators to the nodes.
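To illustrate the invalidation, here is a heavily simplified sketch (not a full binomial heap; only the fields needed to show the value swap during a bubble-up):

#include <iostream>
#include <utility>

struct BinomialNode {
    int key;
    BinomialNode* parent = nullptr;
    // child/sibling pointers omitted
};

// Typical decrease-key: bubble the smaller key up by swapping *values*,
// because a node cannot cheaply be unlinked from the children pointing to it.
void decrease_key(BinomialNode* node, int newKey) {
    node->key = newKey;
    while (node->parent && node->parent->key > node->key) {
        std::swap(node->key, node->parent->key);   // values move, nodes do not
        node = node->parent;
    }
}

int main() {
    BinomialNode root{5}, child{10};
    child.parent = &root;

    BinomialNode* handle = &child;     // handle to the element inserted with key 10
    decrease_key(&child, 1);           // the new key 1 bubbles up into the root node

    std::cout << handle->key << '\n';  // prints 5: the handle no longer refers
                                       // to the key we decreased
}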
NOTE: a possible workaround would be, instead of using binomial_heap_delete_node(node), to set the priority of the node to be removed to -999999999 (or some such minimum value) and pop the minimum node out. This does NOT solve the problem, since binomial_heap_decrease_key(node, key) still needs the parent-child swap operation, whose only implementation is to swap integer priorities.
I want to know if anyone has run into this problem before.
I think the only solution is to use another heap structure, such as Binary Heap, Pairing Heap, or something else.
As with many data structure problems, it's not hard to solve this one with an extra level of indirection. Define something like:
struct handle {
    struct heap_node *node; // points to the node that "owns" this handle
    struct user_data *data; // the priority key is calculated from this
};
You provide users with handles (either by copy or pointer/reference; your choice).
Each internal node points to exactly one handle (i.e. node->handle->node == node). Two such pointers are swapped to exchange parent and child.
Several variations are possible. E.g. the data field could be the data itself rather than a pointer to it. The main idea is that adding the level of indirection between handles and nodes provides the necessary flexibility.
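For illustration, a rough sketch of that exchange under this scheme (names are placeholders): only the two handle pointers and their back-pointers change, so every handle held by the user keeps following its data.

struct heap_node;

struct handle {
    struct heap_node *node;   // node that currently "owns" this handle
    int priority;             // or a pointer to the user's data
};

struct heap_node {
    struct handle *h;         // invariant: h->node == this
    struct heap_node *parent; // child/sibling links omitted for brevity
};

// Exchange the payloads of a parent and child without touching user-held handles.
void swap_parent_and_child(struct heap_node *parent, struct heap_node *child) {
    struct handle *tmp = parent->h;
    parent->h = child->h;
    child->h = tmp;
    parent->h->node = parent;  // restore the back-pointers
    child->h->node = child;
}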
I'm trying to solve an exercise which turns out to be a little bit difficult, since I have to implement a priority queue starting from the template class of a tree (a kind of RedBlack or BinarySearch Tree).
The template looks like:
class Node
    int key
    Node left
    Node right
    Node parent
    int leftNodes
    int rightNodes
Initially, when I had to insert a new element, I tried to completely fill a level of the tree, then use an InOrderTreeTraversal/Sort algorithm to fill an array, generate a BinarySearch tree from that array, and replace the original root element with the new one, supposing the result would be a balanced tree.
Unfortunately this approach appears inappropriate, since the tree must emulate the max-heap property while staying balanced on every insertion/deletion (and my code didn't work well at completely filling a tree level). Is it possible to implement a tree with heap capabilities? I mean a tree in which each element is bigger than or equal to its children, that remains balanced after insertion and rebalances itself when the root node (the element with the biggest key) is deleted?
You probably want to implement a binary heap, see http://en.wikipedia.org/wiki/Binary_heap
IIRC one of the main advantages of this data structure is that it can be embedded in an array (because of the balanced nature). Heapsort uses this kind of data structure to sort in-place.
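As a minimal sketch of such an array-embedded binary max-heap (no pointers, only index arithmetic: the parent of index i sits at (i - 1) / 2 and its children at 2*i + 1 and 2*i + 2):

#include <cstddef>
#include <utility>
#include <vector>

class BinaryMaxHeap {
public:
    void push(int key) {
        data_.push_back(key);
        siftUp(data_.size() - 1);
    }

    int top() const { return data_.front(); }

    void pop() {                                   // remove the maximum
        std::swap(data_.front(), data_.back());
        data_.pop_back();
        if (!data_.empty()) siftDown(0);
    }

private:
    std::vector<int> data_;

    void siftUp(std::size_t i) {
        while (i > 0 && data_[(i - 1) / 2] < data_[i]) {
            std::swap(data_[(i - 1) / 2], data_[i]);
            i = (i - 1) / 2;
        }
    }

    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < data_.size() && data_[l] > data_[largest]) largest = l;
            if (r < data_.size() && data_[r] > data_[largest]) largest = r;
            if (largest == i) break;
            std::swap(data_[i], data_[largest]);
            i = largest;
        }
    }
};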
I was looking for a simple-to-implement data structure that fulfils my needs in the least possible time (in the worst case):
(1) Pop the nth element (I have to keep the relative order of the remaining elements intact).
(2) Access the nth element.
I couldn't use a plain array because it can't pop, and I don't want a gap after deleting the ith element. I tried to remove the gap by exchanging the nth element with the next, and that with the next, and so on until the last, but that proves time inefficient, though the array's O(1) access is unbeatable.
I tried using a vector, with erase for popping and .at() for access, but even this is not cheap time-wise, though it's better than the array.
What you can try is a skip list - it supports the operations you are requesting in O(log n). Another option would be a tiered vector, which is slightly easier to implement and takes O(sqrt(n)). Both structures are quite cool, but alas not very popular.
A tiered vector implemented on top of an array would, I think, best fit your purpose. The tiered vector concept may be new and a little tricky to understand at first, but once you get it, it opens up a lot of questions and gives you a handy weapon to tackle the data-structure part of many problems very efficiently. So it is recommended that you master a tiered vector implementation.
An array will give you O(1) lookup but O(n) delete of the element.
A list will give you O(n) lookup but O(1) delete of the element.
A binary search tree will give you O(log n) lookup with O(1) delete of the element. But it doesn't preserve the relative order.
A binary search tree used in conjunction with the list will give you the best of both worlds. Insert a node into both the list (to preserve order) and the tree (fast lookup). Delete will be O(1).
struct node {
    node* list_next;
    node* list_prev;
    node* tree_right;
    node* tree_left;
    // node data;
};
Note that if the nodes are inserted into the tree using the index as the sort value, you will end up with another linked list pretending to be a tree. However, the tree can be balanced in O(n) time once it is built, a cost you only have to incur once.
Update
Thinking about this more, this might not be the best approach for you. I'm used to doing lookups on the data itself, not its relative position in a set; this is a data-centric approach. Using the index as the sort value will break as soon as you remove a node, since the "higher" indices will need to change.
Warning: Don't take this answer seriously.
In theory, you can do both in O(1), assuming these are the only operations you want to optimize for. The following solution will need lots of space (and it will leak space), and it will take long to create the data structure:
Use an array. In every entry of the array, point to another array which is the same, but with that entry removed.