Lock-free deque that supports removing an arbitrary node - performance

This needs to be lock-free, as it has to run in the interrupt handler of an SMP system; I cannot take locks.
I have a contiguous array holding some values. Some of the entries in this array are "free", they are not occupied. I want to make a list of these entries so that I can quickly allocate one. However, I occasionally have to allocate an arbitrary entry.
Therefore, I see the following would be a nice way of doing things:
The contiguous array holds not just values but also left and right pointers, thus making a deque. Only free values have valid left/right pointers. I can quickly get to arbitrary nodes because it is just an index access into the deque.
Now, to the crux of it: Is there a nice lock free deque algorithm that is relatively efficient and can support the removal of an arbitrary node?

The contiguous array holds not just values but also left and right pointers, thus making
a deque.
[snip]
Now, to the crux of it: Is there a nice lock free deque algorithm that is relatively
efficient and can support the removal of an arbitrary node?
A deque with the ability to remove arbitrary elements is really a doubly linked list; the only thing you've given up is the ability to insert arbitrary elements, and removing is the hard part: if you can remove, you can certainly add.
A lock-free doubly linked list exists, but it requires garbage collection.
How about this: have a freelist. It represents the available nodes. The nodes are actually an array, so you can index into them. When you have to use an arbitrary node, index into the array and then CAS a flag in that element, but leave it in the freelist (you have to, of course, since it's not at the top of the freelist). When you later pop and find you've popped an element which has already been used, just keep popping until you find one which is free.
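A minimal sketch of that freelist-with-flag scheme in C++ (names are illustrative; this sketch ignores the ABA problem, which a production version would need to address with tagged pointers or generation counters):

```cpp
#include <atomic>

// Pool entries: a used flag plus a freelist link expressed as an index.
struct Node {
    std::atomic<bool> in_use{false};
    std::atomic<int>  next{-1};   // index of next free node, -1 = none
    int value = 0;
};

constexpr int N = 64;
Node pool[N];
std::atomic<int> free_head{-1};

// Claim a specific slot: flag it used but leave it linked in the freelist.
bool claim_index(int i) {
    bool expected = false;
    return pool[i].in_use.compare_exchange_strong(expected, true);
}

// Pop from the freelist, skipping entries already claimed by index.
int pop_free() {
    for (;;) {
        int head = free_head.load();
        if (head == -1) return -1;                  // freelist empty
        int next = pool[head].next.load();
        if (!free_head.compare_exchange_weak(head, next))
            continue;                               // lost the race, retry
        bool expected = false;
        if (pool[head].in_use.compare_exchange_strong(expected, true))
            return head;                            // genuinely free
        // already claimed via claim_index(); keep popping
    }
}

// Push a released slot back onto the freelist (Treiber-stack style).
void push_free(int i) {
    pool[i].in_use.store(false);
    int head = free_head.load();
    do {
        pool[i].next.store(head);
    } while (!free_head.compare_exchange_weak(head, i));
}
```

Note that a slot claimed by index stays physically linked in the freelist, exactly as described: the cost is that later pops may have to skip over it.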

In a garbage-collected system, it's possible to have a singly linked list support lock-free logical removal of items, provided you don't care when the memory for the items is physically freed, and provided it's not possible to add an item immediately following an item that's being deleted. Give each item a deleted flag, and have the list-traversal routine check, as it visits each item, whether the following node has been deleted; if it has, use compare-and-swap to swing the present node's "next" pointer around it. Note that the "next" pointer of a node being deleted might itself get changed, but only so as to skip the node following it. It's also possible that swinging a next pointer relinks a node which has just been unlinked from the list: for example, if B and C are removed from A->B->C->D simultaneously, the list might become A->B->D (swinging B's pointer) and then A->C->D (swinging A's pointer to the latched value of B's "next" pointer). As long as node C was, and continues to be, flagged as "deleted", however, that shouldn't pose a problem, since the next time the list is iterated, A's pointer will swing to D.
Two caveats: (1) in a non-garbage-collected system, it may be difficult to know when a node can really be freed; freeing a node and then swinging a pointer back to it would cause undefined behavior. (2) If a node is added immediately following a deleted node, a pointer may swing so as to disconnect the new node. If nodes will always be added to the end of the queue, this latter problem can be avoided by ending the queue with a dummy node, which cannot be deleted until there's another node following it.
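A sketch of that logical-removal scheme, under the stated assumption that nodes are never physically freed (or that a garbage collector exists); names are illustrative:

```cpp
#include <atomic>

// A node carries a deleted flag; traversal lazily unlinks flagged nodes
// by CASing the predecessor's next pointer past them.
struct LNode {
    int value = 0;
    std::atomic<bool> deleted{false};
    std::atomic<LNode*> next{nullptr};
};

// Logical removal: just set the flag; physical unlinking happens lazily.
void remove_node(LNode* n) { n->deleted.store(true); }

// Traverse from head, swinging pointers around deleted nodes,
// and count the live nodes seen.
int count_live(LNode* head) {
    int count = 0;
    LNode* cur = head;
    while (cur) {
        LNode* nxt = cur->next.load();
        if (nxt && nxt->deleted.load()) {
            // Try to swing cur->next past the deleted node; on failure
            // another thread already moved it, so reload and retry.
            cur->next.compare_exchange_strong(nxt, nxt->next.load());
            continue;
        }
        if (!cur->deleted.load()) ++count;
        cur = nxt;
    }
    return count;
}
```

After one traversal of A->B->C->D with B and C flagged, A's next pointer ends up at D, matching the pointer-swinging description above.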

Related

Best statically allocated data structure for writing and extending contiguous blocks of data?

Here's what I want to do:
I have an arbitrary number of values of different kinds: string, int, float, bool, etc. that I need to store somehow. Multiple elements are often written and read as a whole, forming "contiguous blocks" that can also be extended and shortened at the user's wish; even elements in the middle might be taken out. Also, the whole thing should be statically allocated.
I was thinking about using some kind of statically allocated forward list. The way I imagine this to work is defining an array of a struct containing one std::variant field and a "previous head" field which always points to the location of the previous head of the list. A new element is always placed at the globally known "head", whose previous value it stores in its "previous head" field. This way I can keep track of holes inside my list, because once an element is taken out, its location is written to the global head and will be filled up by subsequent inserts.
This approach however has downsides: When a "contiguous block" is extended, there might be the case that further elements of other blocks have already queued up in the list past its last element. So I either need to move all subsequent entries or copy over the last element in the previous list and insert a link object that allows me to jump to the new location when traversing the contiguous block.
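One reading of the head-chained freelist just described, sketched with illustrative names (the variant alternatives and pool size are placeholders):

```cpp
#include <array>
#include <cstddef>
#include <variant>

// One slot of the statically allocated pool: a variant payload plus a
// freelist link that is only meaningful while the slot is free.
struct Slot {
    std::variant<std::monostate, int, float, bool> value;
    int prev_head = -1;   // location of the previous head of the freelist
};

template <std::size_t N>
struct StaticPool {
    std::array<Slot, N> slots{};
    int head = -1;        // index of the next free slot, -1 if full

    StaticPool() {
        // Chain every slot onto the freelist at startup.
        for (std::size_t i = 0; i < N; ++i)
            slots[i].prev_head = static_cast<int>(i) - 1;
        head = static_cast<int>(N) - 1;
    }

    int allocate() {                 // O(1): take the current head
        if (head < 0) return -1;
        int i = head;
        head = slots[i].prev_head;
        return i;
    }

    void release(int i) {            // O(1): the hole becomes the new head
        slots[i].prev_head = head;
        head = i;
    }
};
```

This gives the O(1) writes and removals stated below; the "link object" issue when extending a block in place is not addressed by this sketch.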
The priority to optimize this datastructure is following (by number of use cases):
Initially write contiguous blocks
read the whole data structure
add new elements to contiguous blocks
remove elements of contiguous blocks
At the moment my data structure has time complexity of O(1) for writes, O(n) for contiguous reads (with the caveat that in the worst case there is a jump to the next location inside the array every other element), O(1) for adding new elements, and O(1) for removing elements. However, space complexity is O(2n) in the worst case (when I have to jump every second element, the slot used to store the "link" is lost for data).
What I'm wondering now is: Is the described way the best viable way to accomplish what I'm trying or is there a better data structure? Is there an official name for this data structure?

Container of fixed size where items exist within it according to their demand

I'm trying to implement a container with the following characteristics:
The container has a fixed size n.
When an item is inserted into the container, if the item is in the data structure it will be moved to the front of it. If not it will be inserted to the front of the data structure, but the last item at the back of the container will be removed to respect the fixed size n.
Building on point 2, it is necessary to check whether an item already exists in the container, in order to know whether to insert a new item or move an existing one to the front.
The reasoning behind this container, is to keep frequently accessed items in the container. The cost of inserting a new item into the container is large thus it is in my interest to keep it in the container for as long as it is in demand.
Is there a container/data structure that exists that achieves something similar to what I've described? If not can you provide any advice on how to implement it? I'm using C++ but any examples or pseudocode will be equally appreciated.
Edit:
I suppose what I need is a kind of queue with no duplicate items. The queue needs to be searched to see if an item exists within it, and if so moves it to the front of the queue. A fixed size isn't that difficult to adhere to (just check the size before insertion and if it will go over remove the last item in the queue). Basically this post but not allowing any duplicates in the container, and also fast search capabilities to check if an item is within it.
I may not be following all the requirements you gave, but this seems like it can be implemented as a double-ended queue (C++ deque or Java Deque). Each access to an element implies a linear search (which can't be avoided); then that element is moved to the front (constant time) and the last element removed (also constant time). The result should be that the most frequently accessed elements migrate to the front of the queue over time, decreasing the real cost of a linear search.
A double-ended queue can be implemented as a ring-buffer or as a doubly-linked-list. Since you stated a fixed number of elements, the ring buffer seems like the better option.
However, I can't vouch for the implementations of the C++ or Java deque; you may want to look at the source code to see whether it's backed by an array or a linked node structure.
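For the fast search the question's edit asks for, one can pair the list with a hash map, as in a classic LRU cache: the map gives O(1) lookup, and std::list::splice moves a found element to the front in O(1). A minimal sketch with illustrative names:

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Fixed-capacity "move-to-front" container: a doubly linked list holds
// the items in recency order, and a hash map gives O(1) search.
template <typename T>
class MruContainer {
    std::size_t capacity_;
    std::list<T> items_;                                         // front = most recent
    std::unordered_map<T, typename std::list<T>::iterator> pos_;
public:
    explicit MruContainer(std::size_t n) : capacity_(n) {}

    bool contains(const T& x) const { return pos_.count(x) != 0; }

    void insert(const T& x) {
        auto it = pos_.find(x);
        if (it != pos_.end()) {
            // Already present: splice it to the front in O(1); the stored
            // iterator stays valid because splice does not invalidate it.
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {        // evict the least recent
            pos_.erase(items_.back());
            items_.pop_back();
        }
        items_.push_front(x);
        pos_[x] = items_.begin();
    }

    const T& front() const { return items_.front(); }
    std::size_t size() const { return items_.size(); }
};
```

This trades extra memory for the hash map in exchange for avoiding the linear search entirely.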
Maybe wrap a priority queue with elements having a last-accessed-time attribute?
You may check out the splay tree: if you do an operation on element X, that element moves to the root.

Why do we not save the parent pointer in a "B+ Tree" for easy upward traversal in the tree?

Will it affect much if I add a pointer to the parent Node to get simplicity during splitting and insertion process?
General Node would then look something like this :
class BPTreeNode {
    bool leaf;
    BPTreeNode *next;
    BPTreeNode *parent; // add-on
    std::vector<int*> pointers;
    std::vector<int> keys;
};
What challenges might I face in a real-life database system?
I am only implementing it as a hobby project.
There are two reasons I can think of:
The algorithm for deleting a value from a B+tree may result in an internal block A that has too few child blocks. If neither the block at the left or right of A can pass an entry to A in order to resolve this violation, then block A needs to merge into a sibling block B. This means that all the child blocks of block A need to have their parent pointer updated to block B. This is additional work that increases (a lot) the number of blocks that need an update in a delete-operation.
It represents extra space that is really not needed for performing the standard B+Tree operations. When searching a value via a B+Tree you can easily keep track of the path to that leaf level and use it for backtracking upwards in the tree.
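The path-tracking idea can be sketched like this (node layout loosely follows the question's class; the names and search logic are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Simplified B+tree node: internal nodes route via keys/children,
// leaves hold keys directly.
struct BPTreeNode {
    bool leaf = false;
    std::vector<BPTreeNode*> children;   // used by internal nodes
    std::vector<int> keys;
};

// Descend to the leaf that should hold `key`, recording every visited
// node in `path`. A split can then walk `path` backwards instead of
// following parent pointers.
BPTreeNode* find_leaf(BPTreeNode* root, int key,
                      std::vector<BPTreeNode*>& path) {
    BPTreeNode* cur = root;
    while (!cur->leaf) {
        path.push_back(cur);
        // The first key strictly greater than `key` selects the child.
        auto it = std::upper_bound(cur->keys.begin(), cur->keys.end(), key);
        cur = cur->children[it - cur->keys.begin()];
    }
    path.push_back(cur);
    return cur;
}
```

The path costs O(height) transient stack space per operation, versus a permanent pointer in every block.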

What is the difference between a linked list and a queue?

I'm new to data structures, and it seems like these two data structures have many similarities.
In this answer it says that there is a difference in interface.
Please explain it.
A queue is any data structure that is "FIFO = First-In First-Out." It's a waiting-list. (In Britain, the term is used in ordinary conversation ... you "wait in a queue" not "wait in line.")
A stack is any data structure that is "LIFO = Last-In First-Out." It's a pushdown stack like the stack of dishes in a cafeteria.
A linked list is a possible implementation of either structure. It consists of nodes, each of which contains pointers to adjacent nodes in the list.
However, there are many other implementations. "Trees" of various kinds can also be used to implement both queues and stacks. Ordinary arrays can do it, although of course arrays cannot "grow."
Ideally, these days, you simply use an appropriate "container class" in your favorite language and fuhgeddabout how it actually was implemented. "You know that it works, therefore you don't care how." Actual implementation of such things is probably an academic exercise.
A list is just a list of things (items, objects, whatever), for example a list of the courses you are taking this semester, a list of songs you listen to, or a list of answers to this question on this page. There is no order associated with a list: you can add an item anywhere and take an item off from anywhere, and it doesn't change the definition of a list. It's just a grouping of similar (or not so similar) items.
Now consider a list of people standing in front of an ATM or a bank teller. This list has to observe a particular order: the first person in the line (list) is the one who will be served first (and will be the first to leave the list). A new person coming in stands at the end of the queue and will be served after everyone in front of him. The people in the middle of the list are not supposed to jump the line. This is an example of a queue. You can also guess what a priority queue would be (think of airlines with silver and gold members at check-in).
I hope this explains the difference.
A linked list is a list of nodes. Each node contains an address field holding the address of the next node; this structure lets you traverse the list from its first node to its last. A list of this type is called a singly linked list. A linked list can also be doubly linked: each node then has two address fields, one storing the address of the previous node and one the address of the next node. Most importantly, the address of the first node must be kept in a variable so that you can reach the list at any time.
A queue, on the other hand, can be backed by a linked list or an array of nodes. In a general list a node can be inserted at any position, but in a queue new nodes are inserted only at one end and removed only from the other, on a first-in first-out (FIFO) basis: popping a queue backed by a linked list removes the node that has been in the list longest and returns its value. So a queue can be a list as well, but one governed by the FIFO principle.
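A minimal sketch of a FIFO queue backed by a singly linked list (illustrative names; nodes left in the queue are not freed on destruction):

```cpp
// Enqueue at the tail, dequeue at the head: both operations are O(1)
// because we keep a pointer to each end of the singly linked list.
struct QNode { int value; QNode* next = nullptr; };

struct ListQueue {
    QNode* head = nullptr;   // front: next node to be dequeued
    QNode* tail = nullptr;   // back: where new nodes are appended
    void enqueue(int v) {
        QNode* n = new QNode{v};
        if (tail) tail->next = n; else head = n;
        tail = n;
    }
    int dequeue() {          // caller must check empty() first
        QNode* n = head;
        int v = n->value;
        head = n->next;
        if (!head) tail = nullptr;
        delete n;
        return v;
    }
    bool empty() const { return head == nullptr; }
};
```

The queue interface exposes only enqueue/dequeue; the underlying list could support insertion anywhere, which is exactly the restriction the answers above describe.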
You will find more information online; read it carefully and try to understand the difference.
I had the same question as you! This is what I found:
A Queue is essentially just more restrictive than a LinkedList. For example, in a LinkedList you can use the method .add(int index, Object obj), but if you try doing that with a Queue interface, you’ll get an error, since with a Queue you can only add elements at the tail end. Similarly, in a LinkedList you can use .remove(int index) as well as .remove(Object obj), but attempting to do this with a Queue will result in an error, since you can only remove an object from the head. So, in essence, the Queue just has less options when it comes to methods you can use on it. (There might be more to it than that, but that’s what was most pertinent to me.)
There are some similarities between the two. For example, they both have the .poll() method, and the result is the same: removes the head element from the Object.
Here are some links in which you can compare the methods of the two (scroll to the bottom of each page to see them all, and you’ll see immediately that LinkedList has a lot more):
https://www.geeksforgeeks.org/linked-list-in-java/ (LinkedList)
https://www.geeksforgeeks.org/queue-interface-java/ (Queue)
In Java (and probably other languages too), a LinkedList implements the Queue interface. So, in essence, a LinkedList is a Queue; it has all the features that a Queue does and more. Keep in mind that a Queue is not a LinkedList, as a LinkedList builds and expands upon a Queue.

Efficient mass modification of persistent data structures

I understand how trees are typically used to modify persistent data structures (create a new node and replace all its ancestors).
But what if I have a tree of 10,000's of nodes and I need to modify 1000's of them? I don't want to go through and create 1000's of new roots, I only need the one new root that results from modifying everything at once.
For example:
Let's take a persistent binary tree for example. In the single update node case, it does a search until it finds the node, creates a new one with the modifications and the old children, and creates new ancestors up to the root.
In the bulk-update case, could we do the following:
Instead of updating just a single node, you update 1000 nodes in one pass.
At the root node, the current list is the full list. You then split that list between those that match the left node and those that match the right. If none match one of the children, don't descend to it. You then descend to the left node (assuming there were matches), split its search list between its children, and continue. When you have a single node and a match, you update it and go back up, replacing and updating ancestors and other branches as appropriate.
This would result in only one new root even though it modified any number of nodes.
These kinds of "mass modification" operations are sometimes called bulk updates. Of course, the details will vary depending on exactly what kind of data structure you are working with and what kind of modifications you are trying to perform.
Typical kinds of operations might include "delete all values satisfying some condition" or "increment the values associated with all the keys in this list". Frequently, these operations can be performed in a single walk over the entire structure, taking O(n) time.
You seem to be concerned about the memory allocation involved in creating "1000's of new roots". Typical allocation for performing the operations one at a time would be O(k log n), where k is the number of nodes being modified. Typical allocation for performing the single walk over the entire structure would be O(n). Which is better depends on k and n.
In some cases, you can decrease the amount of allocation, at the cost of more complicated code, by paying special attention to when changes occur. For example, if you have a recursive algorithm that returns a tree, you might modify the algorithm to return a tree together with a boolean indicating whether anything has changed. The algorithm could then check those booleans before allocating a new node to see whether the old node can safely be reused. However, people don't usually bother with this extra check unless and until they have evidence that the extra memory allocation is actually a problem.
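A simplified sketch of such a single-pass bulk update over a persistent binary search tree, using pointer comparison in place of an explicit changed flag (this version walks the whole tree in O(n); the range-splitting described in the question would additionally prune subtrees containing no matches):

```cpp
#include <memory>
#include <set>

// Immutable tree node; shared_ptr stands in for garbage collection.
struct PNode {
    int key, value;
    std::shared_ptr<const PNode> left, right;
};
using Tree = std::shared_ptr<const PNode>;

// Increment the values of all keys in `targets`, returning one new root.
// A node is rebuilt only if its own key matched or a child changed;
// otherwise the original node is returned and its subtree stays shared.
Tree bulk_increment(const Tree& t, const std::set<int>& targets) {
    if (!t) return t;
    Tree l = bulk_increment(t->left, targets);
    Tree r = bulk_increment(t->right, targets);
    bool hit = targets.count(t->key) != 0;
    if (!hit && l == t->left && r == t->right)
        return t;                       // nothing changed here: reuse node
    return std::make_shared<const PNode>(
        PNode{t->key, t->value + (hit ? 1 : 0), l, r});
}
```

Allocation is proportional to the number of rebuilt paths rather than to the number of individual single-node updates, which is the trade-off discussed above.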
A particular implementation of what you're looking for can be found in Clojure's (and ClojureScript's) transients.
In short, given a fully-immutable, persistent data structure, a transient version of it will make changes using destructive (allocation-efficient) mutation, which you can flip back into a proper persistent data structure again when you're done with your performance-sensitive operations. It is only at the transition back to a persistent data structure that new roots are created (for example), thus amortizing the attendant cost over the number of logical operations you performed on the structure while it was in its transient form.
