Data structure that shares nodes with an existing tree

Basically, if I remember correctly, it's a tree. Then, if you apply a transformation on the tree, the unchanged nodes are shared, and new ones are created.
However, because of the way it is built, the data as it was before each transformation remains accessible, at the cost of very little extra memory.
I think this applies in a context like git, and I specifically remember git being the example used for this structure, but I can't remember its name. Can anyone help?
I've tried searching for tree variants, and I just can't seem to find it.
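One structure that fits this description is usually called a persistent (immutable) tree with structural sharing, commonly implemented by path copying. The following is a minimal Python sketch of that idea on a binary search tree, not a reference implementation; the names `Node`, `insert` and `keys` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: the old version is reused as-is


def keys(root):
    """In-order list of keys, used here only to inspect a version."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


v1 = None
for k in (5, 2, 8):
    v1 = insert(v1, k)
v2 = insert(v1, 9)  # v1 is still intact and fully usable
```

After this runs, `v1` and `v2` are two independent versions, and `v2.left is v1.left` holds because the untouched left subtree is shared rather than copied.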

Related

Can CustomListItems explore a tree?

I would like to use a .natvis file to display all of the elements in a tree structure as a flat list.
I'm aware of the TreeItems expansion described in the .natvis documentation, and have tried to use that, but it seems a bit limited, and difficult to get to work in my situation (Templated node types, nodes which don't contain the actual data, need to cast the nodes to other templated types to get the data, lack of ability to do Exec entries or multiple Items per tree node, etc.)
The CustomListItems expansion is a lot more flexible in terms of being able to assign variables and execute commands which are evaluated in a loop. But I'm struggling to figure out whether there's a way to explore a tree in CustomListItems. Basically there would have to be some way to maintain a stack of some kind so that I could explore the entire left branch and then pop back up and explore the right branch. Is there any way to do this in CustomListItems?
https://learn.microsoft.com/en-ca/previous-versions/visualstudio/visual-studio-2015/debugger/create-custom-views-of-native-objects?view=vs-2015&redirectedfrom=MSDN#natvis-views

What alternative to bitmap for network connectivity check?

I have a set of nodes, each identified by a unique integer (UID), which are arranged into one or more networks. Each network has one or more root nodes. I need to know which nodes are not connected to any root nodes.
At present, from a previous product iteration, my connectivity check starts at each root node and follows all connections. For every found node, a bit in a bitmap is set so that you can quickly check if a node has already been found / processed. Once all paths for all root nodes have been followed, the complete set of nodes is compared against the 'found' bitmap to show all the unconnected nodes.
Previously, UIDs were sequential and I could consolidate them to remove gaps. So using the first ID as an offset, I just made my found array quite large and indexed found nodes into the bitmap directly using the UID (i.e., if I found node 1000, I'd set the 1000th bit in the bitmap). However, with the current version of the product, I have less control over the node UIDs. I can't consolidate them, and third party interaction is unpredictably making very large gaps (e.g., UIDs might jump from being in the thousands to being in the tens of millions). I have come across instances where my bitmap array is too small to accommodate the gaps and the connectivity check fails.
Now, I could just go on increasing the size of my bitmap array, but that always runs the risk of still being too small and is not very resource efficient. Thus, I'm looking to replace it with something else. Obviously, I'd like it to be as fast and as resource efficient as possible - I think some sort of hashed map is what I need. Unfortunately, I have to make this work in Fortran, so I don't have access to <map> etc.
What is the best way to hash my UIDs into a lookup structure, such that I can easily check if I already found that node?
If you can modify the node types themselves, you could add a found field, and use that?
If that is not possible, then yes, a hash table sounds like the obvious solution. Depending on how the UIDs are distributed, you might get away with something as simple as mod(UID, bitmap_size). But you still need to handle collisions somehow. There is a lot of literature on that topic, so I won't go into it here, except to note that Robin Hood hashing is pretty cool (though maybe a bit complicated for a one-off use).
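As a rough illustration of the mod-based idea with one simple way of handling collisions (linear probing, not Robin Hood hashing), here is a Python sketch. It only shows the logic, which would still have to be ported to Fortran arrays; `UidSet` and its methods are hypothetical names, and UIDs are assumed to be non-negative integers.

```python
class UidSet:
    """Fixed-capacity set of non-negative integer UIDs using linear probing."""

    EMPTY = -1  # sentinel; assumes no real UID is negative

    def __init__(self, capacity):
        self.slots = [self.EMPTY] * capacity

    def _probe(self, uid):
        """Index of the slot holding uid, or of the first empty slot; None if full."""
        n = len(self.slots)
        i = uid % n
        for _ in range(n):
            if self.slots[i] in (self.EMPTY, uid):
                return i
            i = (i + 1) % n  # collision: try the next slot
        return None

    def add(self, uid):
        """Mark uid as found. Returns True if it was not already present."""
        i = self._probe(uid)
        if i is None:
            raise OverflowError("table is full; allocate a larger one")
        if self.slots[i] == uid:
            return False
        self.slots[i] = uid
        return True

    def __contains__(self, uid):
        i = self._probe(uid)
        return i is not None and self.slots[i] == uid


found = UidSet(8)
found.add(1000)
found.add(10_000_008)  # lands on the same slot as 1000, so it is probed onward
```

The table size depends only on the number of nodes, not on the largest UID, so large gaps in the UID range no longer matter. Keeping the table well under full (for example, at most half) keeps probe sequences short.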

Creating a Barnes-Hut tree

I am currently trying to create a Barnes-Hut octree, but I still do not fully understand how to do this properly. I have read threads here, this article and some others. I believe I understand how to build the tree if every node stores the indices of the particles inside it and if you keep the empty nodes. But what if you do not want to? How do you build a tree so that at the end you only have the necessary information: say, monopoles and quadrupoles for all non-empty nodes? I have made so many different attempts that I am now completely confused, to be honest. What should I store in each node? What would the pseudocode for such a thing look like?
P.S. By the way, is it different for monopoles and quadrupoles? I can imagine that you do not need exact information about the particles inside the node to calculate a monopole (it is just the total mass of the node), but what about a quadrupole?
Thank you in advance!
P.P.S. I use the Julia language, if that is somehow relevant.
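One possible shape for such a tree, sketched in Python rather than Julia and covering only the monopole: each cell keeps its total mass, its centre of mass and a dictionary of its non-empty children. The particle lists exist only temporarily during construction. `Cell` and `build` are hypothetical names, masses are assumed to be positive, and `max_depth` is an assumed guard against coincident particles.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    mass: float  # monopole: total mass inside the cell
    com: tuple   # centre of mass (x, y, z)
    children: dict = field(default_factory=dict)  # octant index -> Cell, non-empty only


def build(particles, center, half, max_depth=32):
    """Build a sparse octree from a list of ((x, y, z), mass) pairs.

    center and half describe the cubic cell. Returns None for an empty cell.
    """
    if not particles:
        return None
    mass = sum(m for _, m in particles)
    com = tuple(sum(p[k] * m for p, m in particles) / mass for k in range(3))
    cell = Cell(mass, com)
    if len(particles) == 1 or max_depth == 0:
        return cell  # leaf: nothing else needs to be stored
    # Sort particles into octants; only octants that receive particles exist.
    buckets = {}
    for p, m in particles:
        octant = sum((p[k] >= center[k]) << k for k in range(3))
        buckets.setdefault(octant, []).append((p, m))
    for octant, group in buckets.items():
        sub_center = tuple(
            center[k] + (half / 2 if (octant >> k) & 1 else -half / 2)
            for k in range(3)
        )
        cell.children[octant] = build(group, sub_center, half / 2, max_depth - 1)
    return cell  # the temporary particle lists are dropped here
```

A quadrupole tensor could be accumulated in the same loop from the particles of each cell while they are still available, which is one way to avoid keeping particle indices in the finished tree.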

Efficient view updating with functional data model

In functional programming, data models are immutable, and updating a data model is done by applying a function to it and getting a new version of the data model in return. I'm wondering how people write efficient viewers/editors for such data models, though (more specifically in Clojure).
A simplified example: suppose that you want to implement a viewer for a huge tree. In the non-functional world, you could have a controller for the Tree, with a function updateNode(Node, Value), which could then notify all observers to tell them that a specific node in the tree has been updated. On the viewer side, you would put all the nodes in a TreeView widget, keep a mapping of Node->WidgetNode, and when you are notified that a Node has changed, you can update just the one corresponding NodeWidget in the tree that needs updating.
The solution described in another Clojure MVC question talks about keeping the model in a ref, and adding a watcher. While this would indeed allow you to get notified of a change in the model, you still wouldn't know which node was updated, and would have to traverse the whole tree, correct?
The best thing I can come up with off the top of my head requires you, in the worst case, to update all the nodes on the path from the root to the changed node (as all of these nodes will be different).
What is the standard solution for updating views on immutable data models?
I'm not sure how this is a problem that's unique to functional programming. If you kept all of your state in a singly rooted mutable object graph with a notify when it changed, the same issue would exist.
To get around this, you could simply store the current state of the model along with some information about what changed in the last edit. You could even keep a history of these to allow for easy undo/redo, because Clojure's persistent data structures make that extremely efficient thanks to their shared underlying state.
That's just one thought on how to attack it. I'm sure there are many more.
I also think it's worth asking, "How efficient does it need to be?" The answer is, "just efficient enough for the circumstances." It might be that a plain map of data will work because you don't really have all that much data to deal with in a given application.
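The idea of storing the current state together with what changed can be sketched as follows. This uses Python with nested dicts instead of Clojure's persistent maps, and `assoc_in`, `Model`, `update` and `undo` are names chosen for this illustration only.

```python
def assoc_in(tree, path, value):
    """Return a copy of nested dict `tree` with `value` set at `path`.

    Only dicts along the path are copied; all other subtrees are shared.
    """
    if not path:
        return value
    head, rest = path[0], path[1:]
    new = dict(tree)  # shallow copy, so sibling subtrees are shared
    new[head] = assoc_in(tree.get(head, {}), rest, value)
    return new


class Model:
    def __init__(self, state):
        # Each entry is (state, path changed by the edit that produced it).
        self.history = [(state, None)]

    @property
    def state(self):
        return self.history[-1][0]

    def update(self, path, value):
        """Apply an edit and return the changed path so a view can refresh just that node."""
        changed = tuple(path)
        self.history.append((assoc_in(self.state, path, value), changed))
        return changed

    def undo(self):
        if len(self.history) > 1:
            self.history.pop()


model = Model({"a": {"x": 1}, "b": {"y": 2}})
changed = model.update(["a", "x"], 10)  # a view would refresh only ("a", "x")
```

Because every version shares its unchanged subtrees with the previous one, keeping the whole history costs roughly one copied path per edit rather than one full tree per edit.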

Store hierarchies in a way that is resistant to corruption

I was thinking today about the best way to store a hierarchical set of nodes, e.g.
(source: www2002.org)
The most obvious way to represent this (to me at least) would be for each node to have nextSibling and childNode pointers, either of which could be null.
This has the following properties:
Requires a small number of changes if you want to add in or remove a node somewhere
Is highly susceptible to corruption. If one node was lost, you could potentially lose a large amount of other nodes that were dependent on being found through that node's pointers.
Another method you might use is to come up with a system of coordinates, e.g. 1.1, 1.2, 1.2.1, 1.2.2. Here 1.2.3 would be the 3rd node at the 3rd level, with the 2nd node at the prior level as its parent. Unanticipated loss of a node would not affect the ability to resolve any other nodes. However, adding a node somewhere can change the coordinates of a large number of other nodes.
What are ways that you could store a hierarchy of nodes that requires few changes to add or delete a node and is resilient to corruption of a few nodes? (not implementation-specific)
When you refer to corruption, are you talking about RAM or some other storage? Perhaps during transmission over some medium?
In any case, when you are dealing with data corruption you are talking about an entire field of computer science that deals with error detection and correction.
When you talk about losing a node, the first thing you have to figure out is 'how do I know I've lost a node?', that's error detection.
As for the problem of protecting data from corruption, pretty much the only way to do this is with redundancy. The amount of redundancy is determined by what limit you want to put on the degree of corruption you would like to be able to recover from. You can't really protect yourself from this with a clever structure design as you are just as likely to suffer corruption to the critical 'clever' part of your structure :)
The ever-wise Wikipedia is a good place to start: Error detection and correction
"I was thinking today about the best way to store a hierarchical set of nodes"
So you are writing a filesystem? ;-)
"The most obvious way to represent this (to me at least) would be for each node to have nextSibling and childNode pointers"
Why? The sibling information is present at the parent node, so all you need is a pointer back to the parent. A doubly linked-list, so to speak.
"What are ways that you could store a hierarchy of nodes that requires few changes to add or delete a node and is resilient to corruption of a few nodes?"
There are actually two different questions involved here.
Is the data corrupt?
How do I fix corrupt data (aka self healing systems)?
Answers to these two questions will determine the exact nature of the solution.
Data Corruption
If your only aim is to know if your data is good or not, store a hash digest of child node information with the parent.
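A minimal sketch of that detection scheme in Python, using SHA-256 from the standard library; the digest choice and the names `HashedNode` and `is_intact` are assumptions made for this example, not something the answer prescribes.

```python
import hashlib


class HashedNode:
    def __init__(self, data, children=()):
        self.data = data
        self.children = list(children)
        # Digest of the children, stored redundantly with the parent.
        self.child_digest = self.compute_child_digest()

    def digest(self):
        """Digest of this node: its own data plus everything below it."""
        h = hashlib.sha256()
        h.update(self.data.encode("utf-8"))
        h.update(self.compute_child_digest())
        return h.digest()

    def compute_child_digest(self):
        h = hashlib.sha256()
        for child in self.children:
            h.update(child.digest())
        return h.digest()


def is_intact(node):
    """True if every stored child digest in this subtree still matches."""
    return node.child_digest == node.compute_child_digest() and all(
        is_intact(child) for child in node.children
    )


leaf = HashedNode("leaf")
root = HashedNode("root", [HashedNode("a", [leaf]), HashedNode("b")])
```

If `leaf.data` is later altered, `is_intact(root)` becomes false while the untouched sibling subtree still verifies, which narrows down where the damage is. This only detects corruption; repairing it needs additional redundancy.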
Self Healing Structures
Any self healing structure will need the following information:
Is there a corruption? (See above)
Where is the corruption?
Can it be healed?
Different algorithms exist to fix data, with varying degrees of effectiveness. The root idea is to introduce redundancy, and what can be reconstructed depends on your degree of redundancy. The best guarantees come from the most redundant systems, so you will have to choose a trade-off.
I believe there is some scope of narrowing down your question to a point where we can start discussing individual bits and pieces of the puzzle.
A simple option is to store a reference to the root node in every node; this way it is easy to detect orphan nodes.
Another interesting option is to store the hierarchy as a descendants (transitive closure) table. So for node 1.2.3 you'd have the following relations:
1., 1.2.3. - root node is an ancestor of 1.2.3.
1.2., 1.2.3. - 1.2. node is an ancestor of 1.2.3.
1., 1.2. - root node is an ancestor of 1.2.
etc...
This table can be more resistant to errors as it holds some redundant info.
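To make the table concrete, here is a small Python sketch that derives the (ancestor, descendant) rows from parent links and recovers a parent from the rows alone. The function names and the dict-based input are assumptions for illustration.

```python
def closure_rows(parent_of):
    """Build the set of (ancestor, descendant) pairs.

    parent_of maps each node id to its parent id (None for the root).
    """
    rows = set()
    for node in parent_of:
        ancestor = parent_of[node]
        while ancestor is not None:
            rows.add((ancestor, node))
            ancestor = parent_of[ancestor]
    return rows


def parent_from_closure(rows, node):
    """Recover a node's parent: the deepest of its ancestors, using only the rows."""
    ancestors = [a for a, d in rows if d == node]
    if not ancestors:
        return None  # the root, or a node with no surviving rows
    # The parent is the ancestor that itself has the most ancestors.
    return max(ancestors, key=lambda a: sum(1 for _, d in rows if d == a))


parent_of = {"1": None, "1.2": "1", "1.2.3": "1.2"}
rows = closure_rows(parent_of)
```

For the three nodes above, `rows` holds exactly the three relations listed for 1.2.3 and 1.2, so a node's position is recorded once per ancestor rather than in a single pointer.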
Goran
The typical method to store a hierarchy is to have a ParentNode property/field in every node. For the root, ParentNode is null; for all other nodes it has a value. This does mean that the tree may lose entire branches, but in memory that seems unlikely, and in a DB you can guard against it using constraints.
This approach doesn't directly support finding all siblings. If that is a requirement, I'd add another property/field for depth: the root has depth 0, all nodes directly below the root have depth 1, and so on. All nodes with the same depth are then on the same level, and those that also share a ParentNode are siblings.
