How to implement a Tree data structure with multiple roots

I need to implement a tree data structure that has multiple roots, not just one. Consider this scenario: suppose I have to implement a tree data structure for book contents, which consist of chapters > sections > sub-sections, etc. The main problem is that there are multiple roots here: chapter 1, chapter 2, chapter 3, and so on. The roots must be the chapters, since the type of content and the functions are the same from that level down.
What my task requires:
- A tree with multiple roots
- Nodes ordered horizontally among the children of the same parent
- A non-binary tree: there can be any number of roots and any number of children per node
I have come up with a solution, but I think it is a messy approach. I made one class, as one normally would for a tree data structure. This class, "SimpleTree", works for a single chapter as the root node. To make multiple root nodes possible, I made another class, "TopWrapperForSimpleTree". This wrapper class has an array that stores multiple "SimpleTree" elements (basically, multiple roots). The messy part is that I have to copy each function of "SimpleTree" and define it for the wrapper class as well. For example, a traversal function traverses all the elements in a "SimpleTree", but now I also have to implement a traversal function for "TopWrapperForSimpleTree" that loops through all the roots, calls the traversal function on each of them, and concatenates the results. The same goes for other functions, like finding a node, deleting a node, etc.
To sum up: I need a tree data structure that can have multiple roots. It should also be ordered; the order is very important.
[Image: a tree with multiple roots]

A "tree with multiple roots" is not a tree. When you consider each chapter to be a tree, then the collection of chapters is a forest. But you could just add a root node. Chapters belong to a Book, and the Book is then the root node.
You don't need a data structure for multiple roots. You need a data structure whose nodes are multi-functional and can represent a book, a chapter, a section, etc., without duplicating code. OOP (inheritance) is perfect for that.
The idea is to define a class that has all the features the objects have in common. For instance, a book, a chapter, and a section all have a name, and they can all have "children". Iteration over a tree can be implemented as a method of this class.
A book would then be an extension of this base class: a book could, for instance, have an author property. A section would also be an extension of the base class, but could have a page number as an extra property. A chapter could be an extension of a section, as it also has a page number, but may in addition have a chapter number, etc.
Here is one of the many ways to do that. I use JavaScript here, but it works in a similar way in other OOP languages:
class Node {
  constructor(name) {
    this.name = name;
    this.children = [];
  }
  add(...children) {
    this.children.push(...children);
    return this; // Return the instance, so method calls can be chained...
  }
  toString() {
    return this.name;
  }
  * iter(depth=0) {
    // A pre-order iteration through the whole tree that this node represents
    yield [depth, this];
    for (let child of this.children) {
      yield * child.iter(depth+1);
    }
  }
  * iterWithout(depth=0) {
    // A pre-order iteration through the whole tree that this node represents
    // ...but excluding the node on which the original call is made:
    for (let child of this.children) {
      yield [depth, child];
      yield * child.iterWithout(depth+1);
    }
  }
}
class Book extends Node {
  constructor(name, author) {
    super(name);
    this.author = author; // specific property for Book instance
  }
  toString() {
    return super.toString() + ", by " + this.author;
  }
}
class Section extends Node {
  constructor(name, page) {
    super(name);
    this.page = page; // specific property for any section (also chapter)
  }
  toString() {
    return super.toString() + ", page " + this.page;
  }
}
class Chapter extends Section {
  constructor(id, name, page) {
    super(name, page);
    this.id = id; // specific property for Chapter instance
  }
  toString() {
    return "Chapter " + this.id + ". " + super.toString();
  }
}
// Illustration of how it could be used:
function main() { // Demo
  let book = new Book("The Perfect Theory", "Pedro G. Ferreira").add(
    new Chapter(1, "If a Person Falls Freely", 1).add(
      new Section("The Autumn of 1907", 1),
      new Section("The Article in the Yearbook", 4),
      new Section("Isaac Newton", 6),
      new Section("Gravity", 9),
    ),
    new Chapter(2, "The Most Valuable Discovery", 12),
    new Chapter(3, "Correct Mathematics, Abominable Physics", 28),
    new Chapter(4, "Collapsing Stars", 47)
  );
  for (let [depth, item] of book.iterWithout()) {
    console.log(" ".repeat(depth) + item.toString());
  }
}
main();

Related

How do you access/modify specific elements in a List in Haxe?

I am new to Haxe and I am looking for an equivalent data structure to Java's ArrayList, which is resizable and indexed. Haxe's List only allows access to its first and last element.
This is my use case:
I have a class Deck which represents a deck of 52 playing cards. The objects of type Card are stored in a List<Card>.
class Deck {
    var cards:List<Card>;
    public function new() {}
    public function init() {
        cards = new List<Card>();
        for (i in 1...5) { // iterate over suits
            for (j in 1...14) { // iterate over values
                cards.add(new Card(j, SuitFunctions.toSuit(i)));
            }
        }
    }
}
Now I want to implement a function shuffle which shuffles the cards.
public function shuffle() {
    var j:Int, k:Int;
    var c:Card;
    for (i in 1...1000000) {
        j = Std.random(cards.length);
        k = Std.random(cards.length);
        c = getCardAt(j);
        setCardAt(j, getCardAt(k));
        setCardAt(k, c);
    }
}
But Lists in Haxe are not indexed. How would I implement the functions getCardAt(Int) and setCardAt(Int, Card)? This is what the signatures should look like:
function getCardAt(i:Int):Card {
    // ToDo
    return new Card(0, Suit.ERROR);
}
function setCardAt(i:Int, c:Card) {
    // ToDo
}
Alternatively, is there a different data structure that fits this scenario better? Are for example Arrays resizable and indexed in Haxe?
Regarding "I am looking for an equivalent data structure to Java's ArrayList, which is resizable and indexed":
Array is what you want. Haxe Arrays are dynamic arrays, so they resize automatically.
List is a linked list, and it's not used much.
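Since the question compares against Java's ArrayList, here is a minimal Java sketch of the indexed operations the asker wanted, plus a Fisher-Yates shuffle, which needs only n - 1 swaps instead of a million random ones. The generic Deck<T> below is a hypothetical stand-in, not the asker's class; with a Haxe Array the same code works using cards[i] reads and writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generic deck; T stands in for the asker's Card type.
final class Deck<T> {
    private final List<T> cards = new ArrayList<>();  // resizable and indexed

    void add(T card) { cards.add(card); }
    int size() { return cards.size(); }

    // Indexed access: O(1) on an array-backed list
    T getCardAt(int i) { return cards.get(i); }
    void setCardAt(int i, T c) { cards.set(i, c); }

    // Fisher-Yates shuffle: walk from the end and swap each slot with a
    // uniformly chosen slot at or before it, so every permutation is equally likely.
    void shuffle(Random rng) {
        for (int i = cards.size() - 1; i > 0; i--) {
            int k = rng.nextInt(i + 1);
            T tmp = getCardAt(i);
            setCardAt(i, getCardAt(k));
            setCardAt(k, tmp);
        }
    }
}
```

In Java itself, java.util.Collections.shuffle(list, rng) already does this for any List.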

Build tree structure from the records

I query a database for records with the following structure:
ID | Term | ParentID
In C# I have the following class:
public class Tree
{
    public string Id { get; set; }
    public string Term { get; set; }
    public string ParentId { get; set; }
    public int Level { get; set; }
    public IList<Tree> ChildItems { get; set; }
}
The query returns 5,000,000 records.
I need to build a tree of Tree items and populate it.
First, I select all items where ParentID is null; then, for every element, I search for its parent (if the parent doesn't exist, I build the parent of the parent, and so on) and build the tree using recursion.
I'm not happy with my algorithm because it takes more than 5 minutes.
Please give me some advice on how to do this and what to use.
This is how the code is now implemented:
private string Handle2(List<Tree> originalTree)
{
    IList<Tree> newTree = new List<Tree>();
    IList<Tree> treeWithoutParents = originalTree.Where(x => String.IsNullOrEmpty(x.ParentId)).OrderBy(x => x.Term).ToList();
    foreach (Tree item in treeWithoutParents)
    {
        Tree newItem = new Tree { Id = item.Id, Term = item.Term, ParentId = item.ParentId, Level = 0 };
        newTree.Add(newItem);
        InsertChilds(newItem, originalTree, 0);
    }
    return "output";
}
private void InsertChilds(Tree item, List<Tree> origTree, int level)
{
    ++level;
    // Linear scan of the remaining records for every node: O(n^2) overall
    IList<Tree> childItems = origTree.Where(x => x.ParentId == item.Id).ToList();
    origTree.RemoveAll(x => x.ParentId == item.Id);
    // Redundant: RemoveAll above has already removed these items
    foreach (Tree i in childItems)
    {
        origTree.Remove(i);
    }
    foreach (Tree tItem in childItems)
    {
        if (item.ChildItems == null)
        {
            item.ChildItems = new List<Tree>();
        }
        Tree itemToAdd = new Tree { Id = tItem.Id, Term = tItem.Term, ParentId = tItem.ParentId, Level = level };
        this.InsertChilds(itemToAdd, origTree, level);
        item.ChildItems.Add(itemToAdd);
    }
}
Try using a map (C# Dictionary) of ID (string, although I'm curious why that isn't int) to node (Tree object) to store your tree nodes.
This would allow you to get the node corresponding to an ID with expected O(1) complexity, rather than your current O(n) complexity.
Beyond that, I suggest you rethink your approach a bit: try to write code that goes through the input data only once, using just a single Dictionary. If the parent doesn't exist yet, you can create a filler item for the parent, whose members are populated only when you reach that item.
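A minimal Java sketch of that single-pass idea, assuming each record arrives as a hypothetical String[] {id, term, parentId} with a null or empty parentId for top-level rows; TreeNode and TreeBuilder are illustrative names, and HashMap plays the role of the C# Dictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type mirroring the question's Tree class.
final class TreeNode {
    final String id;
    String term;        // stays null while the node is only a filler
    String parentId;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode(String id) { this.id = id; }
}

final class TreeBuilder {
    // One pass over the records: each lookup is an expected O(1) map access.
    static List<TreeNode> build(List<String[]> records) {
        Map<String, TreeNode> byId = new HashMap<>();
        List<TreeNode> roots = new ArrayList<>();
        for (String[] r : records) {
            String id = r[0], term = r[1], parentId = r[2];
            // Reuse the filler node if a child already referenced this ID.
            TreeNode node = byId.computeIfAbsent(id, TreeNode::new);
            node.term = term;
            node.parentId = parentId;
            if (parentId == null || parentId.isEmpty()) {
                roots.add(node);
            } else {
                // Create a filler parent if it hasn't been seen yet; its fields are set when its own record arrives.
                byId.computeIfAbsent(parentId, TreeNode::new).children.add(node);
            }
        }
        return roots;
    }
}
```

Children keep input order here; if they must be sorted by Term, or if Level is needed, a single traversal after building can do both.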
I would use a dictionary (hash table) to make this faster. Here is my algorithm in pseudocode:
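- create a dictionary mapping a parent ID to IList<Tree> // mapping a node to its children
  - fill it in one pass: for each record r, append r to dictionary[r.ParentId], using a sentinel key such as "" when ParentId is null, because a C# Dictionary does not accept null keys
- create a Queue of (record, parent node) pairs
- initialize the queue with all the records that have no parent:
  - List<Tree> withoutParent = dictionary[""]
  - for each item in withoutParent:
    - add (item, null) to the queue
- create an empty list of roots
- while the queue is not empty:
  - (rec, parent) = remove an item from the queue
  - make a new node t
  - t.Id = rec.Id; t.Term = rec.Term; t.ParentId = rec.ParentId
  - t.Level = (parent == null) ? 0 : parent.Level + 1
  - if parent == null, add t to the roots; otherwise add t to parent.ChildItems
  - for each child in dictionary[rec.Id] (if the key exists):
    - add (child, t) to the queue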
Is the column ID a unique identifier? If so, you can try the following. Instead of using a List, use a Set or a hash map. If a parent has many children, lookups in a list can slow down your operations; with a Set, both lookups and insertions are fast.
Also, check how much time an ORDER BY clause adds to the query. This might really help you speed up your process. If ID is a clustered index, the sort will be fast (the data is already stored in sorted order); otherwise, your query will still use the same index.
When a parent does not exist, you are creating a parent of the parent. I would try to avoid that. Instead, when a child's parent does not exist in the tree yet, add the child to a separate list. After you have gone through the original list, make a second pass to attach these orphaned elements. The advantage is that you do not need to resize your tree every time you create a parent of a parent, only to find out that the parent was just at the end of the list.

Applications of linked lists

What are some good examples of an application of a linked list? I know that it's a good idea to implement queues and stacks as linked lists, but is there a practical and direct example of a linked list solving a problem that specifically takes advantage of fast insert time? Not just other data structures based on linked lists.
Hoping for answers similar to this question about priority queues: Priority Queue applications
I have found one myself: A LRU (least recently used) cache implemented with a hash table and a linked list.
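A minimal Java sketch of that LRU cache: the hash map gives O(1) lookup by key, and the doubly linked list gives O(1) move-to-front and eviction, which is exactly where the linked list's cheap insert and delete pay off. LruCache is a hypothetical name; the standard library's java.util.LinkedHashMap, constructed with accessOrder = true and with removeEldestEntry overridden, provides the same behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical LRU cache; capacity must be at least 1.
final class LruCache<K, V> {
    private final class Entry {
        K key; V value; Entry prev, next;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Entry> index = new HashMap<>();  // key -> list entry
    private final Entry head = new Entry(null, null);    // sentinel: most recent follows head
    private final Entry tail = new Entry(null, null);    // sentinel: least recent precedes tail

    LruCache(int capacity) {
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    private void unlink(Entry e) { e.prev.next = e.next; e.next.prev = e.prev; }

    private void pushFront(Entry e) {
        e.next = head.next;
        e.prev = head;
        head.next.prev = e;
        head.next = e;
    }

    V get(K key) {
        Entry e = index.get(key);
        if (e == null) return null;
        unlink(e);      // O(1) move-to-front
        pushFront(e);
        return e.value;
    }

    void put(K key, V value) {
        Entry e = index.get(key);
        if (e != null) {
            e.value = value;
            unlink(e);
            pushFront(e);
            return;
        }
        if (index.size() == capacity) {
            Entry lru = tail.prev;  // evict the least recently used entry
            unlink(lru);
            index.remove(lru.key);
        }
        e = new Entry(key, value);
        index.put(key, e);
        pushFront(e);
    }
}
```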
There's also the example of the Exception class having an InnerException.
What else is there?
I work as a developer at a "large stock market" in the US. Part of what makes us operate at very fast speed is we don't do any heap allocation/de-allocation after initialization (before the start of the day on the market). This technique isn't unique to exchanges, it's also common in most real time systems.
First of all, for us, linked lists are preferred to array-based lists because they do not require heap allocation when the list grows or shrinks. We use linked lists in multiple applications on the exchange.
One application is to pre-allocate all objects into pools (which are linked lists) during initialization; so whenever we need a new object we can just remove the head of the list.
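A minimal Java sketch of such a pool, assuming an intrusive free list: each pooled object carries its own next link, so taking an object from the pool or returning it allocates nothing after construction. PooledOrder, nextFree, orderId, and OrderPool are hypothetical names, not the poster's code.

```java
// Hypothetical pooled type: the object itself holds the free-list link.
class PooledOrder {
    PooledOrder nextFree;  // link used only while the object sits in the pool
    long orderId;          // example payload
}

final class OrderPool {
    private PooledOrder freeHead;  // head of the free list
    private int available;

    OrderPool(int size) {
        // All allocation happens here, during initialization.
        for (int i = 0; i < size; i++) release(new PooledOrder());
    }

    // O(1): remove the head of the free list; returns null when the pool is exhausted.
    PooledOrder acquire() {
        PooledOrder o = freeHead;
        if (o != null) {
            freeHead = o.nextFree;
            o.nextFree = null;
            available--;
        }
        return o;
    }

    // O(1): push the object back onto the head of the free list.
    void release(PooledOrder o) {
        o.nextFree = freeHead;
        freeHead = o;
        available++;
    }

    int available() { return available; }
}
```

Because both operations work on the head, the pool behaves as a stack: the most recently released object is handed out next.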
Another application is in order processing: every Order object implements a linked list entry interface (it has previous and next references), so when we receive an order from a customer, we can remove an Order object from the pool and put it into a "to process" list. Since every Order object implements a linked list entry, adding it at any point in the list is as easy as populating the previous and next references.
Example off the top of my head:
interface IMultiListEntry {
    IMultiListEntry getPrev(MultiList list);
    void setPrev(MultiList list, IMultiListEntry entry);
    IMultiListEntry getNext(MultiList list);
    void setNext(MultiList list, IMultiListEntry entry);
}
class MultiListEntry implements IMultiListEntry {
    // One prev/next slot per list (indexed by MultiList.number), so the same
    // object can be linked into up to MAX_LISTS lists at once.
    private final IMultiListEntry[] prev = new IMultiListEntry[MultiList.MAX_LISTS];
    private final IMultiListEntry[] next = new IMultiListEntry[MultiList.MAX_LISTS];
    public IMultiListEntry getPrev(MultiList list) {
        return prev[list.number];
    }
    public void setPrev(MultiList list, IMultiListEntry entry) {
        prev[list.number] = entry;
    }
    public IMultiListEntry getNext(MultiList list) {
        return next[list.number];
    }
    public void setNext(MultiList list, IMultiListEntry entry) {
        next[list.number] = entry;
    }
}
class MultiList {
    static final int MAX_LISTS = 3;  // visible to MultiListEntry, which sizes its link arrays with it
    private static int LISTS = 0;
    public final int number = LISTS++;  // creating more than MAX_LISTS lists overflows the entries' arrays
    private IMultiListEntry head = null;
    private IMultiListEntry tail = null;
    public IMultiListEntry getHead() {
        return head;
    }
    public void add(IMultiListEntry entry) {
        if (head == null) {
            head = entry;
        } else {
            entry.setPrev(this, tail);
            tail.setNext(this, entry);
        }
        tail = entry;
    }
    public IMultiListEntry getPrev(IMultiListEntry entry) {
        return entry.getPrev(this);
    }
    public IMultiListEntry getNext(IMultiListEntry entry) {
        return entry.getNext(this);
    }
}
Now all you have to do is either extend MultiListEntry or implement IMultiListEntry and delegate the interface methods to an internal reference to a MultiListEntry object.
There could be infinitely many answers, and "good example" is a subjective term, so the answer to your question is highly debatable. Of course there are examples; you just have to think about situations that need fast insertion.
For example, you have a task list and you have to solve all the tasks. While going through the list, when a task is solved you realize that a new task has to be solved urgently, so you insert it right after the task you just solved. It is not a queue, because the list might be needed in the future for reviewing, so you need to keep your list intact; no pop method is allowed in this case.
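A minimal Java sketch of that task-list scenario using java.util.LinkedList and its ListIterator, whose add() inserts at the cursor in O(1). The rule that solving "deploy" reveals an urgent "hotfix" is a made-up trigger for the demo. Note that ListIterator.add() places the new element before the cursor, so a previous() call is needed for the urgent task to be processed next.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

final class TaskRunner {
    // Hypothetical rule: solving "deploy" reveals an urgent "hotfix" task.
    static List<String> run(LinkedList<String> tasks) {
        List<String> solvedOrder = new ArrayList<>();
        ListIterator<String> it = tasks.listIterator();
        while (it.hasNext()) {
            String task = it.next();
            solvedOrder.add(task);      // "solve" the task
            if (task.equals("deploy")) {
                it.add("hotfix");       // O(1) insert right after the task just solved
                it.previous();          // step back so the urgent task comes next
            }
        }
        return solvedOrder;  // 'tasks' itself stays intact, now including the inserted task
    }
}
```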
To give you another example: you have a set of names in alphabetical order. Suppose that you can somehow quickly find the object whose next pointer points to the object where a particular name is stored. To delete that name quickly, you just go to that previous object and relink its next pointer past the deleted one. This is faster than deleting from the middle of an array-based structure, which has to shift all the following elements.
Finally, imagine a very large set of items that must be kept after each insertion or deletion. In this case, it is much quicker to search for the item to be deleted (or the item before the position where your new item should go) and then do the operation in place than to copy the whole large set.
Java's HashMap uses linked lists for its buckets.
When more than one key hashes to the same bucket, the result is a collision, and the colliding keys are chained in a linked list (since Java 8, a bucket that grows too long is converted to a balanced tree).
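A minimal Java sketch of separate chaining, the collision strategy described above. ChainedHashMap is a hypothetical, fixed-capacity illustration (no resizing, no null keys), not how java.util.HashMap is actually written.

```java
// Hypothetical fixed-capacity hash map using separate chaining.
final class ChainedHashMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;  // next entry in the same bucket's chain
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;  // non-negative bucket index
    }

    V get(Object key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }  // existing key: update
        }
        buckets[i] = new Node<>(key, value, buckets[i]);  // collision or new key: prepend to the chain
        size++;
    }

    int size() { return size; }
}
```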

String set implementation

I have to implement a set ADT for pairs of strings. The interface I want is (in Java):
public interface StringSet {
    void add(String a, String b);
    boolean contains(String a, String b);
    void remove(String a, String b);
}
The data access pattern has the following properties:
The contains operation is far more frequent than the add and remove ones.
More often than not, contains returns true, i.e. the search is successful.
A simple implementation I can think of is to use a two-level hashtable, i.e. HashMap<String, HashMap<String, Boolean>>. But this data structure makes no use of the two peculiarities of the access pattern. I am wondering if there is something more efficient than the hashtable, perhaps one that leverages these peculiarities.
Personally, I would design this in terms of a standard Set<> interface:
import java.util.Set;
import java.util.function.Supplier;

public class StringPair {
    public StringPair(String a, String b) {
        a_ = a;
        b_ = b;
        // Combine the two hashes instead of hashing a_ + b_, which would allocate
        // a new string and give ("ab", "c") and ("a", "bc") the same hash.
        hash_ = 31 * a_.hashCode() + b_.hashCode();
    }
    public boolean equals(StringPair pair) {
        return (a_.equals(pair.a_) && b_.equals(pair.b_));
    }
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof StringPair) {
            return equals((StringPair) obj);
        }
        return false;
    }
    @Override
    public int hashCode() {
        return hash_;
    }
    private final String a_;
    private final String b_;
    private final int hash_;
}
public class StringSetImpl implements StringSet {
    // The caller chooses the Set implementation, e.g. new StringSetImpl(HashSet::new)
    public StringSetImpl(Supplier<Set<StringPair>> factory) {
        pair_set_ = factory.get();
    }
    // ...
    private Set<StringPair> pair_set_ = null;
}
Then you could leave it up to the user of StringSetImpl to choose the preferred Set type. If you are attempting to optimize access, though, it's hard to do better than a HashSet<> (at least with respect to runtime complexity), given that its access is expected O(1), whereas tree-based sets have O(log N) access times.
The fact that contains() usually returns true may make a Bloom filter worth considering, although this requires that some false positives from contains() be acceptable (I don't know whether that is the case).
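A minimal Java sketch of a Bloom filter over string pairs, using java.util.BitSet and the common double-hashing trick (position i = h1 + i * h2) to derive k bit positions. PairBloomFilter and the particular hash mixing are illustrative assumptions. A plain Bloom filter cannot delete, so supporting remove() would need a variant such as a counting Bloom filter, or using the filter only as a fast pre-check in front of an exact set.

```java
import java.util.BitSet;

// Hypothetical Bloom filter for (a, b) pairs: no false negatives, some false positives.
final class PairBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    PairBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int position(String a, String b, int i) {
        int h1 = 31 * a.hashCode() + b.hashCode();
        int h2 = (Integer.rotateLeft(a.hashCode(), 16) ^ b.hashCode()) | 1;  // forced odd
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String a, String b) {
        for (int i = 0; i < numHashes; i++) bits.set(position(a, b, i));
    }

    // false means "definitely absent"; true means "probably present".
    boolean mightContain(String a, String b) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(a, b, i))) return false;
        }
        return true;
    }
}
```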
Edit
To avoid the extra allocation, you can do something like this, which is similar to your two-level approach, except using a set rather than a map for the second level:
public class StringSetImpl implements StringSet {
    public StringSetImpl() {
        elements_ = new HashMap<String, Set<String>>();
    }
    public boolean contains(String a, String b) {
        Set<String> set = elements_.get(a);  // null if no pair with this first string exists
        return set != null && set.contains(b);
    }
    public void add(String a, String b) {
        // Create the inner set on the first use of a
        elements_.computeIfAbsent(a, k -> new HashSet<String>()).add(b);
    }
    public void remove(String a, String b) {
        Set<String> set = elements_.get(a);
        if (set == null) {
            return;
        }
        set.remove(b);
        if (set.isEmpty()) {
            elements_.remove(a);  // don't keep empty inner sets around
        }
    }
    private Map<String, Set<String>> elements_ = null;
}
Since it's 4:20 AM where I'm located, the above is definitely not my best work (too tired to refresh myself on the treatment of null by these different collections types), but it sketches the approach.
Do not use normal trees (as most standard library data structures do) for this. There is one simple assumption that will hurt you in this case:
The usual O(log(n)) cost of tree operations assumes that comparisons are O(1). This is true for integers and most other keys, but not for strings: each string comparison is O(k), where k is the length of the string. This makes all operations depend on the string length, which will most likely hurt you if you need to be fast, and it is easily overlooked.
Especially when contains() usually returns true, each comparison along the search path has to examine up to all k characters, so with this access pattern you will experience the full drawback of strings in trees.
Your access pattern is easily handled by a trie. Testing whether a string is contained is O(k) in the worst case (not the average case, as in a hash map). Adding a string is also O(k). Since you are storing two strings, I would suggest that you don't index your trie by characters, but rather by some larger type, so you can add two special index values: one for the end of the first string, and one for the end of both strings.
In your case, these two extra symbols also allow simple removal: just delete the final node containing the end symbol, and the pair will not be found anymore. You will waste some memory, because the paths of deleted pairs remain in the structure. If this is a problem, you could keep track of the number of deleted pairs and rebuild your trie when it gets too bad.
P.s. A trie can be thought of as a combination of a tree and several hashtables, so this gives you the best of both data structures.
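A minimal Java sketch of the two-marker trie described above. Symbols are ints: any char value plus two extra markers outside the char range (END_FIRST between the strings, END_BOTH after them). PairTrie and the marker values are illustrative assumptions. For brevity, each node keeps its children in a HashMap, so each step is expected rather than worst-case O(1); an array or sorted child table would be needed for the strict worst-case bound.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie over the symbol sequence a + END_FIRST + b + END_BOTH.
final class PairTrie {
    private static final int END_FIRST = -1;  // separates the first string from the second
    private static final int END_BOTH = -2;   // marks a stored pair

    private static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    private static int[] symbols(String a, String b) {
        int[] s = new int[a.length() + b.length() + 1];
        int i = 0;
        for (char c : a.toCharArray()) s[i++] = c;
        s[i++] = END_FIRST;
        for (char c : b.toCharArray()) s[i++] = c;
        return s;
    }

    // Follows the symbols from the root, creating missing nodes if asked; null if absent.
    private Node walk(int[] syms, boolean create) {
        Node n = root;
        for (int s : syms) {
            Node next = n.children.get(s);
            if (next == null) {
                if (!create) return null;
                next = new Node();
                n.children.put(s, next);
            }
            n = next;
        }
        return n;
    }

    void add(String a, String b) {
        walk(symbols(a, b), true).children.putIfAbsent(END_BOTH, new Node());
    }

    boolean contains(String a, String b) {  // O(len(a) + len(b)) map lookups
        Node n = walk(symbols(a, b), false);
        return n != null && n.children.containsKey(END_BOTH);
    }

    void remove(String a, String b) {
        Node n = walk(symbols(a, b), false);
        if (n != null) n.children.remove(END_BOTH);  // the inner path stays, wasting some memory
    }
}
```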
I'd second Michael Aaron Safyan's approach of using a StringPair type, perhaps with a more specific name, or as a generic tuple type: Tuple<A,B> instantiated as Tuple<String,String>. But I would strongly suggest using one of the provided set implementations, either a HashSet or a TreeSet.
A red-black tree implementation of the set would be a good option; the C++ STL's std::set is typically implemented as a red-black tree.

Resolving class inheritance algorithm

I have a list of pairs with class inheritance information, like this:
[
    [Person, null],
    [Person, SpecialPerson], // Person extends SpecialPerson
    [SpecialPerson, VerySpecialPerson], // SpecialPerson extends VerySpecialPerson
]
Is there any particular algorithm to flatten this information?
Like this:
Person -> SpecialPerson -> VerySpecialPerson
In the end, it boils down to a DAG (directed acyclic graph). Therefore you would do a breadth-first search or depth-first search. You only need the simplified case for trees.
Example (BFS, pseudo-code, untested):
List<Array<Typespec>> flatten(Array<Pair<Typespec,Typespec>> input) {
    List<Array<Typespec>> result;
    Queue<Array<Typespec>*> q;
    var elem=&result.append([null]);
    q.append(elem);
    while (!q.empty()) {
        for (i in input) {
            if (i.first==q.front().back()) {
                var elem=&result.append(q.front().clone().append(i.second));
                q.append(elem);
            }
        }
        q.pop_front();
    }
    return result;
}
This assumes that you meant [null,Person], instead of the other way round. Note that it produces a null at the start of every result, differing from your example.
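A minimal runnable Java sketch of the same breadth-first idea, under the same [null, Person] reading. Unlike the pseudocode, it returns only complete chains (paths that cannot be extended) and drops the leading null, which matches the desired output in the question. InheritanceChains is a hypothetical name, pairs are String[] {from, to}, and the input is assumed to be acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InheritanceChains {
    // Each pair is {from, to}; pairs with a null "from" mark the starting classes.
    static List<List<String>> flatten(List<String[]> pairs) {
        Map<String, List<String>> next = new HashMap<>();  // HashMap permits the null key
        for (String[] p : pairs) {
            next.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        List<List<String>> result = new ArrayList<>();
        Deque<List<String>> queue = new ArrayDeque<>();
        for (String start : next.getOrDefault(null, Collections.emptyList())) {
            queue.add(Collections.singletonList(start));
        }
        while (!queue.isEmpty()) {  // breadth-first search over partial chains
            List<String> path = queue.poll();
            List<String> successors = next.get(path.get(path.size() - 1));
            if (successors == null) {  // no outgoing edge: the chain is complete
                result.add(path);
                continue;
            }
            for (String s : successors) {
                List<String> longer = new ArrayList<>(path);
                longer.add(s);
                queue.add(longer);
            }
        }
        return result;
    }
}
```

For the question's data this yields a single chain, Person -> SpecialPerson -> VerySpecialPerson; with diamond-shaped inheritance, the same class would appear in several chains.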
