Implementing an Iterative Single Stack Binary Tree Copy Function

As a thought exercise I am trying to implement an iterative tree (binary or binary search tree) copy function.
It is my understanding that it can be achieved trivially:
with a single stack
without using a wrapper (that contains references to the copy and original nodes)
without a node having a reference to its parent (would a parent reference in a node be counter to a true definition of a tree [which I believe is a DAG]?)
I have written implementations that relax one or another of the above constraints, but I am uncertain how to approach the problem with all of them in place.
I did not see anything in Algorithms 4/e and have not seen anything online (beyond statements of how trivial it is). I considered borrowing the current/previous-pointer idea from iterative in-order and post-order traversals, but I did not see a way to track the copy accurately when popping the stack. I also briefly considered a hash map, but I feel this is still just extra storage, like the extra stack.
Any help in understanding the concepts/idioms behind the approach that I am not seeing is gratefully received.
Thanks in advance.
Edit:
There have been some requests for what I've tried so far. Here is the two-stack solution, which I believe should be the most straightforward to turn into a one-stack version.
It's written in C++. I am new to the language (but not to programming) and am teaching myself using C++ Primer 5/e (Lippman, Lajoie, Moo) [C++11] and the internet. If any of the code is wrong from a language perspective, please let me know (although I'm aware Code Review Stack Exchange is the place for an actual review).
I have a template Node that is used by other parts of the code.
template<typename T>
struct Node;

typedef Node<std::string> tree_node;
typedef std::shared_ptr<tree_node> shared_ptr_node;

template<typename T>
struct Node final {
public:
    const T value;
    const shared_ptr_node &left = m_left;
    const shared_ptr_node &right = m_right;

    Node(const T value, const shared_ptr_node left = nullptr, const shared_ptr_node right = nullptr) :
        value(value), m_left(left), m_right(right) {}

    void updateLeft(const shared_ptr_node node) {
        m_left = node;
    }

    void updateRight(const shared_ptr_node node) {
        m_right = node;
    }

private:
    shared_ptr_node m_left;
    shared_ptr_node m_right;
};
And then the 2 stack implementation.
shared_ptr_node iterativeCopy2Stacks(const shared_ptr_node &node) {
    const shared_ptr_node newRoot = std::make_shared<tree_node>(node->value);
    std::stack<shared_ptr_node> s;
    s.push(node);
    std::stack<shared_ptr_node> copyS;
    copyS.push(newRoot);
    shared_ptr_node original = nullptr;
    shared_ptr_node copy = nullptr;
    while (!s.empty()) {
        original = s.top();
        s.pop();
        copy = copyS.top();
        copyS.pop();
        if (original->right) {
            s.push(original->right);
            copy->updateRight(std::make_shared<tree_node>(original->right->value));
            copyS.push(copy->right);
        }
        if (original->left) {
            s.push(original->left);
            copy->updateLeft(std::make_shared<tree_node>(original->left->value));
            copyS.push(copy->left);
        }
    }
    return newRoot;
}

I'm not fluent in C++, so you'll have to settle for pseudocode:
node copy(treenode n):
    if n == null:
        return null
    node tmp = clone(n) // shallow clone, not a deep clone!
    stack s
    s.push(tmp)
    while !s.empty():
        node cur = s.pop()
        if cur.left != null:
            cur.left = clone(cur.left)
            s.push(cur.left)
        if cur.right != null:
            cur.right = clone(cur.right)
            s.push(cur.right)
    return tmp
Note that clone(node) is not a deep clone. The basic idea is to start with a shallow clone of the root, then iterate over the children of each stacked node and replace them (still references into the original tree) with shallow copies, then replace those nodes' children, and so on. The algorithm traverses the tree in a DFS manner; in case you prefer BFS (for whatever reason) you can simply replace the stack with a queue. Another advantage of this code: with a few minor changes it works for arbitrary trees.
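For reference, here is a minimal sketch of the same single-stack idea translated to the question's Node/shared_ptr types (my own adaptation, untested against your code): every node pushed onto the stack is already a copy whose left/right still point into the original tree, and the loop replaces those borrowed children with shallow copies of their own.
// Sketch only: assumes the Node template, tree_node/shared_ptr_node typedefs,
// and the <memory>/<stack> includes from the question.
shared_ptr_node iterativeCopy1Stack(const shared_ptr_node &node) {
    if (!node) return nullptr;
    // Shallow copy of the root: it temporarily shares the original's children.
    const shared_ptr_node newRoot =
        std::make_shared<tree_node>(node->value, node->left, node->right);
    std::stack<shared_ptr_node> s;
    s.push(newRoot);
    while (!s.empty()) {
        const shared_ptr_node current = s.top();
        s.pop();
        if (current->left) {
            // Replace the borrowed original child with its own shallow copy.
            current->updateLeft(std::make_shared<tree_node>(
                current->left->value, current->left->left, current->left->right));
            s.push(current->left);
        }
        if (current->right) {
            current->updateRight(std::make_shared<tree_node>(
                current->right->value, current->right->left, current->right->right));
            s.push(current->right);
        }
    }
    return newRoot;
}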
A recursive version of this algorithm (in case you prefer recursive code over my horrible prose):
node copyRec(node n):
    if n.left != null:
        n.left = clone(n.left)
        copyRec(n.left)
    if n.right != null:
        n.right = clone(n.right)
        copyRec(n.right)
    return n

node copy(node n):
    return copyRec(clone(n))
EDIT:
If you want to have a look at working code, I've created an implementation in Python.

Related

Postorder Traversal of Tree iterative method

I am trying to implement iterative postorder traversal of a tree using 2 stacks. I believe I have implemented the right algorithm, but I am not getting any output; instead I get the error Segmentation fault (core dumped).
Can anyone tell me where I have gone wrong?
void postorder_iterative(struct node *root)
{
    struct node *stack1[15], *stack2[15];
    int top1 = -1, top2 = -1;

    root = stack1[++top1];
    while (top1 >= 0)
    {
        root = stack1[top1--];
        stack2[++top2] = root;
        if (root->left != NULL)
            stack1[++top1] = root->left;
        if (root->right != NULL)
            stack1[++top1] = root->right;
    }
    while (top2 >= 0)
        printf("%c\t", stack2[top2--]->data);
}
You are reading an undefined value with this statement, in the first iteration of the loop:
root = stack1[top1--];
This first stack element is undefined, because you never initialised it. It was supposed to get initialised here:
root = stack1[++top1];
But this does not put anything in the stack. Instead it overwrites root with an undefined value.
It should have been the reverse:
stack1[++top1] = root;
This fixes the issue.
Don't forget to print a new line character once you have printed the list, so nothing is pending in the buffer.
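For completeness, here is a sketch of the corrected function with that one change applied (plus the trailing newline); it assumes a node layout along the lines of a char data field with left/right pointers, which the question does not show.
#include <stdio.h>

/* Assumed node layout (not shown in the question). */
struct node {
    char data;
    struct node *left, *right;
};

void postorder_iterative(struct node *root)
{
    struct node *stack1[15], *stack2[15];
    int top1 = -1, top2 = -1;

    if (root == NULL)
        return;
    stack1[++top1] = root;             /* push the root; this assignment was reversed before */
    while (top1 >= 0)
    {
        root = stack1[top1--];         /* pop from stack1 ... */
        stack2[++top2] = root;         /* ... and push onto stack2 */
        if (root->left != NULL)
            stack1[++top1] = root->left;
        if (root->right != NULL)
            stack1[++top1] = root->right;
    }
    while (top2 >= 0)
        printf("%c\t", stack2[top2--]->data);
    printf("\n");                      /* so nothing is left pending in the buffer */
}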

Singly Linked List using shared_ptr

I was trying to implement a singly linked list using shared_ptr. Here is the implementation.
Below is the node class...
template<typename T>
class Node
{
public:
    T value;
    shared_ptr<Node<T>> next;

    Node() : value(0), next(nullptr) {};
    Node(T value) : value(value), next(nullptr) {};
    ~Node() { cout << "In Destructor: " << value << endl; };
};
Below is the linked list class...
template<typename T>
class LinkedList
{
private:
    size_t m_size;
    shared_ptr<Node<T>> head;
    shared_ptr<Node<T>> tail;
public:
    LinkedList() : m_size(0), head(nullptr) {};

    void push_front(T value)
    {
        shared_ptr<Node<T>> temp = head;
        head = make_shared<Node<T>>(Node<T>(value));
        head->next = temp;
        m_size++;
        if (m_size == 1)
            tail = head;
    }

    void pop_front()
    {
        if (m_size != 0)
        {
            // Here I am having doubt------------------------!!!
            //shared_ptr<Node<T>> temp = head;
            head = head->next;
            m_size--;
            if (m_size == 0)
                tail = nullptr;
        }
    }

    bool empty()
    {
        return (m_size == 0) ? true : false;
    }

    T front()
    {
        if (m_size != 0)
            return head->value;
    }
};
My question is, am I using the shared_ptr properly for allocating a node? If not, how should I use the shared_ptr to allocate and how should I delete the node in the pop_front method?
I believe this belongs on code review.
Most importantly: Why are you using shared_ptr? shared_ptr means the ownership of an object is shared among several owners. This is not the case for a linked list: every node owns the next one. You can express that with unique_ptr, which is simpler and more efficient.
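A minimal sketch of what that looks like (my own illustration, assuming C++14 for make_unique; the names do not match the question's classes exactly):
#include <cstddef>
#include <memory>
#include <utility>

template <typename T>
struct Node {
    T value;
    std::unique_ptr<Node<T>> next;        // this node owns the next one
    explicit Node(T v) : value(std::move(v)) {}
};

template <typename T>
class LinkedList {
    std::unique_ptr<Node<T>> head;
    std::size_t m_size = 0;
public:
    void push_front(T value) {
        auto node = std::make_unique<Node<T>>(std::move(value));
        node->next = std::move(head);     // new node takes ownership of the old head
        head = std::move(node);
        ++m_size;
    }
    void pop_front() {
        if (head) {
            head = std::move(head->next); // old head is destroyed here, no manual delete
            --m_size;
        }
    }
};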
pop_front seems to be functioning correctly. You might consider throwing an exception or adding an assertion instead of doing nothing when pop_front is called on an empty list.
front is more problematic. If the list is empty the function falls off the end without returning anything, which is undefined behaviour; you will most likely get a garbage object.
What is the significance of tail? It does not seem to be used for anything and since you cannot go backwards there is no real point to getting the tail.
make_shared<Node<T>>(Node<T>(value)) should be make_shared<Node<T>>(value) instead. make_shared<Node<T>>(value) creates a Node using value as the parameter for the constructor. make_shared<Node<T>>(Node<T>(value)) creates a Node with value as the parameter and then creates a new Node with the temporary Node as parameter and then destroys the first Node.
You are missing the copy and move constructors and the copy and move assignment operators.
After you are satisfied with your list implementation consider using std::forward_list instead.

Build trie faster

I'm making a mobile app which needs thousands of fast string lookups and prefix checks. To speed this up, I made a Trie out of my word list, which has about 180,000 words.
Everything's great, but the only problem is that building this huge trie (it has about 400,000 nodes) takes about 10 seconds currently on my phone, which is really slow.
Here's the code that builds the trie.
public SimpleTrie makeTrie(String file) throws Exception {
    String line;
    SimpleTrie trie = new SimpleTrie();
    BufferedReader br = new BufferedReader(new FileReader(file));
    while ((line = br.readLine()) != null) {
        trie.insert(line);
    }
    br.close();
    return trie;
}
The insert method, which runs in O(length of key):
public void insert(String key) {
    TrieNode crawler = root;
    for (int level = 0; level < key.length(); level++) {
        int index = key.charAt(level) - 'A';
        if (crawler.children[index] == null) {
            crawler.children[index] = getNode();
        }
        crawler = crawler.children[index];
    }
    crawler.valid = true;
}
I'm looking for intuitive methods to build the trie faster. Maybe I could build the trie just once on my laptop, store it somehow on disk, and load it from a file on the phone? But I don't know how to implement this.
Or are there any other prefix data structures which will take less time to build, but have similar lookup time complexity?
Any suggestions are appreciated. Thanks in advance.
EDIT
Someone suggested using Java Serialization. I tried it, but it was very slow with this code:
public void serializeTrie(SimpleTrie trie, String file) {
    try {
        ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
        out.writeObject(trie);
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public SimpleTrie deserializeTrie(String file) {
    try {
        ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
        SimpleTrie trie = (SimpleTrie) in.readObject();
        in.close();
        return trie;
    } catch (IOException | ClassNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}
Can this above code be made faster?
My trie: http://pastebin.com/QkFisi09
Word list: http://www.isc.ro/lists/twl06.zip
Android IDE used to run code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand
Double-Array tries are very fast to save/load because all data is stored in linear arrays. They are also very fast to lookup, but the insertions can be costly. I bet there is a Java implementation somewhere.
Also, if your data is static (i.e. you don't update it on phone) consider DAFSA for your task. It is one of the most efficient data structures for storing words (must be better than "standard" tries and radix tries both for size and for speed, better than succinct tries for speed, often better than succinct tries for size). There is a good C++ implementation: dawgdic - you can use it to build DAFSA from command line and then use a Java reader for the resulting data structure (example implementation is here).
You could store your trie as an array of nodes, with references to child nodes replaced with array indices. Your root node would be the first element. That way, you could easily store/load your trie from simple binary or text format.
public class SimpleTrie {
    public class TrieNode {
        boolean valid;
        int[] children;
    }

    private TrieNode[] nodes;
    private int numberOfNodes;

    private TrieNode getNode() {
        TrieNode t = nodes[++numberOfNodes];
        return t;
    }
}
Just build a large String[] and sort it. Then you can use binary search to find the location of a String. You can also do a query based on prefixes without too much work.
Prefix look-up example:
Compare method:
private static int compare(String string, String prefix) {
    if (prefix.length() > string.length()) return Integer.MIN_VALUE;
    for (int i = 0; i < prefix.length(); i++) {
        char s = string.charAt(i);
        char p = prefix.charAt(i);
        if (s != p) {
            if (p < s) {
                // prefix is before string
                return -1;
            }
            // prefix is after string
            return 1;
        }
    }
    return 0;
}
Finds an occurrence of the prefix in the array and returns its location (MIN or MAX means not found):
private static int recursiveFind(String[] strings, String prefix, int start, int end) {
    if (start == end) {
        String lastValue = strings[start]; // start==end
        if (compare(lastValue, prefix) == 0)
            return start; // start==end
        return Integer.MAX_VALUE;
    }
    int low = start;
    int high = end + 1; // zero indexed, so add one.
    int middle = low + ((high - low) / 2);
    String middleValue = strings[middle];
    int comp = compare(middleValue, prefix);
    if (comp == Integer.MIN_VALUE) return comp;
    if (comp == 0)
        return middle;
    if (comp > 0)
        return recursiveFind(strings, prefix, middle + 1, end);
    return recursiveFind(strings, prefix, start, middle - 1);
}
Takes a String array and a prefix, and prints out occurrences of the prefix in the array:
private static boolean testPrefix(String[] strings, String prefix) {
    int i = recursiveFind(strings, prefix, 0, strings.length - 1);
    if (i == Integer.MAX_VALUE || i == Integer.MIN_VALUE) {
        // not found
        return false;
    }
    // Found an occurrence, now search up and down for other occurrences
    int up = i + 1;
    int down = i;
    while (down >= 0) {
        String string = strings[down];
        if (compare(string, prefix) == 0) {
            System.out.println(string);
        } else {
            break;
        }
        down--;
    }
    while (up < strings.length) {
        String string = strings[up];
        if (compare(string, prefix) == 0) {
            System.out.println(string);
        } else {
            break;
        }
        up++;
    }
    return true;
}
Here's a reasonably compact format for storing a trie on disk. I'll specify it by its (efficient) deserialization algorithm. Initialize a stack whose initial contents are the root node of the trie. Read characters one by one and interpret them as follows. The meaning of a letter A-Z is "allocate a new node, make it a child of the current top of stack, and push the newly allocated node onto the stack". The letter indicates which position the child is in. The meaning of a space is "set the valid flag of the node on top of the stack to true". The meaning of a backspace (\b) is "pop the stack".
For example, the input
TREE \b\bIE \b\b\bOO \b\b\b
gives the word list
TREE
TRIE
TOO
On your desktop, construct the trie using whichever method you like and then serialize it with the following recursive algorithm (pseudocode):
serialize(node):
    if node is valid: put(' ')
    for letter in A-Z:
        if node has a child under letter:
            put(letter)
            serialize(child)
            put('\b')
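To make the deserialization side concrete, here is a sketch in C++ (the question is Java, but the idea ports directly); it assumes the same 26-way child array used elsewhere in this thread and skips error handling and cleanup.
#include <stack>
#include <string>

struct Node {
    bool valid = false;
    Node *children[26] = {};
};

Node *deserialize(const std::string &input) {
    Node *root = new Node();
    std::stack<Node *> stk;
    stk.push(root);                          // the root starts on the stack
    for (char c : input) {
        if (c >= 'A' && c <= 'Z') {          // allocate a child and descend into it
            Node *child = new Node();
            stk.top()->children[c - 'A'] = child;
            stk.push(child);
        } else if (c == ' ') {               // a space marks the end of a word
            stk.top()->valid = true;
        } else if (c == '\b') {              // a backspace pops back to the parent
            stk.pop();
        }
    }
    return root;
}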
This isn't a magic bullet, but you can probably reduce your runtime slightly by doing one big memory allocation instead of a bunch of little ones.
I saw a ~10% speedup in the test code below (C++, not Java, sorry) when I used a "node pool" instead of relying on individual allocations:
#include <string>
#include <fstream>

#define USE_NODE_POOL

#ifdef USE_NODE_POOL
struct Node;
Node *node_pool;
int node_pool_idx = 0;
#endif

struct Node {
    void insert(const std::string &s) { insert_helper(s, 0); }
    void insert_helper(const std::string &s, int idx) {
        if (idx >= s.length()) return;
        int char_idx = s[idx] - 'A';
        if (children[char_idx] == nullptr) {
#ifdef USE_NODE_POOL
            children[char_idx] = &node_pool[node_pool_idx++];
#else
            children[char_idx] = new Node();
#endif
        }
        children[char_idx]->insert_helper(s, idx + 1);
    }
    Node *children[26] = {};
};

int main() {
#ifdef USE_NODE_POOL
    node_pool = new Node[400000];
#endif
    Node n;
    std::ifstream fin("TWL06.txt");
    std::string word;
    while (fin >> word) n.insert(word);
}
Tries that preallocate space for all possible children (256) have a huge amount of wasted space. You are making your cache cry. Store those pointers to children in a resizable data structure.
Some tries will optimize by having one node to represent a long string, and break that string up only when needed.
Instead of a simple file you can use a database like SQLite with a nested set or Celko tree to store the trie. You can also build a faster and smaller (fewer nodes) trie with a ternary search trie.
I don't like the idea of addressing nodes by index into an array, if only because it requires one more addition (index to pointer). But with an array of preallocated nodes you may save some time on allocation and initialization. You can also save a lot of space by reserving the first 26 indices for leaf nodes; then you don't need to allocate and initialize 180,000 leaf nodes.
Also, with indices you will be able to read the prepared node array from disk in binary format. That should be several times faster, but I'm not sure how to do this in your language. Is this Java?
If your source vocabulary is sorted, you may also save some time by comparing a prefix of the current string with the previous one, e.g. the first 4 characters. If they are equal, you can start your
for (int level = 0; level < key.length(); level++) {
loop from the 5th level.
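A sketch of that idea (C++ for illustration, mirroring the other C++ answer's node layout; it tracks the full shared prefix rather than just the first 4 characters and assumes the words arrive in sorted order):
#include <string>
#include <vector>

struct Node {
    bool valid = false;
    Node *children[26] = {};
};

// prevWord/prevPath carry state between calls: prevPath[i] is the node reached
// after consuming prevWord[0..i].
void insert_sorted(Node *root, const std::string &word,
                   std::string &prevWord, std::vector<Node *> &prevPath) {
    // Length of the prefix shared with the previous (lexicographically smaller) word.
    std::size_t common = 0;
    while (common < word.size() && common < prevWord.size() &&
           word[common] == prevWord[common])
        ++common;
    Node *crawler = (common == 0) ? root : prevPath[common - 1];
    prevPath.resize(common);
    for (std::size_t level = common; level < word.size(); ++level) {
        int index = word[level] - 'A';        // assumes upper-case A-Z, as in the question
        if (crawler->children[index] == nullptr)
            crawler->children[index] = new Node();
        crawler = crawler->children[index];
        prevPath.push_back(crawler);
    }
    crawler->valid = true;
    prevWord = word;
}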
Is it space inefficient or time inefficient? If you are rolling a plain trie then space may be part of the problem when dealing with a mobile device. Check out Patricia/radix tries, especially if you are using it as a prefix look-up tool.
Trie:
http://en.wikipedia.org/wiki/Trie
Patricia/Radix trie:
http://en.wikipedia.org/wiki/Radix_tree
You didn't mention a language but here are two implementations of prefix tries in Java.
Regular trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/Trie.java
Patricia/Radix (space-effecient) trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/PatriciaTrie.java
Generally speaking, avoid creating lots of objects from scratch in Java; it is both slow and carries massive overhead. Better to implement your own pooling class for memory management that allocates, say, half a million entries at a time in one go.
Also, serialization is too slow for large lexicons. Use a binary read to quickly populate the array-based representations proposed above.

Dynamics AX 2012 not releasing memory

I am experimenting with exporting the AOT CUS layer as an XPO file. Reference this question. I've run the excellent suggestions from the answers to said question but I am running into "out of memory" issues. I've done some further research and some additional experimentation. Here is a sample of the algorithm I am using to climb down the AOT tree and export only nodes that belong to the "CUS" layer.
private void GetAOLHelper(TreeNode baseNode, str baseExportDirectory, int currentLevel, int maxLevel)
{
    int cusLayerTest;
    int CusLayerValue = 4096;
    str ExportFileName = "";
    str ExportDirectoryName = "";
    TreeNode nextNode;

    if (baseNode != null)
    {
        cusLayerTest = CusLayerValue & baseNode.applObjectLayerMask();
        if (cusLayerTest > 0)
        {
            ExportFileName = baseNode.AOTname() + ".xpo";
            this.NodeExport(baseNode, baseExportDirectory, ExportFileName);
        }
        else
        {
            if (currentLevel < maxLevel)
            {
                nextNode = baseNode.AOTfirstChild();
                while (nextNode != null)
                {
                    this.GetAOLHelper(nextNode, baseExportDirectory, currentLevel + 1, maxLevel);
                    nextNode = nextNode.AOTnextSibling();
                }
                nextNode = null;
            }
        }
    }
}
The crux of this algorithm is as follows: I want to climb down the AOT tree (starting at a particular node) and export any layer that is a "CUS" layer object. I stop climbing down the tree at "maxlevel", meaning I only go X levels deep into the tree. I'm currently only running this algorithm on the "Data Dictionary" node of the AOT tree.
The issue I'm facing is that when this job runs, the memory footprint of the AX32.exe process is almost 1 GB. If I run this code against multiple nodes the memory requirement keeps climbing. I'm curious as to why AX is not releasing the memory when the algorithm is finished. My research on Google is coming up with some issues with the AX garbage collection. I'd like to know if there is a way to force garbage collection in AX? If I attempt to export every node in the AOT I run into the aforementioned "Out Of Memory" exception. The memory will not be released until I close the AX32.exe client.
TreeNode objects are not garbage collected like most other objects. You have to release them yourself. Call treeNodeRelease() when you're done with a node.

Linux Kernel - Red/Black Trees

I'm trying to implement a red/black tree in Linux per task_struct using code from linux/rbtree.h. I can get a red/black tree inserting properly in a standalone part of the kernel such as a module, but when I try to get the same code to work with the rb_root declared in either task_struct or task_struct->files_struct, I get a SEGFAULT every time I try an insert.
Here's some code:
In task_struct I create a rb_root struct for my tree (not a pointer).
In init_task.h, macro INIT_TASK(tsk), I set this equal to RB_ROOT.
To do an insert, I use this code:
rb_insert(&(current->fd_tree), &rbnode);
This is where the issue occurs.
My insert command is the standard insert that is documented in all RBTree documentation for the kernel:
int my_insert(struct rb_root *root, struct mytype *data)
{
    struct rb_node **new = &(root->rb_node), *parent = NULL;

    /* Figure out where to put new node */
    while (*new) {
        struct mytype *this = container_of(*new, struct mytype, node);
        int result = strcmp(data->keystring, this->keystring);

        parent = *new;
        if (result < 0)
            new = &((*new)->rb_left);
        else if (result > 0)
            new = &((*new)->rb_right);
        else
            return FALSE;
    }

    /* Add new node and rebalance tree. */
    rb_link_node(&data->node, parent, new);
    rb_insert_color(&data->node, root);

    return TRUE;
}
Is there something I'm missing?
Is there some reason this would work fine if I made a tree root outside of task_struct? If I make the rb_root inside a module, this insert works fine. But once I put the actual tree root in the task_struct, or even in task_struct->files_struct, I get a SEGFAULT. Can a root node not be added in these structs?
Any tips are greatly appreciated. I've tried nearly everything I can think of.
Edit:
I get a SEGFAULT on the following line when trying to print, and on any line that accesses the tree. This line should show how I'm handling the pointers. rb_entry and rb_first are functions already available in the kernel. current is a pointer to a task_struct (the current process), and tree is my root node (not a pointer), a member I added to the task_struct. rb_first needs to be passed a pointer to the rb_root; maybe I'm doing this wrong.
printk(KERN_CRIT "node=%d\n", rb_entry(rb_first(&(current->tree)), struct rb_tree_struct, node)->fd_key);
Could it be the pointer values of root and/or data aren't what you expect? It might be useful to add
printk("%s: root=%p data=%p\n", __func__, root, data);
before the while() loop.
