Related
As a thought exercise I am trying to implement an iterative tree (binary or binary search tree) copy function.
It is my understanding that it can be achieved trivially:
with a single stack
without using a wrapper (that contains references to the copy and original nodes)
without a node having a reference to it's parent (would a parent reference in a node be counter to a true definition of a tree [which I believe is a DAG]?)
I have written different implementations that meet the inverse of the above constraints but I am uncertain how to approach the problem with the constraints.
I did not see anything in Algorithms 4/e and have not seen anything online (beyond statements of how trivial it is). I considered using the concepts from in order and post order of a current/previous var but I did not see a way to track accurately when popping the stack. I also briefly considered a hash map but I feel this is still just extra storage like the extra stack.
Any help in understanding the concepts/idioms behind the approach that I am not seeing is gratefully received.
Thanks in advance.
Edit:
Some requests for what I've tried so far. Here is the 2 stack solution which I believe is supposed to be able to turn into the 1 stack the most trivially.
It's written in C++. I am new to the language (but not programming) and teaching myself using C++ Primer 5/e (Lippman, Lajole, Moo) [C++11] and the internet. If any of the code from a language perspective is wrong, please let me know (although I'm aware Code Review Stack Exchange is the place for an actual review).
I have a template Node that is used by other parts of the code.
template<typename T>
struct Node;
typedef Node<std::string> tree_node;
typedef std::shared_ptr<tree_node> shared_ptr_node;
template<typename T>
struct Node final {
public:
const T value;
const shared_ptr_node &left = m_left;
const shared_ptr_node &right = m_right;
Node(const T value, const shared_ptr_node left = nullptr, const shared_ptr_node right = nullptr) : value(value), m_left(left), m_right (right) {}
void updateLeft(const shared_ptr_node node) {
m_left = node;
}
void updateRight(const shared_ptr_node node) {
m_right = node;
}
private:
shared_ptr_node m_left;
shared_ptr_node m_right;
};
And then the 2 stack implementation.
shared_ptr_node iterativeCopy2Stacks(const shared_ptr_node &node) {
const shared_ptr_node newRoot = std::make_shared<tree_node>(node->value);
std::stack<const shared_ptr_node> s;
s.push(node);
std::stack<const shared_ptr_node> copyS;
copyS.push(newRoot);
shared_ptr_node original = nullptr;
shared_ptr_node copy = nullptr;
while (!s.empty()) {
original = s.top();
s.pop();
copy = copyS.top();
copyS.pop();
if (original->right) {
s.push(original->right);
copy->updateRight(std::make_shared<tree_node>(original->right->value));
copyS.push(copy->right);
}
if (original->left) {
s.push(original->left);
copy->updateLeft(std::make_shared<tree_node>(original->left->value));
copyS.push(copy->left);
}
}
return newRoot;
}
I'm not fluent in c++, so you'll have to settle with pseudocode:
node copy(treenode n):
if n == null
return null
node tmp = clone(n) //no deep clone!!!
stack s
s.push(tmp)
while !s.empty():
node n = s.pop()
if n.left != null:
n.left = clone(n.left)
s.push(n.left)
if n.right != null:
n.right = clone(n.right)
s.push(n.right)
return tmp
Note that clone(node) is not a deep-clone. The basic idea is to start with a shallow-clone of the root, then iterate over all children of that node and replace those nodes (still references to the original node) by shallow copies, replace those nodes children, etc.. This algorithm traverses the tree in a DFS-manner. In case you prefer BFS (for whatever reason) you could just replace the stack by a queue. Another advantage of this code: it can be altered with a few minor changes to work for arbitrary trees.
A recursive version of this algorithm (in case you prefer recursive code over my horrible prosa):
node copyRec(node n):
if n.left != null:
n.left = clone(n.left)
copyRec(n.left)
if n.right != null:
n.right = clone(n.right)
copyRec(n.right)
return n
node copy(node n):
return copyRec(clone(n))
EDIT:
If you want to have a look at working code, I've created an implementation in python.
Can anyone help shed some light on why the below code consumes well over 100 MB of RAM during runtime?
public struct Trie<Element : Hashable> {
private var children: [Element:Trie<Element>]
private var endHere : Bool
public init() {
children = [:]
endHere = false
}
public init<S : SequenceType where S.Generator.Element == Element>(_ seq: S) {
self.init(gen: seq.generate())
}
private init<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
(children, endHere) = ([head:Trie(gen:gen)], false)
} else {
(children, endHere) = ([:], true)
}
}
private mutating func insert<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
let _ = children[head]?.insert(gen) ?? { children[head] = Trie(gen: gen) }()
} else {
endHere = true
}
}
public mutating func insert<S : SequenceType where S.Generator.Element == Element>(seq: S) {
insert(seq.generate())
}
}
var trie = Trie<UInt32>()
for i in 0..<300000 {
trie.insert([UInt32(i), UInt32(i+1), UInt32(i+2)])
}
Based on my calculations total memory consumption for the above data structure should be somewhere around the following...
3 * count * sizeof(Trie<UInt32>)
Or –
3 * 300,000 * 9 = 8,100,000 bytes = ~8 MB
How is it that this data structure consumes well over 100 MB during runtime?
sizeof reports only the static footprint on the stack, which the Dictionary is just kind of a wrapper of the reference to its internal reference type implementation, and also the copy on write support. In other words, the key-value pairs and the hash table of your dictionary are allocated on the heap, which is not covered by sizeof. This applies to all other Swift collection types.
In your case, you are creating three Trie - and indirectly three dictionaries - every iteration of the 300000. I wouldn't be surprised if the 96-byte allocations mentioned by #Macmade is the minimum overhead of a dictionary (e.g. its hash bucket).
There might also be cost related to growing storage. So you may try to see if setting a minimumCapacity on the dictionary would help. On the other hand, if you do not need a divergent path generated per iteration, you may consider an indirect enum as an alternative, e.g.
public enum Trie<Element> {
indirect case Next(Element, Trie<Element>)
case End
}
which should use less memory.
Size of your struct is 9 bytes, not 5.
You can check it with sizeof:
let size = sizeof( Trie< UInt32 > );
Also, you iterate 300'000 times, but insert 3 values (of course, it's a trie). So that's 900'000.
Anyway, that does not explain by itself the memory consumption you are observing.
I'm not really fluent in Swift, and I don't understand you code.
Maybe there's also some error in it, making it allocate more memory than needed.
But anyway, in order to understand what's happening, you need to run your code in Instruments (command-i).
On my machine, I can see 900'000 96 bytes allocations by swift_slowAlloc.
That's more like it...
Why 96 bytes, assuming there's no error in your code?
Well, it might be because of the way memory is allocated for your elements.
When satisfying a request, the memory allocator may allocate more memory than requested. That may be because it needs some internal metadata, because of paging, because of alignment, ...
But even though, it seems really exaggerated, so use instruments and double check what your code is doing.
The following code, which maps simple value holders to booleans, runs over 20x faster in Java than Swift 2 - XCode 7 beta3, "Fastest, Aggressive Optimizations [-Ofast]", and "Fast, Whole Module Optimizations" turned on. I can get over 280M lookups/sec in Java but only about 10M in Swift.
When I look at it in Instruments I see that most of the time is going into a pair of retain/release calls associated with the map lookup. Any suggestions on why this is happening or a workaround would be appreciated.
The structure of the code is a simplified version of my real code, which has a more complex key class and also stores other types (though Boolean is an actual case for me). Also, note that I am using a single mutable key instance for the retrieval to avoid allocating objects inside the loop and according to my tests this is faster in Swift than an immutable key.
EDIT: I have also tried switching to NSMutableDictionary but when used with Swift objects as keys it seems to be terribly slow.
EDIT2: I have tried implementing the test in objc (which wouldn't have the Optional unwrapping overhead) and it is faster but still over an order of magnitude slower than Java... I'm going to pose that example as another question to see if anyone has ideas.
EDIT3 - Answer. I have posted my conclusions and my workaround in an answer below.
public final class MyKey : Hashable {
var xi : Int = 0
init( _ xi : Int ) { set( xi ) }
final func set( xi : Int) { self.xi = xi }
public final var hashValue: Int { return xi }
}
public func == (lhs: MyKey, rhs: MyKey) -> Bool {
if ( lhs === rhs ) { return true }
return lhs.xi==rhs.xi
}
...
var map = Dictionary<MyKey,Bool>()
let range = 2500
for x in 0...range { map[ MyKey(x) ] = true }
let runs = 10
for _ in 0...runs
{
let time = Time()
let reps = 10000
let key = MyKey(0)
for _ in 0...reps {
for x in 0...range {
key.set(x)
if ( map[ key ] == nil ) { XCTAssertTrue(false) }
}
}
print("rate=\(time.rate( reps*range )) lookups/s")
}
and here is the corresponding Java code:
public class MyKey {
public int xi;
public MyKey( int xi ) { set( xi ); }
public void set( int xi) { this.xi = xi; }
#Override public int hashCode() { return xi; }
#Override
public boolean equals( Object o ) {
if ( o == this ) { return true; }
MyKey mk = (MyKey)o;
return mk.xi == this.xi;
}
}
...
Map<MyKey,Boolean> map = new HashMap<>();
int range = 2500;
for(int x=0; x<range; x++) { map.put( new MyKey(x), true ); }
int runs = 10;
for(int run=0; run<runs; run++)
{
Time time = new Time();
int reps = 10000;
MyKey buffer = new MyKey( 0 );
for (int it = 0; it < reps; it++) {
for (int x = 0; x < range; x++) {
buffer.set( x );
if ( map.get( buffer ) == null ) { Assert.assertTrue( false ); }
}
}
float rate = reps*range/time.s();
System.out.println( "rate = " + rate );
}
After much experimentation I have come to some conclusions and found a workaround (albeit somewhat extreme).
First let me say that I recognize that this kind of very fine grained data structure access within a tight loop is not representative of general performance, but it does affect my application and I'm imagining others like games and heavily numeric applications. Also let me say that I know that Swift is a moving target and I'm sure it will improve - perhaps my workaround (hacks) below will not be necessary by the time you read this. But if you are trying to do something like this today and you are looking at Instruments and seeing the majority of your application time spent in retain/release and you don't want to rewrite your entire app in objc please read on.
What I have found is that almost anything that one does in Swift that touches an object reference incurs an ARC retain/release penalty. Additionally Optional values - even optional primitives - also incur this cost. This pretty much rules out using Dictionary or NSDictionary.
Here are some things that are fast that you can include in a workaround:
a) Arrays of primitive types.
b) Arrays of final objects as long as long as the array is on the stack and not on the heap. e.g. Declare an array within the method body (but outside of your loop of course) and iteratively copy the values to it. Do not Array(array) copy it.
Putting this together you can construct a data structure based on arrays that stores e.g. Ints and then store array indexes to your objects in that data structure. Within your loop you can look up the objects by their index in the fast local array. Before you ask "couldn't the data structure store the array for me" - no, because that would incur two of the penalties I mentioned above :(
All things considered this workaround is not too bad - If you can enumerate the entities that you want to store in the Dictionary / data structure you should be able to host them in an array as described. Using the technique above I was able to exceed the Java performance by a factor of 2x in Swift in my case.
If anyone is still reading and interested at this point I will consider updating my example code and posting.
EDIT: I'd add an option: c) It is also possible to use UnsafeMutablePointer<> or Unmanaged<> in Swift to create a reference that will not be retained when passed around. I was not aware of this when I started and I would hesitate to recommend it in general because it's a hack, but I've used it in a few cases to wrap a heavily used array that was incurring a retain/release every time it was referenced.
I'm trying to write an implementation of union-find in Rust. This is famously very simple to implement in languages like C, while still having a complex run time analysis.
I'm having trouble getting Rust's mutex semantics to allow iterative hand-over-hand locking.
Here's how I got where I am now.
First, this is a very simple implementation of part of the structure I want in C:
#include <stdlib.h>
struct node {
struct node * parent;
};
struct node * create(struct node * parent) {
struct node * ans = malloc(sizeof(struct node));
ans->parent = parent;
return ans;
}
struct node * find_root(struct node * x) {
while (x->parent) {
x = x->parent;
}
return x;
}
int main() {
struct node * foo = create(NULL);
struct node * bar = create(foo);
struct node * baz = create(bar);
baz->parent = find_root(bar);
}
Note that the structure of the pointers is that of an inverted tree; multiple pointers may point at a single location, and there are no cycles.
At this point, there is no path compression.
Here is a Rust translation. I chose to use Rust's reference-counted pointer type to support the inverted tree type I referenced above.
Note that this implementation is much more verbose, possibly due to the increased safety that Rust offers, but possibly due to my inexperience with Rust.
use std::rc::Rc;
struct Node {
parent: Option<Rc<Node>>
}
fn create(parent: Option<Rc<Node>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Rc<Node>) -> Rc<Node> {
let mut ans = x.clone();
while ans.parent.is_some() {
ans = ans.parent.clone().unwrap();
}
ans
}
fn main() {
let foo = Rc::new(create(None));
let bar = Rc::new(create(Some(foo.clone())));
let mut prebaz = create(Some(bar.clone()));
prebaz.parent = Some(find_root(bar.clone()));
}
Path compression re-parents each node along a path to the root every time find_root is called. To add this feature to the C code, only two new small functions are needed:
void change_root(struct node * x, struct node * root) {
while (x) {
struct node * tmp = x->parent;
x->parent = root;
x = tmp;
}
}
struct node * root(struct node * x) {
struct node * ans = find_root(x);
change_root(x, ans);
return ans;
}
The function change_root does all the re-parenting, while the function root is just a wrapper to use the results of find_root to re-parent the nodes on the path to the root.
In order to do this in Rust, I decided I would have to use a Mutex rather than just a reference counted pointer, since the Rc interface only allows mutable access by copy-on-write when more than one pointer to the item is live. As a result, all of the code would have to change. Before even getting to the path compression part, I got hung up on find_root:
use std::sync::{Mutex,Arc};
struct Node {
parent: Option<Arc<Mutex<Node>>>
}
fn create(parent: Option<Arc<Mutex<Node>>>) -> Node {
Node {parent: parent.clone()}
}
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
let mut inner = ans.lock();
while inner.parent.is_some() {
ans = inner.parent.clone().unwrap();
inner = ans.lock();
}
ans.clone()
}
This produces the error (with 0.12.0)
error: cannot assign to `ans` because it is borrowed
ans = inner.parent.clone().unwrap();
note: borrow of `ans` occurs here
let mut inner = ans.lock();
What I think I need here is hand-over-hand locking. For the path A -> B -> C -> ..., I need to lock A, lock B, unlock A, lock C, unlock B, ... Of course, I could keep all of the locks open: lock A, lock B, lock C, ... unlock C, unlock B, unlock A, but this seems inefficient.
However, Mutex does not offer unlock, and uses RAII instead. How can I achieve hand-over-hand locking in Rust without being able to directly call unlock?
EDIT: As the comments noted, I could use Rc<RefCell<Node>> rather than Arc<Mutex<Node>>. Doing so leads to the same compiler error.
For clarity about what I'm trying to avoid by using hand-over-hand locking, here is a RefCell version that compiles but used space linear in the length of the path.
fn find_root(x: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut inner : RefMut<Node> = x.borrow_mut();
if inner.parent.is_some() {
find_root(inner.parent.clone().unwrap())
} else {
x.clone()
}
}
We can pretty easily do full hand-over-hand locking as we traverse this list using just a bit of unsafe, which is necessary to tell the borrow checker a small bit of insight that we are aware of, but that it can't know.
But first, let's clearly formulate the problem:
We want to traverse a linked list whose nodes are stored as Arc<Mutex<Node>> to get the last node in the list
We need to lock each node in the list as we go along the way such that another concurrent traversal has to follow strictly behind us and cannot muck with our progress.
Before we get into the nitty-gritty details, let's try to write the signature for this function:
fn find_root(node: Arc<Mutex<Node>>) -> Arc<Mutex<Node>>;
Now that we know our goal, we can start to get into the implementation - here's a first attempt:
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
// We have to separate this from incoming since the lock must
// be borrowed from incoming, not this local node.
let mut node = incoming.clone();
let mut lock = incoming.lock();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
node = lock.parent.as_ref().unwrap().clone(); // !! uh-oh !!
lock = node.lock();
}
node
}
If we try to compile this, rustc will error on the line marked !! uh-oh !!, telling us that we can't move out of node while lock still exists, since lock is borrowing node. This is not a spurious error! The data in lock might go away as soon as node does - it's only because we know that we can keep the data lock is pointing to valid and in the same memory location even if we move node that we can fix this.
The key insight here is that the lifetime of data contained within an Arc is dynamic, and it is hard for the borrow checker to make the inferences we can about exactly how long data inside an Arc is valid.
This happens every once in a while when writing rust; you have more knowledge about the lifetime and organization of your data than rustc, and you want to be able to express that knowledge to the compiler, effectively saying "trust me". Enter: unsafe - our way of telling the compiler that we know more than it, and it should allow us to inform it of the guarantees that we know but it doesn't.
In this case, the guarantee is pretty simple - we are going to replace node while lock still exists, but we are not going to ensure that the data inside lock continues to be valid even though node goes away. To express this guarantee we can use mem::transmute, a function which allows us to reinterpret the type of any variable, by just using it to change the lifetime of the lock returned by node to be slightly longer than it actually is.
To make sure we keep our promise, we are going to use another handoff variable to hold node while we reassign lock - even though this moves node (changing its address) and the borrow checker will be angry at us, we know it's ok since lock doesn't point at node, it points at data inside of node, whose address (in this case, since it's behind an Arc) will not change.
Before we get to the solution, it's important to note that the trick we are using here is only valid because we are using an Arc. The borrow checker is warning us of a possibly serious error - if the Mutex was held inline and not in an Arc, this error would be a correct prevention of a use-after-free, where the MutexGuard held in lock would attempt to unlock a Mutex which has already been dropped, or at least moved to another memory location.
use std::mem;
use std::sync::{Arc, Mutex};
fn find_root(incoming: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut node = incoming.clone();
let mut handoff_node;
let mut lock = incoming.lock().unwrap();
// Could use while let but that leads to borrowing issues.
while lock.parent.is_some() {
// Keep the data in node around by holding on to this `Arc`.
handoff_node = node;
node = lock.parent.as_ref().unwrap().clone();
// We are going to move out of node while this lock is still around,
// but since we kept the data around it's ok.
lock = unsafe { mem::transmute(node.lock().unwrap()) };
}
node
}
And, just like that, rustc is happy, and we have hand-over-hand locking, since the last lock is released only after we have acquired the new lock!
There is one unanswered question in this implementation which I have not yet received an answer too, which is whether the drop of the old value and assignment of a new value to a variable is a guaranteed to be atomic - if not, there is a race condition where the old lock is released before the new lock is acquired in the assignment of lock. It's pretty trivial to work around this by just having another holdover_lock variable and moving the old lock into it before reassigning, then dropping it after reassigning lock.
Hopefully this fully addresses your question and shows how unsafe can be used to work around "deficiencies" in the borrow checker when you really do know more. I would still like to want that the cases where you know more than the borrow checker are rare, and transmuting lifetimes is not "usual" behavior.
Using Mutex in this way, as you can see, is pretty complex and you have to deal with many, many, possible sources of a race condition and I may not even have caught all of them! Unless you really need this structure to be accessible from many threads, it would probably be best to just use Rc and RefCell, if you need it, as this makes things much easier.
I believe this to fit the criteria of hand-over-hand locking.
use std::sync::Mutex;
fn main() {
// Create a set of mutexes to lock hand-over-hand
let mutexes = Vec::from_fn(4, |_| Mutex::new(false));
// Lock the first one
let val_0 = mutexes[0].lock();
if !*val_0 {
// Lock the second one
let mut val_1 = mutexes[1].lock();
// Unlock the first one
drop(val_0);
// Do logic
*val_1 = true;
}
for mutex in mutexes.iter() {
println!("{}" , *mutex.lock());
}
}
Edit #1
Does it work when access to lock n+1 is guarded by lock n?
If you mean something that could be shaped like the following, then I think the answer is no.
struct Level {
data: bool,
child: Option<Mutex<Box<Level>>>,
}
However, it is sensible that this should not work. When you wrap an object in a mutex, then you are saying "The entire object is safe". You can't say both "the entire pie is safe" and "I'm eating the stuff below the crust" at the same time. Perhaps you jettison the safety by creating a Mutex<()> and lock that?
This is still not the answer your literal question of to how to do hand-over-hand locking, which should only be important in a concurrent setting (or if someone else forced you to use Mutex references to nodes). It is instead how to do this with Rc and RefCell, which you seem to be interested in.
RefCell only allows mutable writes when one mutable reference is held. Importantly, the Rc<RefCell<Node>> objects are not mutable references. The mutable references it is talking about are the results from calling borrow_mut() on the Rc<RefCell<Node>>object, and as long as you do that in a limited scope (e.g. the body of the while loop), you'll be fine.
The important thing happening in path compression is that the next Rc object will keep the rest of the chain alive while you swing the parent pointer for node to point at root. However, it is not a reference in the Rust sense of the word.
struct Node
{
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(mut node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>>
{
while let Some(parent) = node.borrow().parent.clone()
{
node = parent;
}
return node;
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>)
{
while node.borrow().parent.is_some()
{
let next = node.borrow().parent.clone().unwrap();
node.borrow_mut().parent = Some(root.clone());
node = next;
}
}
This runs fine for me with the test harness I used, though there may still be bugs. It certainly compiles and runs without a panic! due to trying to borrow_mut() something that is already borrowed. It may actually produce the right answer, that's up to you.
On IRC, Jonathan Reem pointed out that inner is borrowing until the end of its lexical scope, which is too far for what I was asking. Inlining it produces the following, which compiles without error:
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
while ans.lock().parent.is_some() {
ans = ans.lock().parent.clone().unwrap();
}
ans
}
EDIT: As Francis Gagné points out, this has a race condition, since the lock doesn't extend long enough. Here's a modified version that only has one lock() call; perhaps it is not vulnerable to the same problem.
fn find_root(x: Arc<Mutex<Node>>) -> Arc<Mutex<Node>> {
let mut ans = x.clone();
loop {
ans = {
let tmp = ans.lock();
match tmp.parent.clone() {
None => break,
Some(z) => z
}
}
}
ans
}
EDIT 2: This only holds one lock at a time, and so is racey. I still don't know how to do hand-over-hand locking.
As pointed out by Frank Sherry and others, you shouldn't use Arc/Mutex when single threaded. But his code was outdated, so here is the new one (for version 1.0.0alpha2).
This does not take linear space either (like the recursive code given in the question).
struct Node {
parent: Option<Rc<RefCell<Node>>>
}
fn find_root(node: Rc<RefCell<Node>>) -> Rc<RefCell<Node>> {
let mut ans = node.clone(); // Rc<RefCell<Node>>
loop {
ans = {
let ans_ref = ans.borrow(); // std::cell::Ref<Node>
match ans_ref.parent.clone() {
None => break,
Some(z) => z
}
} // ans_ref goes out of scope, and ans becomes mutable
}
ans
}
fn path_compress(mut node: Rc<RefCell<Node>>, root: Rc<RefCell<Node>>) {
while node.borrow().parent.is_some() {
let next = {
let node_ref = node.borrow();
node_ref.parent.clone().unwrap()
};
node.borrow_mut().parent = Some(root.clone());
// RefMut<Node> from borrow_mut() is out of scope here...
node = next; // therefore we can mutate node
}
}
Note for beginners: Pointers are automatically dereferenced by dot operator. ans.borrow() actually means (*ans).borrow(). I intentionally used different styles for the two functions.
Although not the answer to your literal question (hand-over locking), union-find with weighted-union and path-compression can be very simple in Rust:
fn unionfind<I: Iterator<(uint, uint)>>(mut iterator: I, nodes: uint) -> Vec<uint>
{
let mut root = Vec::from_fn(nodes, |x| x);
let mut rank = Vec::from_elem(nodes, 0u8);
for (mut x, mut y) in iterator
{
// find roots for x and y; do path compression on look-ups
while (x != root[x]) { root[x] = root[root[x]]; x = root[x]; }
while (y != root[y]) { root[y] = root[root[y]]; y = root[y]; }
if x != y
{
// weighted union swings roots
match rank[x].cmp(&rank[y])
{
Less => root[x] = y,
Greater => root[y] = x,
Equal =>
{
root[y] = x;
rank[x] += 1
},
}
}
}
}
Maybe the meta-point is that the union-find algorithm may not be the best place to handle node ownership, and by using references to existing memory (in this case, by just using uint identifiers for the nodes) without affecting the lifecycle of the nodes makes for a much simpler implementation, if you can get away with it of course.
I'm making an mobile app which needs thousands of fast string lookups and prefix checks. To speed this up, I made a Trie out of my word list, which has about 180,000 words.
Everything's great, but the only problem is that building this huge trie (it has about 400,000 nodes) takes about 10 seconds currently on my phone, which is really slow.
Here's the code that builds the trie.
public SimpleTrie makeTrie(String file) throws Exception {
String line;
SimpleTrie trie = new SimpleTrie();
BufferedReader br = new BufferedReader(new FileReader(file));
while( (line = br.readLine()) != null) {
trie.insert(line);
}
br.close();
return trie;
}
The insert method which runs on O(length of key)
public void insert(String key) {
TrieNode crawler = root;
for(int level=0 ; level < key.length() ; level++) {
int index = key.charAt(level) - 'A';
if(crawler.children[index] == null) {
crawler.children[index] = getNode();
}
crawler = crawler.children[index];
}
crawler.valid = true;
}
I'm looking for intuitive methods to build the trie faster. Maybe I build the trie just once on my laptop, store it somehow to the disk, and load it from a file in the phone? But I don't know how to implement this.
Or are there any other prefix data structures which will take less time to build, but have similar lookup time complexity?
Any suggestions are appreciated. Thanks in advance.
EDIT
Someone suggested using Java Serialization. I tried it, but it was very slow with this code:
public void serializeTrie(SimpleTrie trie, String file) {
try {
ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
out.writeObject(trie);
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public SimpleTrie deserializeTrie(String file) {
try {
ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
SimpleTrie trie = (SimpleTrie)in.readObject();
in.close();
return trie;
} catch (IOException | ClassNotFoundException e) {
e.printStackTrace();
return null;
}
}
Can this above code be made faster?
My trie: http://pastebin.com/QkFisi09
Word list: http://www.isc.ro/lists/twl06.zip
Android IDE used to run code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand
Double-Array tries are very fast to save/load because all data is stored in linear arrays. They are also very fast to lookup, but the insertions can be costly. I bet there is a Java implementation somewhere.
Also, if your data is static (i.e. you don't update it on phone) consider DAFSA for your task. It is one of the most efficient data structures for storing words (must be better than "standard" tries and radix tries both for size and for speed, better than succinct tries for speed, often better than succinct tries for size). There is a good C++ implementation: dawgdic - you can use it to build DAFSA from command line and then use a Java reader for the resulting data structure (example implementation is here).
You could store your trie as an array of nodes, with references to child nodes replaced with array indices. Your root node would be the first element. That way, you could easily store/load your trie from simple binary or text format.
public class SimpleTrie {
public class TrieNode {
boolean valid;
int[] children;
}
private TrieNode[] nodes;
private int numberOfNodes;
private TrieNode getNode() {
TrieNode t = nodes[++numberOnNodes];
return t;
}
}
Just build a large String[] and sort it. Then you can use binary search to find the location of a String. You can also do a query based on prefixes without too much work.
Prefix look-up example:
Compare method:
private static int compare(String string, String prefix) {
if (prefix.length()>string.length()) return Integer.MIN_VALUE;
for (int i=0; i<prefix.length(); i++) {
char s = string.charAt(i);
char p = prefix.charAt(i);
if (s!=p) {
if (p<s) {
// prefix is before string
return -1;
}
// prefix is after string
return 1;
}
}
return 0;
}
Finds an occurrence of the prefix in the array and returns it's location (MIN or MAX are mean not found)
private static int recursiveFind(String[] strings, String prefix, int start, int end) {
if (start == end) {
String lastValue = strings[start]; // start==end
if (compare(lastValue,prefix)==0)
return start; // start==end
return Integer.MAX_VALUE;
}
int low = start;
int high = end + 1; // zero indexed, so add one.
int middle = low + ((high - low) / 2);
String middleValue = strings[middle];
int comp = compare(middleValue,prefix);
if (comp == Integer.MIN_VALUE) return comp;
if (comp==0)
return middle;
if (comp>0)
return recursiveFind(strings, prefix, middle + 1, end);
return recursiveFind(strings, prefix, start, middle - 1);
}
Gets a String array and prefix, prints out occurrences of prefix in array
private static boolean testPrefix(String[] strings, String prefix) {
int i = recursiveFind(strings, prefix, 0, strings.length-1);
if (i==Integer.MAX_VALUE || i==Integer.MIN_VALUE) {
// not found
return false;
}
// Found an occurrence, now search up and down for other occurrences
int up = i+1;
int down = i;
while (down>=0) {
String string = strings[down];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
down--;
}
while (up<strings.length) {
String string = strings[up];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
up++;
}
return true;
}
Here's a reasonably compact format for storing a trie on disk. I'll specify it by its (efficient) deserialization algorithm. Initialize a stack whose initial contents are the root node of the trie. Read characters one by one and interpret them as follows. The meaning of a letter A-Z is "allocate a new node, make it a child of the current top of stack, and push the newly allocated node onto the stack". The letter indicates which position the child is in. The meaning of a space is "set the valid flag of the node on top of the stack to true". The meaning of a backspace (\b) is "pop the stack".
For example, the input
TREE \b\bIE \b\b\bOO \b\b\b
gives the word list
TREE
TRIE
TOO
. On your desktop, construct the trie using whichever method and then serialize by the following recursive algorithm (pseudocode).
serialize(node):
if node is valid: put(' ')
for letter in A-Z:
if node has a child under letter:
put(letter)
serialize(child)
put('\b')
This isn't a magic bullet, but you can probably reduce your runtime slightly by doing one big memory allocation instead of a bunch of little ones.
I saw a ~10% speedup in the test code below (C++, not Java, sorry) when I used a "node pool" instead of relying on individual allocations:
#include <string>
#include <fstream>
#define USE_NODE_POOL
#ifdef USE_NODE_POOL
struct Node;
Node *node_pool;
int node_pool_idx = 0;
#endif
struct Node {
void insert(const std::string &s) { insert_helper(s, 0); }
void insert_helper(const std::string &s, int idx) {
if (idx >= s.length()) return;
int char_idx = s[idx] - 'A';
if (children[char_idx] == nullptr) {
#ifdef USE_NODE_POOL
children[char_idx] = &node_pool[node_pool_idx++];
#else
children[char_idx] = new Node();
#endif
}
children[char_idx]->insert_helper(s, idx + 1);
}
Node *children[26] = {};
};
int main() {
#ifdef USE_NODE_POOL
node_pool = new Node[400000];
#endif
Node n;
std::ifstream fin("TWL06.txt");
std::string word;
while (fin >> word) n.insert(word);
}
Tries that prealloate space all possible children (256) have a huge amount of wasted space. You are making your cache cry. Store those pointers to children in a resizable data structure.
Some tries will optimize by having one node to represent a long string, and break that string up only when needed.
Instead of a simple file you can use a database like sqlite and a nested set or celko tree to store the trie and you can also build a faster and shorter (less nodes) trie with a ternary search trie.
I don't like the idea of addressing nodes by index in array, but only because it requires one more addition (index to the pointer). But with array of preallocated nodes you will maybe save some time on allocation and initialization. And you can also save a lot of space by reserving first 26 indices for leaf nodes. Thus you'll not need to allocate and initialize 180000 leaf nodes.
Also with indices you will be able to read the prepared nodes array from disk in binary format. This has to be several times faster. But I'm not sure how to do this on your language. Is this Java?
If you checked that your source vocabulary is sorted, you may also save some time by comparing some prefix of the current string with the previous one. E.g. first 4 characters. If they are equal you can start your
for(int level=0 ; level < key.length() ; level++) {
loop from the 5-th level.
Is it space inefficient or time inefficient? If you are rolling a plain trie then space may be part of the problem when dealing with a mobil device. Check out patricia/radix tries, especially if you are using it as a prefix look-up tool.
Trie:
http://en.wikipedia.org/wiki/Trie
Patricia/Radix trie:
http://en.wikipedia.org/wiki/Radix_tree
You didn't mention a language but here are two implementations of prefix tries in Java.
Regular trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/Trie.java
Patricia/Radix (space-effecient) trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/PatriciaTrie.java
Generally speaking, avoid using a lot of object creations from scratch in Java, which is both slow and it also has a massive overhead. Better implement your own pooling class for memory management that allocates e.g. half a million entries at a time in one go.
Also, serialization is too slow for large lexicons. Use a binary read to populate array-based representations proposed above quickly.