Succinct data structure for a dynamic key set

I am looking for an implementation of, or research on, succinct data structures for a dynamic key set. Specifically, I am trying to implement a compression algorithm for a radix trie whose keys are character strings. To make updates easier, I use the balanced-parentheses encoding scheme proposed by Jacobson to represent the trie structure. I have also implemented a bitvector data structure that supports the rank(), select(), and access() operations. On top of these primitives, I implemented balanced-parentheses operations such as findopen() and findclose(), which are the building blocks of trie traversal operations such as child(), sibling(), etc.
The problem I run into arises when the trie is updated.
Is there any implementation of, or research on, a bitvector data structure for a dynamic set that supports these primitive operations?
Thanks in advance.
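To make the operations named in the question concrete, here is a minimal Python sketch of a balanced-parentheses bitvector. It is deliberately naive: every operation is a linear scan and the storage is an ordinary list, so it is not succinct and not an answer to the dynamic case. It only illustrates the semantics that a dynamic structure would have to preserve under updates. The names BitVector, from_parens, rank1, select1 and insert are assumptions made for this illustration; only rank(), select(), access(), findopen() and findclose() come from the question.

```python
class BitVector:
    """Plain, uncompressed bitvector; '(' is stored as 1 and ')' as 0."""

    def __init__(self, bits):
        self.bits = list(bits)

    def access(self, i):
        # Bit stored at position i.
        return self.bits[i]

    def rank1(self, i):
        # Number of 1-bits in positions [0, i).
        return sum(self.bits[:i])

    def select1(self, k):
        # Position of the k-th 1-bit (k starts at 1), or -1 if there is none.
        seen = 0
        for pos, bit in enumerate(self.bits):
            seen += bit
            if bit and seen == k:
                return pos
        return -1

    def insert(self, i, bit):
        # Naive O(n) update (hypothetical helper); this is the operation
        # a dynamic succinct bitvector is meant to speed up.
        self.bits.insert(i, bit)

    def findclose(self, i):
        # Matching ')' for the '(' at position i, found by tracking depth.
        depth = 0
        for pos in range(i, len(self.bits)):
            depth += 1 if self.bits[pos] else -1
            if depth == 0:
                return pos
        return -1

    def findopen(self, j):
        # Matching '(' for the ')' at position j, scanning backwards.
        depth = 0
        for pos in range(j, -1, -1):
            depth += -1 if self.bits[pos] else 1
            if depth == 0:
                return pos
        return -1


def from_parens(text):
    # Hypothetical convenience constructor for readable examples.
    return BitVector(1 if ch == "(" else 0 for ch in text)
```

For example, from_parens("(()(()))").findclose(3) scans from position 3 until the depth returns to zero. Inserting a new leaf means inserting a "()" pair, which shifts every later position; that shift is what makes static rank/select indexes expensive to maintain.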

Related

What is the difference between Abstract Data Type and Logical Data Structure?

I am having a hard time differentiating between an Abstract Data Type (ADT) and a Logical Data Structure (LDS). The only difference I can think of is that ADTs have defined operations, whereas LDSs are more about how the data is stored. But a stack can be an ADT, as we know what kind of operations must be defined to call something a 'stack', and it can also be called an LDS, as we know how the data should be 'structured' to call it a 'stack'. Both ADT and LDS are 'abstract' in that we don't talk about how they are implemented.
So is it correct that ADT and LDS are different names for the same thing, but depending on where we come from we can call it ADT or LDS?
An abstract data type (ADT) is a mathematical model along with some operations that are performed on that model. For example, the stack ADT consists of a sequence of integers (say) along with the operations push, pop, isempty, makeemptystack and top. An ADT is similar to an interface in Java: it is the specification. Data structures are about how this specification is implemented. For example, the stack ADT operations can be implemented using an array data structure, or using a linked list data structure. A queue ADT can be implemented using a circular array data structure in such a manner that all the ADT operations can be done in O(1) time.
In a real-world problem, you would encounter only a subset of the possible operations, which form your ADT, and you need to find a data structure that would implement exactly this subset of operations efficiently. For example, in some applications, you want to maintain a set of integers, and the only operations you would do on this set are to find the value of the smallest element in the set, delete the smallest element in the set, and insert a new element into the set. (Applications include Dijkstra's shortest path algorithm, Prim's minimum spanning tree algorithm, and Huffman coding.) A set with these three operations, MIN, EXTRACTMIN and INSERT, defines the min-priority queue ADT. A data structure that can implement all three operations efficiently - in O(log n) time - is a minheap. Other data structures - such as linked lists, unsorted arrays, sorted arrays - would take O(n) time for one or more of these operations, and hence are less efficient for this particular ADT.
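As a small illustration of the ADT versus data structure split described above, here is a sketch in Python. The class MinPriorityQueue and its method names are assumptions chosen to mirror MIN, EXTRACTMIN and INSERT; the binary heap underneath comes from the standard heapq module.

```python
import heapq


class MinPriorityQueue:
    """The ADT is the three public methods; the min-heap is the data structure."""

    def __init__(self):
        self._heap = []  # list kept in heap order by heapq

    def insert(self, item):
        # INSERT: O(log n)
        heapq.heappush(self._heap, item)

    def minimum(self):
        # MIN: O(1), the smallest element sits at index 0
        return self._heap[0]

    def extract_min(self):
        # EXTRACTMIN: O(log n)
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)
```

A caller only sees insert, minimum and extract_min. Swapping the list-based heap for, say, a sorted array would change the running times but not the interface, which is the point of the distinction.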

Implementing Immutable, Growable Vectors

I'm interested in implementing persistent (i.e. purely functional, immutable) growable vectors in F#, so that they can be used in the .NET framework. My current implementation is a variant of the Hash-Mapped Trie, and follows Clojure's implementation.
I'm having trouble implementing random-access insertions and deletions (inserting and removing elements at random indices) using this implementation. Is there some algorithm/modification that allows these operations efficiently, or some other implementation I can look at?
Clarification: When I say 'inserts' and 'deletes' I mean, for example, that given the list [1; 2; 3; 4], an insert of 500 at position 1 will give me [1; 500; 2; 3; 4]. I don't mean a set or associate operation.
Finger trees might be what you are looking for. There is a Clojure implementation available.
Immutable vectors/lists typically provide fast updates by only allowing insertions at one end and then sharing the immutable data at the other end. If you want to do non-head/tail insertions, what you actually want is to mutate the immutable end of your collection. You'll have to split the vector around the position where you want to insert and then splice it back together to create a new vector, and the best you're going to manage is O(n) time.
Immutable sorted trees work a little bit differently, but they won't let you re-number indices (keys) in less than O(n) time either.
Basically, if someone had discovered an efficient way to support random-access insertions in an immutable vector then it would be supported in one of the mainstream functional languages—but there is no such known data structure or algorithm, so there's no such implementation.
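The split-and-splice approach described in this answer can be sketched in a few lines. Python tuples stand in for an immutable vector here, and insert_at and delete_at are hypothetical helper names; both copy the whole sequence, so each call is O(n).

```python
def insert_at(vec, index, item):
    # Split around index, then splice into a brand-new tuple; vec is untouched.
    return vec[:index] + (item,) + vec[index:]


def delete_at(vec, index):
    # Same idea for removal: keep everything except position index.
    return vec[:index] + vec[index + 1:]


original = (1, 2, 3, 4)
updated = insert_at(original, 1, 500)  # original still holds (1, 2, 3, 4)
```

Nothing is shared between the old and new tuples, which is exactly the cost this answer is pointing at.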
The only thing you can do is split and join. This is very inefficient with Clojure vectors. That is why Phil Bagwell implemented a persistent vector that can be split and joined in O(log n).
You might want to look at this video: http://blip.tv/clojure/phill-bagwell-striving-to-make-things-simple-and-fast-5936145
or directly to his paper here: infoscience.epfl.ch/record/169879/files/RMTrees.pdf
Port the Haskell HAMT library? The Insert operation is O(log n)

Trie based addressbook and efficient search by name and contact number

It is a known approach to build an address book on a trie data structure, which is an efficient data structure for strings. Suppose we want to create an efficient search mechanism for an address book based on names, numbers, etc. What data structure enables memory-efficient and fast search on any type of search term, irrespective of data type?
This is a strange question; maybe you should add more information. But you can use a trie data structure not only for strings but also for many other data types. A trie is, by definition, a dictionary built on an adjacent tree model. I know of the kart-trie, which is similar to a trie but uses a binary tree model. So it is the same data structure with a different tree model. The kart-trie uses a clever key-alternating algorithm to hide a trie data structure in a binary tree. It's not a Patricia trie or a radix trie.
Good algorithm for managing configuration trees with wildcards?
http://code.dogmap.org/kart/
But I think a ternary tree would do the same trick:
http://en.wikipedia.org/wiki/Ternary_search_tree
http://igoro.com/archive/efficient-auto-complete-with-a-ternary-search-tree/
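To illustrate the point that one trie can index more than names, here is a rough Python sketch that stores each contact under both its name and its number, both treated as character strings. It is a plain dictionary-based trie, not a kart-trie or a ternary search tree, and the names TrieNode, AddressBook, add and search are assumptions made for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.contacts = []  # contacts whose key ends exactly at this node


class AddressBook:
    def __init__(self):
        self.root = TrieNode()

    def _insert(self, key, contact):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.contacts.append(contact)

    def add(self, name, number):
        # Index the same contact under two keys: lowercased name and number.
        contact = (name, number)
        self._insert(name.lower(), contact)
        self._insert(number, contact)

    def search(self, prefix):
        # Walk down the prefix, then collect every contact in that subtree.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.contacts)
            stack.extend(current.children.values())
        return sorted(set(found))
```

Searching for a name prefix or a number prefix goes through the same code path. The memory cost is one node per distinct prefix, which is the part a ternary search tree or a compressed trie would try to reduce.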

Useful data structure for the following case

What would be the best data structure for the following case?
1. It should have operations like search, insert and delete. Most of the activity will be searching: around 90% of the operations will be searches and the rest will be deletes and inserts.
2. Insertion, deletion and searching will be based on the key of the objects. Each key will point to an object. The keys will be sorted.
Any suggestion for optimal data structure will be highly appreciated.
AVL tree, or at least BST.
If you want to access the same elements often, you might want to consider splay trees too.
(Should I explain why?)
Not sure what you mean by "data structures".
I would suggest MySQL.
Read more here: WikiPedia
Self-balancing tree of sorts (AVL, RB), or a hash table.
My guess is that you want to optimize time. Overall, a red-black tree will have logarithmic-time performance in all three operations. It will probably be your best overall bet on execution time; however, red-black trees are complex to implement and require a node structure meaning they will be stored using more memory than the contained data itself requires.
You want a tree-backed Map; basically you just want a tree where the nodes are dynamically sorted ("self-balanced") by key, with your objects hanging off of each node with corresponding key.
If you would like an "optimal" data structure, that completely depends on the distribution of patterns of inputs you expect. The nice thing about a self-balancing tree is you don't really need to care too much about the pattern of inputs. If you really want the closest thing to optimal that we know of, and you don't know much about the specific sequences of queries, you can use a http://en.wikipedia.org/wiki/Tango_tree which is O(log log N)-competitive. This grows so slowly that, for all practical purposes, you have something which performs no worse than effectively a constant factor from the best possible data structure you could have chosen.
However, it's somewhat grungy to implement; you may be better off using a library for a self-balancing tree.
Python:
https://github.com/pgrafov/python-avl-tree/
Java:
If you're using Java, just use a TreeMap (red-black tree based) and ignore the implementation details. Most languages have similar data structures in their standard libraries.
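Python's standard library does not ship a tree-backed map, so as a rough stand-in here is a sketch built on a sorted list and the bisect module. To be clear, this is not a red-black or AVL tree: search is O(log n), but insert and delete are O(n) because list elements have to shift. For a workload that is about 90% searches that may still be acceptable. SortedMap and its method names are assumptions made for this example.

```python
import bisect


class SortedMap:
    """Keys kept in sorted order in one list, values in a parallel list."""

    def __init__(self):
        self._keys = []
        self._values = []

    def _locate(self, key):
        # Returns (index, found) using binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def search(self, key):
        i, found = self._locate(key)
        return self._values[i] if found else None

    def insert(self, key, value):
        i, found = self._locate(key)
        if found:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)  # O(n) shift
            self._values.insert(i, value)

    def delete(self, key):
        i, found = self._locate(key)
        if found:
            del self._keys[i]
            del self._values[i]
        return found

    def keys(self):
        return list(self._keys)
```

If inserts and deletes become a larger share of the workload, a real self-balancing tree, such as the AVL library linked above, would be the better fit.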

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A binary search tree is a self-sorted data structure. If you want one that is also self-balancing, an AVL tree is the way to go. Retrieval time will be O(lg n) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lg N) per operation. So, look for AVL trees, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps since they are the easiest to understand/implement.
One way to enrich the treap is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count when you modify the tree (e.g. on insertion/deletion).
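A sketch of that idea in Python, assuming a split/merge style treap; this is one of several ways to write a treap, and all names here are chosen for illustration. Each node carries a size field that is recomputed whenever its children change, and kth uses those sizes to find the element at a given sorted position.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random heap priority
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in this subtree


def size(node):
    return node.size if node else 0


def update(node):
    node.size = 1 + size(node.left) + size(node.right)


def split(node, key):
    # Returns (treap with keys < key, treap with keys >= key).
    if node is None:
        return None, None
    if node.key < key:
        left, right = split(node.right, key)
        node.right = left
        update(node)
        return node, right
    left, right = split(node.left, key)
    node.left = right
    update(node)
    return left, node


def merge(a, b):
    # Assumes every key in a is <= every key in b.
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b


def insert(root, key):
    left, right = split(root, key)
    return merge(merge(left, Node(key)), right)


def delete(root, key):
    # Removes one node with the given key, if present.
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        return merge(root.left, root.right)
    update(root)
    return root


def kth(node, index):
    # Element at 0-based position index in sorted order.
    while node is not None:
        left_size = size(node.left)
        if index < left_size:
            node = node.left
        elif index == left_size:
            return node.key
        else:
            index -= left_size + 1
            node = node.right
    raise IndexError("index out of range")
```

Usage is functional in style: root = insert(root, key) starting from root = None, then kth(root, i) for random access. All operations take expected O(lg N) time because the random priorities keep the tree balanced in expectation.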
I haven't been much involved with data structure implementation lately, so this is probably not much of an answer... but you should look at "Introduction to Algorithms" by Thomas Cormen et al. That book has many "recipes" with explanations of the inner workings of many data structures.
On the other hand, you have to take into account how much time you want to spend writing an algorithm, the size of the input, and whether there is an actual need for a special kind of data structure.
I see one thing missing from the answers here: the skip list.
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
"Self-sorting" is a little bit too ambiguous. First of all:
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting. A Stack, for example, would be weird if it were self-sorting.
However, the Linked List and the Binary Tree could be self-sorting, and you could sort them in different ways and at different times.
For Linked Lists
I would prefer insertion sort for this; you can read various good articles about it on wikis and elsewhere. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after items are inserted, i.e. at arbitrary times, then you can implement a sorting algorithm other than insertion sort, maybe bubble sort or quicksort. I would avoid bubble sort though: it's a lot slower, but easier to wrap your mind around.
Random Access
Randomness is always a much-discussed topic, so have a read about how to perform good randomization and you will be on your way. If you have a linked list with a "getAt" method, you could just pick a random index between 0 and n and get the item at that index.
