Algorithms to find longest common prefix in a sliding window - algorithm

I have written a Lempel Ziv compressor and decompressor.
I am seeking to improve the time to search the dictionary for a phrase. I have considered K-M-P and Boyer-Moore, but I think an algorithm that adapts to changes in the dictionary would be faster.
I've been reading that binary search trees (AVL or with splays) improve the performance of compression time considerably. What I fail to understand is how to bootstrap the binary search tree and insert/remove data. I'm not actually quite sure the significance of each node in the binary search. I am searching for phrases so will each character be considered a node? Also how and what is inserted/removed from the search tree as new data enters the dictionary and old data is removed?
The binary search tree sounds like a good payoff since it can adapt to the dictionary, but I'm just not quite sure of how it's used.

Does this help?

Hmm. String B tree I think is what you want, but I haven't went over the LZ algorithm for ages.
http://portal.acm.org/citation.cfm?id=301973
http://portal.acm.org/citation.cfm?id=1142385

Related

given prefix, how to find most frequently words effiecitly

This is a interview question extended from this one: http://www.geeksforgeeks.org/find-the-k-most-frequent-words-from-a-file/
But this question required on more thing: GIVEN PREFIX
For example, given "Bl" return most frequently words such as "bloom, blame, bloomberg" etc.
SO using TRIE is a must. But then, how to efficitnly construct heap? It's not right or pratical to construct heap for each prefix at run time. what could be a good solution.
[suppose this TRIE or data structure is static, pre-build]
thanks!
Keep a trie for all the words appearing in the file and their counts. So now if it asked to return all the words with the prefix "BI", then you can do it efficiently using the trie formed. But since you are interested in giving the most frequently occuring words with the prefix "BI", you can now use a min-heap to give the answer efficiently just like it has been done in the post you have linked to.
Also note that since the space usage of a trie can grow very large, you can suggest the interviewer to use a Ternary search tree instead of a trie.

Useful data structure for the following case

Which can be the beste data structures for the following case.
1.Should have operations like search, insert and delete. Mostly searching activities will be there.Around 90% of the operations will be search and rest are delete and insert.
2 Insertion,deletion and searching will be based on the key of the objects. Each key will point to a object. The keys will be sorted.
Any suggestion for optimal data structure will be highly appreciated.
AVL tree, or at least BST.
If you want to acces often the same elements you might want to consider splay trees too.
(Should I explain why?)
Not sure by what you mean with "data structures"
I would suggest MySQL.
Read more here: WikiPedia
Self-balancing tree of sorts (AVL, RB), or a hash table.
My guess is that you want to optimize time. Overall, a red-black tree will have logarithmic-time performance in all three operations. It will probably be your best overall bet on execution time; however, red-black trees are complex to implement and require a node structure meaning they will be stored using more memory than the contained data itself requires.
You want a tree-backed Map; basically you just want a tree where the nodes are dynamically sorted ("self-balanced") by key, with your objects hanging off of each node with corresponding key.
If you would like an "optimal" data structure, that completely depends on the distribution of patterns of inputs you expect. The nice thing about a self-balancing tree is you don't really need to care too much about the pattern of inputs. If you really want the best-guess as-close-to-optimal as possible we know of, and you don't know much about the specific sequences of queries, you can use a http://en.wikipedia.org/wiki/Tango_tree which is O(log(log(N))-competitive. This grows so slowly that, for all practical purposes, you have something which performs no worse than effectively a constant factor from the best possible data structure you could have chosen.
However it's somewhat grungy to implement, you may just be better using a library for a self-balancing tree.
Python:
https://github.com/pgrafov/python-avl-tree/
Java:
If you're just Java, just use a TreeMap (red-black tree based) and ignore the implementation details. Most languages have similar data structures in their standard libraries.

real world examples for binary tree structure

Can any one tell me some real world examples of binary tree structure ?
Binary trees are used all over the place in the real world. Pretty much every major implementation of a sorted data-structure uses one (usually a balanced variant, like red-black).
In C++, map and set are built on it.
Represent a uni-dimensional space.
I don't know world examples other than this one.
But it's large used for logical purposes and indexers.
Databases indexes. When you index a field, it is put in a binary tree for fast retrieval.
General Searching/sorting. A binary search tree will let you sort and search for data quickly
I believe Huffman Algorithm also use Binary tree!!!

Balancing a ternary search tree

How does one go about 'balancing' a ternary search tree? Most tst implementations don't address balancing, but suggest inserting in an optimal order (which I can't control.)
The article in Dr. Dobbs about Ternary Search Trees says: D.D. Sleator and R.E. Tarjan describe theoretical balancing algorithms for ternary search trees in "Self-Adjusting Binary Search Trees" (Journal of the ACM, July 1985). You can find online versions of this paper with your favorite search engine.
One simple optimization is to make it a red-black tree, which can avoid some worst-case scenarios. TSTs are really just binary trees where the value of a given node is another TST. So, the "middle" child of a node is not really part of the tree that is being balanced at each level, as it cannot move to a different parent anyway.
This ensures that each tier of the trie is traversed in log(R) time, although you could probably do even better by taking into account the size of the subtries at each node. That looks to be a lot more complicated though!
A generalization of the binary search tree is the B-Tree, which works for fanouts anywhere from 2 and up. That's not the only way to do it, but it's a common one.
Roughly the way it works is if an insert or delete would put the tree out of balance, it steals an element or a space from a neighboring node. If even that isn't enough to keep the tree in balance, its height by will be changed to make room.
read this article:
"Self-Adjusting of Ternary Search Tries Using Conditional Rotations and Randomized Heuristics"
by
"Ghada Hany Badr∗ and B. John Oommen †"
it will help you to understanding self-adjusting and balancing TSTs.

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A self sorted data structure can be binary search trees. If you want a self sorted data structure and a self balanced one. AVL tree is the way to go. Retrieval time will be O(lgn) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lgN) / operation. So, look for AVL, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps since they are the easiest to understand/implement.
One way to enrich the treap tree is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count when you modify the tree (eg: insertion/deletion).
I'm not too much involved lately with data structures implementation. Probably this answer is not an answer at all... you should see "Introduction to algorithms" written by Thomas Cormen. That book has many "recipes" with explanations about the inner workings of many data structures.
On the other hand you have to take into account how much time do you want to spend writing an algorithm, the size of the input and the if there is an actual necessity of an special kind of datastructure.
I see one thing missing from the answers here, the Skiplist
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
Self sorting is a little bit to ambigious. First of all
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the Linked List and the Binary Tree could be self sorting, and for this you could sort it in different ways and on different times.
For Linked Lists
I would preffere Insertion sort for this, you can read various good articles about this on both wikis and other places. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after it is inserted, i.e. on random times, well then you can just implement a sorting algororithm different than insertion sort maybe, bubblesort or maybe quicksort, I would avoid bubblesort though, it's a lot slower! But easier to gasp the mind around.
Random Access
Random is always something thats being discusses around so have a read about how to perform good randomization and you will be on your way, if you have a linked list and have a "getAt"-method, you could just randomize an index between 0 and n and get the item at that index.

Resources