Binary Search Tree with 2 keys - algorithm

I have a database of users with their usernames and IDs. These are the operations the program will process:
insert, delete (by username), search (by username), print (prints all users' info, sorted by their ID)
The time complexity of the first three operations shouldn't exceed O(log n), and print should be O(n). The solution should be implemented with a balanced BST.
My idea is to keep two BSTs: the key of one is the ID and the key of the other is the username. Then we can access an element by either its username or its ID in O(log n) time. But this doubles the memory use and the time of the operations.
Is there a way to access elements both by username and by ID in O(log n) time that is better than what I described?

What you propose will indeed double the memory and time requirements of your data structure (only insertions and deletions take double time; the other operations take no extra time). However, recall that O(2 log n) is the same as O(log n) and is far less than O(n). As an illustration, consider the graphs of 2 log n and n: they meet at n = 2 and n = 4, and beyond that log n looks essentially like a flat line compared to n.
I would argue that you cannot do better than this using balanced BSTs (or at all, for that matter). Since you need to search by username in O(log n) time, username must be a key of a tree. However, you also need to retrieve the users sorted by ID in O(n) time. That essentially forbids sorting them after retrieval, because you cannot sort faster than O(n log n), so they must already be stored in ID order. Therefore ID must also be a key of a tree. Hence, you need two trees.
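Here is a minimal sketch of the two-tree idea using Java's TreeMap (a red-black tree, so every keyed operation is O(log n)). The User record and its fields are illustrative, not part of the original question:

```java
import java.util.TreeMap;

class TwoTreeDirectory {
    record User(int id, String username) {}

    private final TreeMap<String, User> byUsername = new TreeMap<>();
    private final TreeMap<Integer, User> byId = new TreeMap<>();

    void insert(User u) {           // two O(log n) insertions
        byUsername.put(u.username(), u);
        byId.put(u.id(), u);
    }

    void delete(String username) {  // two O(log n) deletions
        User u = byUsername.remove(username);
        if (u != null) byId.remove(u.id());
    }

    User search(String username) {  // one O(log n) lookup
        return byUsername.get(username);
    }

    void print() {                  // O(n): in-order walk of the ID-keyed tree
        for (User u : byId.values())
            System.out.println(u.id() + " " + u.username());
    }
}
```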

While two trees are fine, you can also use a hash table for lookup and delete plus a sorted index for printing; a red-black tree works fine as the sorted index.
However, if IDs are consecutive non-negative integers, it is even more efficient to maintain a simple array, where position i contains the object with ID i. Now you can print by just traversing the array, and the hash table values can be IDs, since these "point" to the respective objects in the array.
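A sketch of that layout, assuming IDs are consecutive non-negative integers below some known capacity (the User record and field names are illustrative):

```java
import java.util.HashMap;

class UserDirectory {
    record User(int id, String username) {}

    private final User[] byId;                               // slot i holds the user with ID i
    private final HashMap<String, Integer> idByUsername = new HashMap<>();

    UserDirectory(int capacity) { byId = new User[capacity]; }

    void insert(User u) {
        byId[u.id()] = u;
        idByUsername.put(u.username(), u.id());              // the hash value "points" at the array slot
    }

    void delete(String username) {
        Integer id = idByUsername.remove(username);
        if (id != null) byId[id] = null;                     // leave a gap; print() skips it
    }

    User search(String username) {
        Integer id = idByUsername.get(username);
        return id == null ? null : byId[id];
    }

    void print() {                                           // one pass over the array, in ID order
        for (User u : byId)
            if (u != null) System.out.println(u.id() + " " + u.username());
    }
}
```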

Related

Data structures - O(1) complexity

So I've been given the following question:
Describe a data structure with the following interface:
The structure will contain n elements, where each element holds a key and a value (i.e., each element is (key, value)).
insert((key, value)): insert an element in O(1) average case and O(log n) worst case.
delete((key, value)): delete the element that corresponds to the given key in O(1) average case and O(log n) worst case.
find(key): find the element that corresponds to the given key and return its value in O(1) average case and O(log n) worst case.
setAll(m): change the value of every element in the structure to m in O(1) worst case.
My main thought was to use a hash table to get O(1) average-case runtime for insert, delete, and find. The hash table will be implemented with chaining, but with an AVL tree in each bucket instead of a linked list, so that insert, delete, and find are O(log n) in the worst case.
But I got stuck on setAll. I can't achieve O(1) worst-case runtime. I know you can't literally change all the values, since that requires traversing the elements, so I thought maybe I could use global variables and keep track of the calls to setAll, but I can't see how to implement such a thing.
In addition, there is no limit on the space complexity, which is why I used a hash table containing AVL trees. This is also a clue our lecturer gave us.
A hash table with AVL trees is a good start.
To implement setAll(m), keep a global counter operation_count, and stamp each entry with update_op = operation_count during insert.
When setAll(m) is called, record last_reset = operation_count and reset_value = m.
Then modify find so that it returns reset_value for any entry with update_op < last_reset.
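A minimal sketch of that lazy-reset idea, here layered on Java's HashMap for brevity (tree-backed buckets, as in the question, would supply the O(log n) worst case):

```java
import java.util.HashMap;

class SetAllMap<K, V> {
    private static class Entry<V> {
        V value;
        long updateOp;              // timestamp of the last write to this entry
        Entry(V value, long updateOp) { this.value = value; this.updateOp = updateOp; }
    }

    private final HashMap<K, Entry<V>> table = new HashMap<>();
    private long opCount = 0;       // global operation counter
    private long lastReset = -1;    // timestamp of the most recent setAll
    private V resetValue;           // value assigned by the most recent setAll

    void insert(K key, V value) {
        table.put(key, new Entry<>(value, ++opCount));
    }

    void delete(K key) {
        table.remove(key);
        ++opCount;
    }

    V find(K key) {
        Entry<V> e = table.get(key);
        if (e == null) return null;
        // Entries last written before the most recent setAll logically hold resetValue.
        return (e.updateOp < lastReset) ? resetValue : e.value;
    }

    void setAll(V m) {              // O(1) worst case: no traversal, just record the reset
        lastReset = ++opCount;
        resetValue = m;
    }
}
```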

What is the run-time to insert a string into a hash table?

I believe inserting into a hash table is average-case O(1) and worst-case O(n). So if we loop through a string and add each word to a hash table (which maps the word to the number of times it occurs in the string), wouldn't that be worst-case O(n^2) run-time? I tried to ask this before, but the answers said it was worst-case O(n). Thanks!
You are right that under reasonable assumptions, a hash table inserts elements in O(1) average time and O(n) worst-case time.
As for your problem, assuming you have n words in a string, you would iterate over each word and insert it into the hash table, which takes O(n) average time or O(n^2) worst-case time.
The worst case of insert depends on how the implementation handles collisions; the collision resolution technique influences both the put() and get() operations and is implemented differently in each library. The core idea is to keep all colliding keys in the same bucket and, during retrieval, traverse the colliding keys and apply an equality check to find the given key. Note that the bucket must store both keys and values to make that equality check possible.
Another thing to consider: during insertion, a hash code is computed for the given key, which we can treat as O(1) per key.
In the worst case, all keys fall into the same bucket, so a single get() is O(n). A put(), however, can stay O(1) regardless of collisions if the implementation simply prepends the new entry to the bucket without scanning it.
How the colliding keys are maintained is the key factor. Some implementations use a BST rather than a linked list for the bucket, making the worst case O(log n) for insertion and retrieval.
In that case, inserting n elements costs O(n log n) in the worst case, not O(n^2).
Any decent implementation also tries to minimize hash-code collisions between objects in order to perform well.
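A sketch of a chained table along those lines; put() prepends in O(1) without a duplicate check (newer entries simply shadow older ones), while get() walks the chain applying the equality check:

```java
class ChainedHashTable<K, V> {
    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[64];   // fixed size for the sketch

    private int index(K key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;          // non-negative bucket index
    }

    void put(K key, V value) {       // O(1) regardless of collisions: prepend, no chain scan
        int i = index(key);
        buckets[i] = new Node<>(key, value, buckets[i]);
    }

    V get(K key) {                   // O(chain length): O(1) average, O(n) if all keys collide
        for (Node<K, V> n = buckets[index(key)]; n != null; n = n.next)
            if (n.key.equals(key)) return n.value;                      // equality check on collision
        return null;
    }
}
```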

Improved Insertion Method for Linear Probing

Currently, I am asked to devise an O(n log n) algorithm for inserting n elements into a hash table with n slots using linear probing.
Naively, inserting n elements can take up to O(n^2) time if the hash function maps every element to the same value.
Therefore, I am thinking about preventing the collisions before hashing any elements, by predicting them using some data structure.
For example: compute all the hash values for the elements (which is O(n)), search for possible collisions, change the hash values of the colliding keys, and then do the insertion.
My question: is there a data structure that solves this problem in O(n log n) time?
Many thanks.
To start, initialize a van Emde Boas (vEB) tree to contain 0..n-1, representing the open slots of the hash table. To insert an element into the hash table, call the FindNext (successor) method of the vEB tree once or twice (twice if there is wraparound) to determine the next free slot, and then call Delete to remove that slot from the vEB tree. The total running time is O(n log log n), even better than the O(n log n) you asked for.
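The same free-slot idea can be sketched with a TreeSet standing in for the vEB tree: ceiling() plays the role of FindNext, giving O(log n) per insertion and O(n log n) total, which already meets the bound in the question; swapping in a vEB tree tightens it to O(n log log n). This sketch assumes non-negative hashes and at most n insertions into the n slots:

```java
import java.util.TreeSet;

class LinearProbingInserter {
    private final Object[] slots;
    private final TreeSet<Integer> free = new TreeSet<>();

    LinearProbingInserter(int n) {
        slots = new Object[n];
        for (int i = 0; i < n; i++) free.add(i);  // all slots start open
    }

    void insert(Object element, int hash) {       // assumes a free slot still exists
        int start = hash % slots.length;
        Integer slot = free.ceiling(start);       // next free slot at or after start
        if (slot == null) slot = free.first();    // wrap around to the lowest free slot
        slots[slot] = element;
        free.remove(slot);                        // the slot is no longer open
    }
}
```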

Number of occurrences of words in a file - Complexity?

Given a file containing a set of words:
1) If I choose a hash table to store word -> count, what is the time complexity to find the occurrences of a particular word?
2) How could I return the words in alphabetical order?
If I choose a hash table, I know the time complexity for 1) is O(n) to parse all the words and O(1) to get the count of a particular word.
I fail to see how I could order the hash table, or what the time complexity would be. Any help?
A sortable hash map becomes, essentially, a binary tree. In Java, TreeMap implements the SortedMap interface, with O(log n) lookup and insert.
If you want the best theoretical performance, you'd use a HashMap with O(1) lookup and insert and then a bucket/radix sort, O(n), for display/iteration.
In practice, a radix sort on strings may perform worse than an O(n log n) quicksort.
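A sketch of the count-then-sort approach: counting is O(1) average per word, and one copy into a TreeMap sorts the k distinct words in O(k log k) for ordered printing:

```java
import java.util.*;

class WordCount {
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words)
            counts.merge(w, 1, Integer::sum);   // O(1) average per update
        return counts;
    }

    static void printAlphabetical(Map<String, Integer> counts) {
        // Copying into a TreeMap sorts the keys once, O(k log k) overall.
        new TreeMap<>(counts).forEach((w, c) -> System.out.println(w + ": " + c));
    }
}
```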
Your analysis of (1) is correct.
Most hash table implementations (that I know of) have no implicit ordering.
To get an ordered list you'd have to sort the entries (O(n log n)); queries on the sorted list would then take O(log n).
You could in theory define a hash function and implementation that keeps entries sorted, but making it well-distributed (so the table stays efficient) would be difficult, and just sorting would be a lot simpler.
If the file contains many duplicates, the best idea may be to use hashing first to eliminate duplicates, then iterate through the hash table to get a list of the distinct words and sort that.
Working with hash tables has two drawbacks: 1) they do not store data in sorted order, and 2) computing the hash value is usually time-consuming. They also have linear worst-case complexity for insert/delete/lookup.
My suggestion is to use a trie to store your words. A trie guarantees O(L) insert/lookup for a word of length L, independent of the number of words stored, and a pre-order traversal of the trie yields the words in sorted order.
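A minimal trie sketch, assuming lowercase a-z words (counts are stored at the terminal node, and the pre-order traversal prints words alphabetically):

```java
class Trie {
    private static class Node {
        final Node[] child = new Node[26];
        int count;                               // occurrences of the word ending here
    }

    private final Node root = new Node();

    void insert(String word) {                   // O(L) for a word of length L
        Node n = root;
        for (char c : word.toCharArray())
            n = (n.child[c - 'a'] != null) ? n.child[c - 'a']
                                           : (n.child[c - 'a'] = new Node());
        n.count++;
    }

    int count(String word) {                     // O(L) lookup
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.child[c - 'a'];
            if (n == null) return 0;
        }
        return n.count;
    }

    void printSorted() { printSorted(root, new StringBuilder()); }

    // Pre-order traversal: visits children in a..z order, so output is sorted.
    private void printSorted(Node n, StringBuilder prefix) {
        if (n.count > 0) System.out.println(prefix + ": " + n.count);
        for (char c = 'a'; c <= 'z'; c++)
            if (n.child[c - 'a'] != null) {
                prefix.append(c);
                printSorted(n.child[c - 'a'], prefix);
                prefix.deleteCharAt(prefix.length() - 1);
            }
    }
}
```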

Suggestion for a data structure for ranking users based on scores

I'm looking to create and maintain the ranks in memory.
Scores are associated with user IDs, and ranks are computed from the scores. The following functionality should be supported:
Sort ranks based on scores, ascending or descending, with insertion and deletion in O(log n) time.
Given a user ID, look up the user's rank/score along with the n preceding and succeeding ranks in O(log n) time, e.g., get the rank of user 8347 with the 5 preceding and succeeding ranks.
Retrieve n ranks from any offset x, e.g., get 100 ranks starting from 800.
Given these requirements, are there any suggestions on which data structure suits them best?
I think you can do this very efficiently using a combination of an order statistic tree and a hash table.
An order statistic tree is an augmented binary search tree that, in addition to storing elements in sorted order, allows elements to be looked up by their index in the tree. That is, the structure supports O(log n) insertion, deletion, and lookup of values by key, as well as O(log n) lookup of an element given its index. If you store the scores in this structure, you can easily insert or update scores while keeping track of the rank of each element in the tree.
To associate users with their scores, you can combine this structure with an auxiliary hash table mapping user IDs to the nodes of the order statistic tree that hold each user's score. This gives you O(1) access to a player's score in addition to the O(log n) lookup of a score's rank. A sketch follows.
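Here is a minimal sketch of the size-augmented tree: each node caches its subtree size, which makes rank and select O(height). Balancing is omitted for brevity; in practice you would augment a red-black or AVL tree the same way to guarantee O(log n), and layer the user-ID hash map on top as described:

```java
class OrderStatisticTree {
    private static class Node {
        final int score;
        int size = 1;                        // number of nodes in this subtree
        Node left, right;
        Node(int score) { this.score = score; }
    }

    private Node root;

    private int size(Node n) { return n == null ? 0 : n.size; }

    void insert(int score) { root = insert(root, score); }

    private Node insert(Node n, int score) {
        if (n == null) return new Node(score);
        if (score < n.score) n.left = insert(n.left, score);
        else                 n.right = insert(n.right, score);
        n.size++;                            // one more element below this node
        return n;
    }

    int rank(int score) {                    // number of scores strictly less than score
        int r = 0;
        for (Node n = root; n != null; )
            if (score <= n.score) n = n.left;
            else { r += size(n.left) + 1; n = n.right; }
        return r;
    }

    int select(int rank) {                   // 0-based; assumes 0 <= rank < size(root)
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (rank < leftSize) n = n.left;
            else if (rank == leftSize) return n.score;
            else { rank -= leftSize + 1; n = n.right; }
        }
    }
}
```

select() is what serves "100 ranks starting from offset 800": call it for ranks 800..899, each in O(log n) on a balanced variant.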
Hope this helps!
Use an in-memory SQLite database.
How about a priority queue?
Each person has a score; toss it into the priority queue, and it will do the work of ordering by priority/value/rank, ascending or descending.
A binary search tree; one implementation: http://en.wikipedia.org/wiki/AA_tree

Operation   Average     Worst case
Space       O(n)        O(n)
Search      O(log n)    O(log n)
Insert      O(log n)    O(log n)
Delete      O(log n)    O(log n)
