Find out top 10 companies with highest volume trades - algorithm

I have been working on a problem from Glassdoor that was asked in an interview at a firm I am going to interview with. The problem goes as follows:
If you have all the companies that are traded, and live inputs are coming in telling you which company is being traded and at what volume, how do you maintain the data so that you can most efficiently give the top 10 most traded companies by share volume?
I thought of the following solution, though I am not sure whether it is efficient: maintain a binary search tree. With every insert you insert the company name and the volume of shares traded for it.
My basic node for the tree would then be:
class Node
{
String key; // company name
int volume; // volume
Node leftNode;
Node rightNode;
}
So at every new trade I keep inserting into the tree. At the time of final retrieval, I can run the following code until a global count reaches 10.
int count = 0; // global count of companies printed so far

void traverse(Node a)
{
    if (a != null && count < 10)
    {
        traverse(a.rightNode);               // larger volumes first (reverse in-order)
        if (count < 10)
        {
            System.out.println(a.key + " " + a.volume);
            count++;
        }
        traverse(a.leftNode);
    }
}
What are your views on this solution?

This question is very similar to another question but with a little twist. First of all, if somebody asked me this question I would ask a lot of questions. Do I know the names of the companies in advance? What is the number of companies? Is there an upper bound on their number? Do you mean time efficiency, memory efficiency, or a mix of both? What is the ratio of trades to top-company requests? It is not specified, but I will assume a high volume of trades and displaying the Top 10 on demand or at some time interval. If the Top 10 were requested after every trade arrival, a heap would be useless even for bigger N than 10 and the whole algorithm could be simpler. I also assume time efficiency. Memory efficiency is then constrained by CPU cache behaviour, so we should not waste it anyway.
So we will store the top N in some structure which gives us the least member fast. For big N this is obviously a heap. I can use any heap implementation, even those with bad IncKey and Merge operations or without them at all; I will need only Insert, Peek and Remove. The number 10 is pretty small and I would not even need a heap for it, especially in compiled languages with a good compiler. I could use an ordered array or list, or even an unordered one. So everywhere I mention a heap below, you can use an ordered or unordered array or list; a heap is necessary only for bigger N in Top N.
So this is it: we will store the top N company names and their volumes at the time of insertion in the heap.
Then we need to track each company's trade volume in some K/V storage. The key is the name. The K/V storage for this purpose can be a hashmap, a trie or Judy. It helps if we know the company names in advance: it allows us to compute a perfect hash for the hashmap or construct an optimal trie. Otherwise it is nice to know an upper bound on the number of companies, in order to choose a good hash length and number of buckets; failing that, we have to use a resizable hashmap or Judy. There is no known trie implementation for dynamic K/V data that beats a hashmap or Judy. All of these K/V storages have O(k) access complexity, where k is the length of the key, which is the name in this case. Everywhere I mention a hashmap below, you can use Judy or a trie; use a trie only when all company names are known in advance and you can tailor super fast optimized code.
So we will store the company name as the key, and the trade volume so far plus a flag indicating heap membership as the value in the hashmap.
So here is the algorithm. We keep a state containing the heap, the number of companies in the heap, and the hashmap. For each arriving company name and volume we increase the volume in the hashmap. Then, if the heap contains fewer than N (10) companies, we add this company's name and volume from the hashmap to the heap if it is not there yet (according to the flag), and set the flag in the hashmap. Otherwise, if the heap is full and the current company is not in the heap, we peek into the heap; if the current company has less volume traded so far (per the hashmap) than the company at the top of the heap, we are done with this trade and move on to the next one. Otherwise we have to update the companies in the heap first: while the company at the top of the heap (i.e. the one with the least volume) has a heap volume that is less than the current company's and also different from its hashmap volume, we update that volume, which can be done by removing it from the heap and inserting the right value. Then we check the new top of the heap, and so on. Note that we do not need to update all the companies in the heap, and not even all the stale ones near the top; it is pretty lazy. If the current company still has a bigger volume than the top of the heap, we remove that company from the heap, insert the current one, and update the flags in the hashmap. That's all.
Brief recapitulation:
min-heap storing the top N companies ordered by volume and containing the company name or a direct index into the hashmap
the volume stored in the heap can be out of date
hashmap with the company name as key and the up-to-date volume plus a flag indicating heap membership as value
first update the current company's volume in the hashmap and remember it
repeatedly update the heap top while it is less than the current traded company
remove the heap top if it is still less than the current company and add the current one to the heap
This algorithm gains an advantage from the fact that trade volumes can only be positive, so a volume stored in the heap can only be less than or equal to the true value; if the top of the heap has the least value in the heap, and that (possibly stale) value is still bigger than the up-to-date volume of any company outside the heap, everything is perfect. Otherwise we would have to store all companies in the heap, use a max-heap instead of a min-heap, implement IncKey and perform it for every trade, and keep back-references to the heap in the hashmap, and everything would be far more complicated.
The time complexity of processing a new trade is a nice O(1). The hashmap lookup is O(1), and Peek in the heap is O(1); Insert and Delete in the heap are amortized O(1) or O(log N) where N is a constant, so still O(1). The number of updates in the heap is O(N), so O(1). You can also compute an upper bound on processing time when there is an upper bound on the number of companies (the hashmap sizing problem mentioned at the beginning), so with a good implementation you can consider it real time. Keep in mind that a simpler solution (an ordered or unordered list, updating all Top members and so on) can give better performance in compiled code for small N such as 10, especially on modern HW.
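For readers who prefer an imperative language, here is a rough, untested Java transcription of the same lazy-update algorithm (class and field names such as Top10Tracker are my own, not from the original answer):

import java.util.*;

class Top10Tracker {
    static final int K = 10;

    static final class Info { long volume; boolean inHeap; }          // up-to-date totals per company

    static final class HeapEntry {                                     // snapshot in the heap, may be stale
        final String name;
        final long volume;
        HeapEntry(String name, long volume) { this.name = name; this.volume = volume; }
    }

    private final Map<String, Info> totals = new HashMap<>();
    private final PriorityQueue<HeapEntry> heap =
        new PriorityQueue<>(Comparator.comparingLong((HeapEntry e) -> e.volume)); // min-heap by snapshot

    void trade(String name, long volume) {
        Info info = totals.computeIfAbsent(name, k -> new Info());
        info.volume += volume;                                         // always update the hashmap first
        if (info.inHeap) return;                                       // its heap entry may go stale; that is fine
        if (heap.size() < K) {                                         // heap not full yet
            heap.add(new HeapEntry(name, info.volume));
            info.inHeap = true;
            return;
        }
        while (true) {
            HeapEntry top = heap.peek();
            if (info.volume < top.volume) return;                      // snapshot is a lower bound: not top K
            long trueTop = totals.get(top.name).volume;
            if (trueTop == top.volume) {                               // top is up to date and still <=: evict it
                heap.poll();
                totals.get(top.name).inHeap = false;
                heap.add(new HeapEntry(name, info.volume));
                info.inHeap = true;
                return;
            }
            heap.poll();                                               // lazily refresh the stale top and retry
            heap.add(new HeapEntry(top.name, trueTop));
        }
    }

    List<Map.Entry<String, Long>> top() {                              // up-to-date volumes come from the hashmap
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (HeapEntry e : heap) out.add(Map.entry(e.name, totals.get(e.name).volume));
        out.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return out;
    }
}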
This algorithm can be nicely implemented even in a functional language, except that there is no pure functional hash table; a trie should have O(1) behaviour, or there will be some impure module for this. Here, for example, is an Erlang implementation using an ordered list as the heap and a dictionary as the hashmap. (My favourite functional heap is the pairing heap, but for 10 it is overkill.)
-module(top10trade).
-record(top10, {
n = 0,
heap = [],
map = dict:new()
}).
-define(N, 10).
-export([new/0, trade/2, top/1, apply_list/2]).
new() ->
#top10{}.
trade({Name, Volume}, #top10{n = N, map = Map} = State)
% heap is not full
when N < ?N ->
case dict:find(Name, Map) of
% it's already in heap so update hashmap only
{ok, {V, true}} ->
State#top10{map = dict:store(Name, {V+Volume, true}, Map)};
% otherwise insert to heap
error ->
State#top10{
n = N+1,
heap = insert({Volume, Name}, State#top10.heap),
map = dict:store(Name, {Volume, true}, Map)
}
end;
% heap is full
trade({Name, Volume}, #top10{n = ?N, map = Map} = State) ->
% look-up in hashmap
{NewVolume, InHeap} = NewVal = case dict:find(Name, Map) of
{ok, {V, In}} -> {V+Volume, In};
error -> {Volume, false}
end,
if InHeap ->
State#top10{map = dict:store(Name, NewVal, Map)};
true -> % current company is not in heap so peek in heap and try update
update(NewVolume, Name, peek(State#top10.heap), State)
end.
update(Volume, Name, {TopVal, _}, #top10{map = Map} = State)
% Current Volume is smaller than heap Top so store only in hashmap
when Volume < TopVal ->
State#top10{map = dict:store(Name, {Volume, false}, Map)};
update(Volume, Name, {TopVal, TopName}, #top10{heap = Heap, map = Map} = State) ->
case dict:fetch(TopName, Map) of
% heap top is up-to-date and still less than current
{TopVal, true} ->
State#top10{
% store current to heap
heap = insert({Volume, Name}, delete(Heap)),
map = dict:store( % update current and former heap top records in hashmap
Name, {Volume, true},
dict:store(TopName, {TopVal, false}, Map)
)
};
% heap needs update
{NewVal, true} ->
NewHeap = insert({NewVal, TopName}, delete(Heap)),
update(Volume, Name, peek(NewHeap), State#top10{heap = NewHeap})
end.
top(#top10{heap = Heap, map = Map}) ->
% fetch up-to-date volumes from hashmap
% (in impure language updating heap would be nice)
[ {Name, element(1, dict:fetch(Name, Map))}
|| {_, Name} <- lists:reverse(to_list(Heap)) ].
apply_list(L, State) ->
lists:foldl(fun apply/2, State, L).
apply(top, State) ->
io:format("Top 10: ~p~n", [top(State)]),
State;
apply({_, _} = T, State) ->
trade(T, State).
%%%% Heap as ordered list
insert(X, []) -> [X];
insert(X, [H|_] = L) when X < H -> [X|L];
insert(X, [H|T]) -> [H|insert(X, T)].
-compile({inline, [delete/1, peek/1, to_list/1]}).
delete(L) -> tl(L).
peek(L) -> hd(L).
to_list(L) -> L.
It performs a nice 600k trades per second. I would expect a few million per second from a C implementation, depending on the number of companies; more companies means slower K/V look-up and update.

You can do it using a min binary heap data structure: maintain a heap of size 10, and every time a new company has a greater volume than the top, delete the top element and insert the new company into the heap. The elements currently in the heap are the current top 10 companies.
Note: add the first 10 companies at the start.
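A minimal Java sketch of this idea (the class name Top10MinHeap is mine; it assumes each company's total volume is offered once, so repeated trades per company would need the extra bookkeeping described in the answer above):

import java.util.*;

class Top10MinHeap {
    private static final int K = 10;
    // min-heap of (name, volume) entries, smallest volume on top
    private final PriorityQueue<Map.Entry<String, Long>> heap =
        new PriorityQueue<>(Map.Entry.comparingByValue());

    void offer(String name, long volume) {
        if (heap.size() < K) {
            heap.add(Map.entry(name, volume));            // add the first 10 companies at the start
        } else if (volume > heap.peek().getValue()) {
            heap.poll();                                   // evict the smallest of the current top 10
            heap.add(Map.entry(name, volume));
        }
    }

    Collection<Map.Entry<String, Long>> top10() {          // the heap's contents are the current top 10
        return new ArrayList<>(heap);
    }
}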

Well, there are trade-offs here. You are going to need to choose what you prefer - an efficient look-up (get top K) or an efficient insertion. As it seems, you cannot get both.
You can get O(log N) insertion and lookup by using two data structures:
Map<String,Node> - maps from the company name to a node in the second data structure. This will be a trie or a self-balancing tree.
Map<Integer,String> - maps from volume to the company's name. This can be a map (hash/tree based) or it can also be a heap; since we have a link to the direct node, we can actually delete a node efficiently when needed.
Getting the top 10 can be done on the 2nd data structure in O(log N), and inserting each element requires looking it up by string - O(|S| * log N) (you can use a trie to get O(|S|) here) - and then modifying the second tree - which is O(log N).
Using a trie totals in O(|S| + log N) complexity for both get-top-K and insertions.
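A rough Java sketch of this two-structure idea (here a TreeMap plays the role of the ordered structure; the class name VolumeIndex and the set-per-volume used to handle ties are my additions, not from the answer):

import java.util.*;

class VolumeIndex {
    private final Map<String, Long> nameToVolume = new HashMap<>();
    private final TreeMap<Long, Set<String>> volumeToNames = new TreeMap<>(); // ordered by volume

    void addTrade(String name, long volume) {
        Long old = nameToVolume.get(name);
        long updated = (old == null ? 0 : old) + volume;
        nameToVolume.put(name, updated);
        if (old != null) {                                   // drop the stale entry for this company
            Set<String> names = volumeToNames.get(old);
            names.remove(name);
            if (names.isEmpty()) volumeToNames.remove(old);
        }
        volumeToNames.computeIfAbsent(updated, v -> new HashSet<>()).add(name);
    }

    List<String> topK(int k) {                               // walk the ordered map from largest volume down
        List<String> result = new ArrayList<>();
        for (Set<String> names : volumeToNames.descendingMap().values()) {
            for (String name : names) {
                if (result.size() == k) return result;
                result.add(name);
            }
        }
        return result;
    }
}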
If the amount of data inserted is exponential in the number of getTopK() ops - it will be better to just keep a HashMap<String,Integer> and modify it as new data arrives, and when you get a getTopK() - do it in O(N) as described in this thread - using a selection algorithm or a heap.
This results in O(|S|) insertion (on average) and O(N + |S|) get top K.
|S| is the length of the input/result string where it appears.
This answer assumes each company can appear more than once in the input stream.

Related

Data structures to manage available resources

What is the best data structure for this case? Given N resources with ID from 0 to N-1, you can get a resource or free a resource.
We also need to consider the time & space complexity for get and free operations.
interface ResourcePool {
int get(); // return an available ID
void free(int id); // mark ID as available
}
Follow up: what if N is a super large number, say 1 billion or 1 trillion.
Generally, you need 2 things:
A variable like int nextUnused that contains the smallest ID that's never been allocated
A list of free IDs less than nextUnused.
Allocating an ID will take it from the free list if it's non-empty. Otherwise it will increment nextUnused.
Freeing an ID will just add it to the free list.
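A minimal Java sketch of this scheme (the class name SimpleResourcePool is mine; it keeps the free list as a separate deque rather than threading it through the resources, and it does not validate that a freed ID was actually allocated):

import java.util.ArrayDeque;
import java.util.Deque;

// Matches the ResourcePool interface from the question above.
class SimpleResourcePool {
    private int nextUnused = 0;                                  // smallest ID that has never been allocated
    private final Deque<Integer> freeList = new ArrayDeque<>();  // freed IDs below nextUnused

    public int get() {
        if (!freeList.isEmpty()) return freeList.pop();          // reuse a freed ID first
        return nextUnused++;                                     // otherwise hand out a brand-new one
    }

    public void free(int id) {
        freeList.push(id);                                       // caller must free only allocated IDs
    }
}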
There are lots of different representations for the free list, but if you need to reserve memory for allocated resources, then it's common to reuse the memory of the free ones as linked list nodes in the free list, so the free list itself doesn't consume any space. This kind of data structure is called... a "free list": https://en.wikipedia.org/wiki/Free_list
Alternatively, you can store the free list separately. Since IDs can be freed in any order, and you need to remember which ones are free, there is no choice but to store the whole list somehow.
If your ID space is really big, it's conceivable that you could adopt strategies for keeping this representation as small as possible, but I've never seen much effort put into that in practice. The other possibility is to move parts of the free list into disk storage when it gets too big.
If N is very large, you can represent your resource pool using a balanced binary search tree. Each node in the tree is a range of free ids, represented by an upper and lower bound of ints. get() removes an arbitrary node from the tree, increments the lower bound, then re-inserts the node if the range it represents is still non-empty. free(i) inserts a new node (i,i), then coalesces that node with its two neighbors, if possible. For instance, if the tree contains (7,9) and (11,17), then free(10) results in a tree with fewer nodes: (7,9), (10,10), and (11,17) are all removed, and (7,17) is there in their place. On the other hand, if the two neighbors of (10,10) are (7,9) and (12,17), then the result is (7,10) and (12,17), while if the two neighbors are (7,8) and (12,17), then no coalescing is possible and all three nodes, (7,8), (10,10), and (12,17), remain in the tree.
Both operations, get() and free(), take O(log P) time, where P is the number of reserved (allocated) IDs at the moment the operation begins. This is slower than a free list, but the advantage over a plain free list is that the size of the structure will be no more than P, so as long as P is much smaller than N, the space usage is low.
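Here is a hedged, untested Java sketch of this range-based pool using a TreeMap keyed by each range's lower bound (the class name RangeResourcePool is mine, and error handling such as empty-pool or double-free checks is omitted):

import java.util.Map;
import java.util.TreeMap;

// Matches the ResourcePool interface from the question above.
class RangeResourcePool {
    // Maps the lower bound of each free range to its upper bound (both inclusive).
    private final TreeMap<Integer, Integer> free = new TreeMap<>();

    RangeResourcePool(int n) { free.put(0, n - 1); }               // initially the whole ID space is free

    public int get() {
        Map.Entry<Integer, Integer> range = free.pollFirstEntry();  // take some free range
        int id = range.getKey();
        if (id < range.getValue()) free.put(id + 1, range.getValue()); // re-insert if still non-empty
        return id;
    }

    public void free(int id) {
        Integer lo = id, hi = id;
        Map.Entry<Integer, Integer> left = free.floorEntry(id - 1);   // free range ending just before id?
        if (left != null && left.getValue() == id - 1) {
            lo = left.getKey();
            free.remove(left.getKey());
        }
        Map.Entry<Integer, Integer> right = free.ceilingEntry(id + 1); // free range starting just after id?
        if (right != null && right.getKey() == id + 1) {
            hi = right.getValue();
            free.remove(right.getKey());
        }
        free.put(lo, hi);                                              // insert the coalesced range
    }
}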

Algorithm for top K stock in electronic exchange

You work in an electronic exchange. Throughout the day, you receive ticks (trading data), each consisting of a product name and its traded volume of stocks. Eg: {name: vodafone, volume: 20}
What data structure will you maintain if:
You have to tell top k products traded by volume at end of day.
You have to tell top k products traded by volume throughout the day.
What's the most efficient solution that you can think of?
The most efficient solution I could think of was to use a heap and map for both situations
heap to store stock by decreasing volume (updating - O(log n) and getTop k - O(k))
map to track stock volume (updating - O(1))
What you're looking for is a kind of map or dictionary which supports the following queries:
Add(key, x): add x to the total for that key, creating a new entry if it doesn't already exist.
GetKLargest(k): return the keys/totals for the k largest entries.
Let's say Q is the number of queries, and n is the number of distinct keys. We should assume that Q is much larger than n; choosing the NYSE as an example, there are a few thousand stocks traded, and a few million trades per day.
In the first scenario we assume that there are a large number of Add queries followed by one GetKLargest query. Since the cost of the Add query dominates, we can use a hashtable so that Add takes O(1) time, and then at the end of the day we can do GetKLargest in O(n log k) time using a priority queue of size k; note that we don't need to sort the whole key-set in O(n log n) time just to find the k largest elements. The total cost of answering Q queries is O(Q + n log k).
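For the first scenario, here is a small Java sketch of the Add / GetKLargest interface described above (the class name VolumeTotals is my own):

import java.util.*;

class VolumeTotals {
    private final Map<String, Long> totals = new HashMap<>();

    void add(String key, long x) {                        // O(1) on average
        totals.merge(key, x, Long::sum);
    }

    List<Map.Entry<String, Long>> getKLargest(int k) {    // O(n log k) with a size-k min-heap
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.add(e);
            if (heap.size() > k) heap.poll();              // drop the smallest of the k+1 candidates
        }
        List<Map.Entry<String, Long>> out = new ArrayList<>(heap);
        out.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return out;
    }
}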
In the second scenario, we assume there could be a large number of both kinds of query. The cost of either query could dominate. A good option is to use an order statistic tree, which supports Add in O(log n) time, and GetKLargest in O(k log n) time. To look up a company by name in the tree requires a separate index, which can be maintained as a hashtable. The total cost is O(Qk log n) in the worst case.
If k is fixed or has a fixed limit, we can do better: keep the totals in a hashtable, but also maintain a priority queue of the current top k elements alongside. The cost of the Add query is now O(log k) because of maintaining the priority queue; to do this efficiently we need the map to also store the current index of each company in the priority queue, if it's there; otherwise searching the priority queue for the right company is O(k). The cost of GetKLargest is O(k) since we just output the contents of the priority queue. (The problem doesn't say we need to output them in order. If we do, then we can use a sorted array instead of a heap for the priority queue, and Add takes O(k) time.)
In this case, the total cost of answering Q queries is O(Qk). Note that this only works if we know in advance the maximum value of k that could be queried, before the query arrives; otherwise we don't know how big to make the priority queue.

Deleting a node from the middle of a heap

Deleting a node from the middle of the heap can be done in O(lg n) provided we can find the element in the heap in constant time. Suppose each node of the heap contains an id as one of its fields. Now, given the id, how can we delete the node in O(lg n) time?
One solution is for each node to hold the address of a location where we maintain that node's index in the heap, with this array ordered by node ids. This requires an additional array to be maintained, though. Is there any other good method to achieve the same?
PS: I came across this problem while implementing Dijkstra's shortest path algorithm.
The index (id, node) can be maintained separately in a hashtable which has O(1) lookup complexity (on average). The overall complexity then remains O(log n).
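As a sketch of that idea, here is an untested Java outline of a binary min-heap that keeps a hashmap from id to heap position, so delete-by-id runs in O(log n) (the class name IndexedMinHeap and its fields are mine, and it assumes the id being deleted is present):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class IndexedMinHeap {
    private int[] ids = new int[16];                            // node ids, heap-ordered by key
    private long[] keys = new long[16];
    private int size = 0;
    private final Map<Integer, Integer> pos = new HashMap<>();  // id -> current index in the heap array

    void insert(int id, long key) {
        if (size == ids.length) {                               // grow the backing arrays when full
            ids = Arrays.copyOf(ids, size * 2);
            keys = Arrays.copyOf(keys, size * 2);
        }
        ids[size] = id;
        keys[size] = key;
        pos.put(id, size);
        siftUp(size++);
    }

    // O(1) position lookup via the hashmap plus O(log n) to restore heap order.
    void deleteById(int id) {
        int i = pos.remove(id);
        size--;
        if (i == size) return;                                  // removed the last slot, nothing to fix
        move(size, i);                                          // fill the hole with the last element
        siftUp(i);                                              // the moved element may need to go up or down;
        siftDown(i);                                            // at most one of these actually does work
    }

    private void siftUp(int i) {
        while (i > 0 && keys[i] < keys[(i - 1) / 2]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && keys[l] < keys[smallest]) smallest = l;
            if (r < size && keys[r] < keys[smallest]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void move(int from, int to) {
        ids[to] = ids[from];
        keys[to] = keys[from];
        pos.put(ids[to], to);
    }

    private void swap(int i, int j) {
        int ti = ids[i]; ids[i] = ids[j]; ids[j] = ti;
        long tk = keys[i]; keys[i] = keys[j]; keys[j] = tk;
        pos.put(ids[i], i);
        pos.put(ids[j], j);
    }
}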
Each data structure is designed with certain operations in mind. From Wikipedia, about heap operations:
The operations commonly performed with a heap are:
create-heap: create an empty heap
find-max or find-min: find the maximum item of a max-heap or a minimum item of a min-heap, respectively
delete-max or delete-min: removing the root node of a max- or min-heap, respectively
increase-key or decrease-key: updating a key within a max- or min-heap, respectively
insert: adding a new key to the heap
merge: joining two heaps to form a valid new heap containing all the elements of both.
This means a heap is not the best data structure for the operation you are looking for. I would advise you to look for a better-suited data structure (depending on your requirements).
I've had a similar problem and here's what I've come up with:
Solution 1: if your calls to delete some random item will have a pointer to the item, you can store your individual data items outside of the heap; have the heap contain pointers to these items; and have each item record its current heap array index.
Example: the heap contains pointers to items with keys [2 10 5 11 12 6]. The item holding value 10 has a field called ArrayIndex = 1 (counting from 0). So if I have a pointer to item 10 and want to delete it, I just look at its ArrayIndex and use that in the heap for a normal delete. O(1) to find heap location, then usual O(log n) to delete it via recursive heapify.
Solution 2: If you only have the key field of the item you want to delete, not its address, try this. Switch to a red-black tree, putting your payload data in the actual tree nodes. This is also O( log n ) for insert and delete. It can additionally find an item with a given key in O( log n ), which makes delete-by-key continue to be log n.
Between these, solution 1 will require an overhead of constantly updating ArrayIndex fields with every swap. It also results in a kind of strange one-off data structure that the next code maintainer would need to study and understand. I think solution 2 would be about as fast, and has the advantage that it's a well-understood algo.

Data Structure for fast position lookup

Looking for a datastructure that logically represents a sequence of elements keyed by unique ids (for the purpose of simplicity let's consider them to be strings, or at least hashable objects). Each element can appear only once, there are no gaps, and the first position is 0.
The following operations should be supported (demonstrated with single-letter strings):
insert(id, position) - add the element keyed by id into the sequence at offset position. Naturally, the position of each element later in the sequence is now incremented by one. Example: [S E L F].insert(H, 1) -> [S H E L F]
remove(position) - remove the element at offset position. Decrements the position of each element later in the sequence by one. Example: [S H E L F].remove(2) -> [S H L F]
lookup(id) - find the position of element keyed by id. [S H L F].lookup(H) -> 1
The naïve implementation would be either a linked list or an array. Both would give O(n) lookup, remove, and insert.
In practice, lookup is likely to be used the most, with insert and remove happening frequently enough that it would be nice not to be linear (which a simple combination of hashmap + array/list would get you).
In a perfect world it would be O(1) lookup, O(log n) insert/remove, but I actually suspect that wouldn't work from a purely information-theoretic perspective (though I haven't tried it), so O(log n) lookup would still be nice.
A combination of a trie and a hash map allows O(log n) lookup/insert/remove.
Each node of the trie contains an id, a counter of valid elements rooted at this node, and up to two child pointers. A bit string, determined by the left (0) or right (1) turns while traversing the trie from its root to the given node, is part of the value stored in the hash map for the corresponding id.
The remove operation marks the trie node as invalid and updates all counters of valid elements on the path from the deleted node to the root. It also deletes the corresponding hash map entry.
The insert operation uses the position parameter and the counters of valid elements in each trie node to search for the new node's predecessor and successor nodes. If the in-order traversal from predecessor to successor contains any deleted nodes, choose the one with the lowest rank and reuse it. Otherwise choose either the predecessor or the successor, and add a new child node to it (a right child for the predecessor or a left one for the successor). Then update all counters of valid elements on the path from this node to the root and add the corresponding hash map entry.
The lookup operation gets a bit string from the hash map and uses it to go from the trie root to the corresponding node while summing all the counters of valid elements to the left of this path.
All this allows O(log n) expected time for each operation if the sequence of inserts/removes is random enough. If not, the worst-case complexity of each operation is O(n). To get back to O(log n) amortized complexity, watch the sparsity and balancing factors of the tree: if there are too many deleted nodes, re-create a new perfectly balanced and dense tree; if the tree is too imbalanced, rebuild the most imbalanced subtree.
Instead of a hash map it is possible to use some binary search tree or any dictionary data structure. Instead of the bit string used to identify a path in the trie, the hash map may store a pointer to the corresponding node in the trie.
Another alternative to using a trie in this data structure is an indexable skiplist.
O(log N) time for each operation is acceptable, but not perfect. It is possible, as explained by Kevin, to use an algorithm with O(1) lookup complexity in exchange for larger complexity of other operations: O(sqrt(N)). But this can be improved.
If you choose some number of memory accesses (M) for each lookup operation, other operations may be done in O(M * N^(1/M)) time. The idea of such an algorithm is presented in this answer to a related question. The trie structure described there allows easily converting a position to the array index and back. Each non-empty element of this array contains an id, and each element of the hash map maps this id back to the array index.
To make it possible to insert an element into this data structure, each block of contiguous array elements should be interleaved with some empty space. When one of the blocks exhausts all available empty space, we should rebuild the smallest group of blocks, related to some element of the trie, that has more than 50% empty space. When the total amount of empty space is less than 50% or more than 75%, we should rebuild the whole structure.
This rebalancing scheme gives O(M * N^(1/M)) amortized complexity only for random and evenly distributed insertions/removals. The worst-case complexity (for example, if we always insert at the leftmost position) is much larger for M > 2. To guarantee O(M * N^(1/M)) worst case we need to reserve more memory and change the rebalancing scheme so that it maintains an invariant like this: keep the empty space reserved for the whole structure at least 50%, the empty space reserved for all data related to the top trie nodes at least 75%, for the next level of trie nodes 87.5%, etc.
With M=2, we have O(1) time for lookup and O(sqrt(N)) time for other operations.
With M=log(N), we have O(log(N)) time for every operation.
But in practice small values of M (like 2 .. 5) are preferable. This may be treated as O(1) lookup time and allows this structure (while performing typical insert/remove operation) to work with up to 5 relatively small contiguous blocks of memory in a cache-friendly way with good vectorization possibilities. Also this limits memory requirements if we require good worst case complexity.
You can achieve everything in O(sqrt(n)) time, but I'll warn you that it's going to take some work.
Start by having a look at a blog post I wrote on ThriftyList. ThriftyList is my implementation of the data structure described in Resizable Arrays in Optimal Time and Space along with some customizations to maintain O(sqrt(n)) circular sublists, each of size O(sqrt(n)). With circular sublists, one can achieve O(sqrt(n)) time insertion/removal by the standard insert/remove-then-shift in the containing sublist followed by a series of push/pop operations across the circular sublists themselves.
Now, to get the index at which a query value falls, you'll need to maintain a map from value to sublist/absolute-index. That is to say, a given value maps to the sublist containing the value, plus the absolute index at which the value falls (the index at which the item would fall were the list non-circular). From these data, you can compute the relative index of the value by taking the offset from the head of the circular sublist and summing with the number of elements which fall behind the containing sublist. To maintain this map requires O(sqrt(n)) operations per insert/delete.
Sounds roughly like Clojure's persistent vectors - they provide O(log₃₂ n) cost for lookup and update. For smallish values of n, O(log₃₂ n) is as good as constant...
Basically they are array mapped tries.
Not quite sure on the time complexity for remove and insert - but I'm pretty sure that you could get a variant of this data structure with O(log n) removes and inserts as well.
See this presentation/video: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Source code (Java): https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java

Data design Issue to find insertion deletion and getMin in O(1)

As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion and getMin... NO SPACE CONSTRAINTS.
I have searched SO for the same and all I have found is insertion and deletion in O(1) time... even a stack does that. I saw previous posts on Stack Overflow and all they say is hashing...
With my analysis, for getMin in O(1) time we can use a heap data structure,
and for insertion and deletion in O(1) time we have a stack...
So in order to achieve my goal I think I need to tweak around the heap data structure and the stack...
How will I add a hashing technique to this situation?
If I use a hashtable, then what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.
If you go with your initial assumption that insertion and deletion are O(1) (if you only want to insert onto the top and delete/pop from the top, then a stack works fine), then in order to have getMin return the minimum value in constant time you need to store the min somehow. If you just had a member variable keeping track of the min, what would happen when it is deleted off the stack? You would need the next minimum, or the minimum relative to what's left in the stack. To do this, you can have each element in the stack record what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
int value;
int min;
Node *next;
};
If you look at an example list: 7->3->1->5->2, let's look at how this would be built. First you push the value 2 (onto an empty stack); this is the min because it's the first number, so keep track of it and add it to the node when you construct it: {2, 2}. Then you push the 5 onto the stack; 5 > 2 so the min stays the same, push {5,2}, and now you have {5,2}->{2,2}. Then you push 1 in; 1 < 2 so the new min is 1, push {1,1}, and now it's {1,1}->{5,2}->{2,2}, etc. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1 your new min would be 2, as it should be. And all of your operations are still constant time because you just added a comparison and another value to the node. (You could use something like C++'s peek(), or just use a pointer to the head of the list, to look at the top of the stack and grab the min there; it gives you the min of the stack in constant time.)
A tradeoff in this implementation is that you'd have an extra integer in your nodes, and if you only have one or two mins in a very large list it is a waste of memory. If this is the case then you could keep track of the mins in a separate stack and just compare the value of the node that you're deleting to the top of this list and remove it from both lists if it matches. It's more things to keep track of so it really depends on the situation.
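A small Java sketch of this per-node-min idea (the class name MinStack is mine):

import java.util.ArrayDeque;
import java.util.Deque;

class MinStack {
    private static final class Entry {
        final int value, min;                        // min of everything at or below this entry
        Entry(int value, int min) { this.value = value; this.min = min; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    void push(int value) {
        int min = stack.isEmpty() ? value : Math.min(value, stack.peek().min);
        stack.push(new Entry(value, min));
    }

    int pop()    { return stack.pop().value; }       // O(1)
    int getMin() { return stack.peek().min; }        // O(1): the min of the whole stack so far
}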
DISCLAIMER: This is my first post in this forum so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "one true answer" but it is the one that I think is the simplest and conforms to the requirements of the question. There are always tradeoffs and depending on the situation different approaches are required.
This is a design problem, which means they want to see how quickly you can augment existing data-structures.
start with what you know:
O(1) update, i.e. insertion/deletion, is screaming hashtable
O(1) getMin is screaming hashtable too, but this time ordered.
Here, I am presenting one way of doing it. You may find something else that you prefer.
create a HashMap, call it main, where you store all the elements
create a LinkedHashMap (Java has one), call it mins, where you track the minimum values.
the first time you insert an element into main, add it to mins as well.
for every subsequent insert, if the new value is less than what's at the head of your mins map, add it to the map with something equivalent to addToHead.
when you remove an element from main, also remove it from mins. 2*O(1) = O(1)
Notice that getMin is simply peeking without deleting. So just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to #Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hash table, you know that you must keep doubling the size of the table to avoid overflow. Each time you double the size of a table with n elements, you must re-insert the elements and then add the new element. By this analysis it would seem that the worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? Because not all the elements take the worst case! Indeed, only the insertions where doubling occurs take the worst case. Therefore, inserting n elements takes n + sum(2^i for i = 0 to lg n), which gives n + 2n = O(n), so that O(n)/n = O(1) per insertion!
Why not apply the same principle to the LinkedHashMap? You have to reload all the elements anyway! So, each time you are doubling main, put all the elements of main into mins as well, and sort them in mins. Then for all other cases proceed as above (the bulleted steps).
A hashtable gives you insertion and deletion in O(1) (a stack does not, because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements can't be faster than O(n log n) (it is a theorem), which means O(log n) per element.
You can keep a pointer to the min to have getMin in O(1). This pointer can be updated easily for an insertion but not for the deletion of the min. But depending on how often you use deletion it can be a good idea.
You can use a trie. A trie has O(L) complexity for both insertion, deletion, and getmin, where L is the length of the string (or whatever) you're looking for. It is of constant complexity with respect to n (number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
Strictly speaking your problem as stated is provably impossible, however consider the following:
Given a type T place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j). (ie number all possible values of type T in order)
Create an array of that size.
Make the elements of the array:
struct PT
{
T t;
PT* next_higher;
PT* prev_lower;
};
Insert and delete elements into the array maintaining doubly linked list (in order of index, and hence sorted order) storage.
This will give you constant getMin and delete.
For insertion you need to find the next element in the array in constant time, so I would use a type of radix search.
If the size of the array is 2^x then maintain x "skip" arrays where element j of array i points to the nearest element of the main array to index (j << i).
This will then always require a fixed x number of lookups to update and search so this will give constant time insertion.
This uses exponential space, but this is allowed by the requirements of the question.
in your problem statement " insertion and deletion in O(1) time we have stack..."
so I am assuming deletion = pop()
in that case, use another stack to track min
algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
push to stack 1.
if stack 2 is empty or new item <= stack2.peek(), push to stack 2 as well (<= so that duplicate minimums are handled correctly)
objective: at any point of time stack2.peek() should give you min O(1)
Deletion
pop() from stack 1.
if popped element equals stack2.peek(), pop() from stack 2 as well
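A brief Java sketch of this two-stack approach (the class name TwoStackMin is mine):

import java.util.ArrayDeque;
import java.util.Deque;

class TwoStackMin {
    private final Deque<Integer> stack = new ArrayDeque<>();  // stack 1: all values
    private final Deque<Integer> mins = new ArrayDeque<>();   // stack 2: running minimums

    void push(int x) {
        stack.push(x);
        if (mins.isEmpty() || x <= mins.peek()) mins.push(x);  // <= keeps duplicate minimums tracked
    }

    int pop() {
        int x = stack.pop();
        if (!mins.isEmpty() && x == mins.peek()) mins.pop();
        return x;
    }

    int getMin() { return mins.peek(); }                       // O(1)
}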

Resources