Data structure design issue: insertion, deletion and getMin in O(1) - algorithm

As said in the title, I need to define a data structure that takes only O(1) time for insertion, deletion, and getMin. There are no space constraints.
I have searched SO for the same, and all I have found covers insertion and deletion in O(1) time (even a stack does that); the previous posts I saw all suggest hashing.
From my analysis: for getMin in O(1) time we could use a heap data structure, and for insertion and deletion in O(1) time we have a stack. So to achieve my goal, I think I need to combine ideas from the heap and the stack.
How would I add a hashing technique to this situation? If I use a hashtable, what should my hash function look like, and how do I analyze the situation in terms of hashing? Any good references will be appreciated.

If you go with your initial assumption that insertion and deletion have O(1) complexity (if you only want to insert onto the top and delete/pop from the top, then a stack works fine), then in order to have getMin return the minimum value in constant time you need to store the min somehow. If you just had a member variable keeping track of the min, what would happen when it was deleted off the stack? You would need the next minimum, or rather the minimum relative to what's left in the stack. To do this you can have each element in the stack carry what it believes to be the minimum. The stack is represented in code by a linked list, so the struct of a node in the linked list would look something like this:
struct Node
{
    int value;
    int min;    // minimum of this node and everything below it
    Node *next;
};
If you look at an example list: 7->3->1->5->2. Let's look at how this would be built. First you push the value 2 (onto an empty stack); this is the min because it's the first number, so keep track of it and store it in the node when you construct it: {2, 2}. Then you push 5 onto the stack; 5 > 2, so the min stays the same and you push {5,2}. Now you have {5,2}->{2,2}. Then you push 1; 1 < 2, so the new min is 1 and you push {1,1}. Now it's {1,1}->{5,2}->{2,2}, and so on. By the end you have:
{7,1}->{3,1}->{1,1}->{5,2}->{2,2}
In this implementation, if you popped off 7, 3, and 1, your new min would be 2, as it should be. And all of your operations are still in constant time, because you just added a comparison and another value to the node. (You could use something like std::stack's top(), or just a pointer to the head of the list, to look at the top of the stack and grab the min there; it gives you the min of the stack in constant time.)
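To make the mechanics concrete, here is a minimal C++ sketch of push/pop/getMin built on the Node struct above (the MinStack wrapper and its names are my own illustration, not part of the original answer):

struct MinStack
{
    Node *top = nullptr;

    void push(int value)
    {
        int min = (top != nullptr && top->min < value) ? top->min : value;
        top = new Node{value, min, top};
    }
    void pop()  // assumes a non-empty stack
    {
        Node *old = top;
        top = top->next;
        delete old;
    }
    int getMin() const { return top->min; }  // the min of what's left, O(1)
};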
A tradeoff of this implementation is the extra integer in every node; if you have only one or two distinct minima in a very large list, that's wasted memory. In that case you could keep the minima in a separate stack, compare the value being deleted with the top of that stack, and remove it from both stacks if it matches. It's more to keep track of, so it really depends on the situation.
DISCLAIMER: This is my first post in this forum, so I'm sorry if it's a bit convoluted or wordy. I'm also not saying that this is "the one true answer", but I think it's the simplest one that conforms to the requirements of the question. There are always tradeoffs, and different situations call for different approaches.

This is a design problem, which means they want to see how quickly you can augment existing data structures.
Start with what you know:
O(1) update, i.e. insertion/deletion, screams hashtable.
O(1) getMin screams hashtable too, but this time an ordered one.
Here I am presenting one way of doing it. You may find something else you prefer.
Create a HashMap, call it main, where you store all the elements.
Create a LinkedHashMap (Java has one), call it mins, where you track the minimum values.
The first time you insert an element into main, add it to mins as well.
For every subsequent insert, if the new value is less than what's at the head of your mins map, add it to mins with something equivalent to addToHead.
When you remove an element from main, also remove it from mins. 2*O(1) = O(1).
Notice that getMin is simply peeking without deleting, so just peek at the head of mins.
EDIT:
Amortized algorithm:
(thanks to @Andrew Tomazos - Fathomling, let's have some more fun!)
We all know that the cost of insertion into a hashtable is O(1). But in fact, if you have ever built a hashtable, you know that you must keep doubling the size of the table to avoid overflow. Each time you double a table holding n elements, you must re-insert those elements and then add the new one. By this analysis it would seem that the worst-case cost of adding an element to a hashtable is O(n). So why do we say it's O(1)? Because not all insertions take the worst case! Only the insertions where doubling occurs do. Therefore, inserting n elements takes n + sum(2^i for i = 0 to lg(n)) <= n + 2n = O(n), so the amortized cost per insertion is O(n)/n = O(1)!
Why not apply the same principle to the LinkedHashMap? You have to reload all the elements anyway! So each time you double main, put all the elements of main into mins as well, and sort them in mins. Then for all other cases proceed as above (the bulleted steps).

A hashtable gives you insertion and deletion in O(1) (a stack does not, because you can't have holes in a stack). But you can't have getMin in O(1) too, because ordering your elements can't be faster than O(n*log(n)) (the comparison-sort lower bound), which means O(log(n)) per element.
You can keep a pointer to the min to get getMin in O(1). The pointer is easy to update on insertion, but not when the min itself is deleted. Still, depending on how often you delete, it can be a good idea.

You can use a trie. A trie has O(L) complexity for insertion, deletion, and getMin, where L is the length of the string (or whatever key) you're looking up. It is of constant complexity with respect to n (the number of elements).
It requires a huge amount of memory, though. As they emphasized "no space constraints", they were probably thinking of a trie. :D
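For intuition, here is a hypothetical bitwise-trie sketch over 32-bit unsigned keys (my own illustration of the idea, not code from the answer): every operation walks at most 32 levels, so each is O(L) with L fixed, i.e. constant in n.

#include <cstdint>

struct TrieNode {
    TrieNode* child[2] = {nullptr, nullptr};
    int count = 0;  // how many stored keys pass through this node
};

struct IntTrie {
    TrieNode root;

    void insert(uint32_t x) {
        TrieNode* n = &root;
        n->count++;
        for (int b = 31; b >= 0; --b) {
            int bit = (x >> b) & 1;
            if (!n->child[bit]) n->child[bit] = new TrieNode;
            n = n->child[bit];
            n->count++;
        }
    }
    void erase(uint32_t x) {  // assumes x is present; empty nodes are kept lazily
        TrieNode* n = &root;
        n->count--;
        for (int b = 31; b >= 0; --b) {
            n = n->child[(x >> b) & 1];
            n->count--;
        }
    }
    uint32_t getMin() const {  // assumes the trie is non-empty
        const TrieNode* n = &root;
        uint32_t x = 0;
        for (int b = 31; b >= 0; --b) {
            // always take the smaller branch that still holds keys
            int bit = (n->child[0] && n->child[0]->count > 0) ? 0 : 1;
            n = n->child[bit];
            x = (x << 1) | (uint32_t)bit;
        }
        return x;
    }
};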

Strictly speaking, your problem as stated is provably impossible; however, consider the following:
Given a type T, place an enumeration on all possible elements of that type such that value i is less than value j iff T(i) < T(j) (i.e. number all possible values of type T in order).
Create an array of that size.
Make the elements of the array:
struct PT
{
    T t;
    PT* next_higher;  // next occupied slot in index (hence sorted) order
    PT* prev_lower;   // previous occupied slot
};
Insert and delete elements in the array, maintaining the doubly linked list (in order of index, and hence in sorted order) as you go.
This gives you constant-time getMin and delete.
For insertion you need to find the next occupied element of the array in constant time, so I would use a kind of radix search.
If the size of the array is 2^x, maintain x "skip" arrays, where element j of array i points to the nearest occupied element of the main array to index (j << i).
Updating and searching these always requires a fixed number x of lookups, so insertion is constant-time as well.
This uses exponential space, but this is allowed by the requirements of the question.

In your problem statement you say "insertion and deletion in O(1) time we have stack...",
so I am assuming deletion = pop().
In that case, use another stack to track the min.
Algo:
Stack 1 -- normal stack; Stack 2 -- min stack
Insertion
Push to stack 1.
If stack 2 is empty or the new item <= stack2.peek(), push it to stack 2 as well (<= rather than < so duplicate minima are handled correctly).
Objective: at any point in time stack2.peek() gives you the min in O(1).
Deletion
pop() from stack 1.
If the popped element equals stack2.peek(), pop() from stack 2 as well.
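A small C++ sketch of this two-stack scheme (names are illustrative; the <= on push is what keeps duplicate minima safe):

#include <stack>

struct MinStack {
    std::stack<int> values;  // stack 1: all elements
    std::stack<int> mins;    // stack 2: candidate minima; top is the current min

    void push(int x) {
        values.push(x);
        if (mins.empty() || x <= mins.top()) mins.push(x);
    }
    void pop() {  // assumes a non-empty stack
        if (values.top() == mins.top()) mins.pop();
        values.pop();
    }
    int getMin() const { return mins.top(); }  // O(1)
};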

Related

Running maximum of changing array of fixed size

Initially I am given an array of fixed size, call it v. The typical size of v would be a few thousand entries. I start by computing the maximum of that array.
Following that, I am periodically given a new value for v[i] and need to recompute the value of the maximum.
What is a practically fast way (average time) of computing that maximum?
Edit: we can assume that the process is:
1) uniformly choosing a random entry;
2) changing its value to a value drawn uniformly from [0,1].
I believe this specifies the problem a bit better and allows an unequivocal "best answer" (which will depend on the array size).
You can maintain a max-heap over that array. The heap elements can be indices into the array, and each array element should also keep an index back to its position in the heap. Then every time v[i] is changed, you only need O(log(n)) to maintain the heap (if v[i] increased, it sifts up; if v[i] decreased, it sifts down).
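If you don't want to hand-roll an indexed heap, the same O(log n)-per-update idea can be sketched with std::multiset standing in for the heap (my substitution, not the answerer's exact structure):

#include <cstddef>
#include <set>
#include <vector>

struct RunningMax {
    std::vector<double> v;
    std::multiset<double> sorted;  // multiset, since values may repeat

    explicit RunningMax(std::vector<double> init)
        : v(std::move(init)), sorted(v.begin(), v.end()) {}

    void update(std::size_t i, double x) {  // v[i] = x in O(log n)
        sorted.erase(sorted.find(v[i]));    // erase exactly one copy of the old value
        sorted.insert(x);
        v[i] = x;
    }
    double max() const { return *sorted.rbegin(); }  // O(1)
};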
If the changes to the array are random, e.g. v[rand()%size] = rand(), then most of the time the max won't decrease.
There are two main ways I can think of to handle this: keep the full collection sorted on the fly, or track just the few (or one) highest elements. The choice depends on the relative importance of worst-case, average case, and fast-path. (Including code and data cache footprint of the common case where the change doesn't affect anything you're tracking.)
Really low complexity / overhead / code size: O(1) average case, O(N) worst case.
Just track the current max (and optionally its position, if you can't read the old value to check whether it == max before applying the change). On the rare occasion that the element holding the max decreases, rescan the whole array. Otherwise just check whether the new element is greater than max.
The average complexity should be O(1) amortized: O(N) total for N changes, since on average one in N changes hits the element holding the max (and only half of those decrease it).
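A minimal sketch of this track-and-rescan idea (the TrackedMax name and layout are mine):

#include <algorithm>
#include <cstddef>
#include <vector>

struct TrackedMax {
    std::vector<double> v;
    std::size_t max_pos = 0;

    explicit TrackedMax(std::vector<double> init) : v(std::move(init)) {
        max_pos = std::max_element(v.begin(), v.end()) - v.begin();
    }
    void update(std::size_t i, double x) {
        double old = v[i];
        v[i] = x;
        if (i == max_pos) {
            if (x < old)  // the max itself decreased: rescan (rare, O(N))
                max_pos = std::max_element(v.begin(), v.end()) - v.begin();
        } else if (x > v[max_pos]) {
            max_pos = i;  // fast path: new value beats the current max
        }
    }
    double max() const { return v[max_pos]; }  // O(1)
};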
A bit more overhead and code size, but less frequent scans of the full array: O(1) typical case, O(N) worst case.
Keep a priority queue of the 4 or 8 highest elements in the array (position and value). When an element in the PQueue is modified, remove it from the PQueue. Try to re-add the new value to the PQueue, but only if it won't be the smallest element. (It might be smaller than some other element we're not tracking). If the PQueue is empty, rescan the array to rebuild it to full size. The current max is the front of the PQueue. Rescanning the array should be quite rare, and in most cases we only have to touch about one cache line of data holding our PQueue.
Since the small PQueue needs to support fast access to both the smallest and the largest element, and even to find elements that are neither the min nor the max, a sorted-array implementation probably makes more sense than a heap. If it's only 8 elements, a linear search is probably best too: scan from the smallest element upwards, so the search ends right away if the old value of the modified element is less than the smallest value in the PQueue.
If you want to optimize the fast-path (position modified wasn't in the PQueue), you could store the PQueue as struct pqueue { unsigned pos[8]; int val[8]; }, and use vector instructions (e.g. x86 SSE/AVX2) to test i against all 8 positions in one or two tests. Hrm, actually just checking the old val to see if it's less than PQ.val[0] should be a good fast-path.
To track the current size of the PQueue, it's probably best to use a separate counter, rather than a sentinel value in pos[]. Checking for the sentinel every loop iteration is probably slower. (esp. since you'd prob. need to use pos to hold the sentinel values; maybe make it signed after all and use -1?) If there was a sentinel you could use in val[], that might be ok.
slower O(log N) average case, but no full-rescan worst case:
Xiaotian Pei's solution of making the whole array a heap. (This doesn't work if the ordering of v[] matters. You could keep all the elements in a Heap as well as in the ordered array, but that sounds cumbersome.) Re-heapifying after changing a random element will probably write several other cache lines every time, so the common case is much slower than for the methods that only track the top one or few elements.
something else clever I haven't thought of?

How do I further optimize this Data Structure?

I was recently asked to build a data structure that supports four operations, namely,
Push: Add an element to the DS.
Pop: Remove the last pushed element.
Find_max: Find the maximum element out of the currently stored elements.
Pop_max: Remove the maximum element from the DS.
The elements are integers.
Here is the solution I suggested:
Take a stack.
Store a pair of elements in it. The pair should be (element, max_so_far), where element is the element at that index and max_so_far is the maximum valued element seen so far.
While pushing an element onto the stack, check the max_so_far of the topmost stack element. If the current number is greater than that, use the current number as the new pair's max_so_far; otherwise carry over the previous max_so_far. This makes pushing a simple O(1) operation.
For pop, simply pop an element out of the stack. Again, this operation is O(1).
For Find_max, return the value of the max_so_far of the topmost element in the stack. Again, O(1).
Popping the max element would involve going down the stack, explicitly removing the max element, and pushing the elements that were above it back on with recomputed max_so_far values. This would be linear.
I was asked to improve it, but I couldn't.
In terms of time complexity, the overall time can be improved if all operations happen in O(logn), I guess. How to do that, is something I'm unable to get.
One approach would be to store pointers to the elements in a doubly-linked list, and also in a max-heap data structure (sorted by value).
Each element would store its position in the doubly-linked list and also in the max-heap.
In this case all of your operations would require O(1) time in the doubly-linked list, plus O(log(n)) time in the heap data structure.
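As a compact illustration of this cross-pointer idea, here is a sketch that substitutes an ordered std::map for the max-heap (so find_max reads the map's last entry in O(1), and the other operations are O(log n)); the (value, sequence) key is my own device to keep entries unique:

#include <iterator>
#include <list>
#include <map>
#include <utility>

struct MaxStack {
    using Key = std::pair<int, long>;  // (value, push sequence number)
    std::list<Key> items;              // push order; back() is the newest
    std::map<Key, std::list<Key>::iterator> by_value;  // sorted by value, then seq
    long seq = 0;

    void push(int x) {                 // O(log n)
        items.push_back({x, seq++});
        by_value.emplace(items.back(), std::prev(items.end()));
    }
    int find_max() const { return by_value.rbegin()->first.first; }  // O(1)
    int pop() {                        // remove the most recently pushed, O(log n)
        Key k = items.back();
        by_value.erase(k);
        items.pop_back();
        return k.first;
    }
    int pop_max() {                    // remove the maximum element, O(log n)
        auto it = std::prev(by_value.end());
        int x = it->first.first;
        items.erase(it->second);       // O(1) list erase via the stored iterator
        by_value.erase(it);
        return x;
    }
};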
One way to get O(log n)-time operations is to mash up two data structures, in this case a doubly linked list and a priority queue (a pairing heap is a good choice). We have a node structure like
struct Node {
    Node *previous, *next;         // doubly linked list
    Node **back, *child, *sibling; // pairing heap
    int value;
} list_head, *heap_root;
Now, to push, we push in both structures. To find_max, we return the value of the root of the pairing heap. To pop or pop_max, we pop from the appropriate data structure and then use the other node pointers to delete in the other data structure.
Usually, when you need to find elements by quality A (value) and also by quality B (insert order), you start eyeballing a data structure that actually has two data structures inside, which reference each other or are otherwise interleaved.
For instance: two maps whose keys are quality A and quality B respectively, and whose values are a shared pointer to a struct containing iterators back into both maps, plus the payload. Then you have O(log n) to find an element via either quality, and erasure is ~O(log n) to remove the two iterators from both maps.
#include <map>
#include <memory>
#include <string>

// `mountain` is whatever payload type you're storing.
struct hybrid {
    struct value {
        std::map<std::string, std::shared_ptr<value>>::iterator name_iter;
        std::map<int, std::shared_ptr<value>>::iterator height_iter;
        mountain data; // renamed from `value` to avoid clashing with the type name
    };
    std::map<std::string, std::shared_ptr<value>> name_map;
    std::map<int, std::shared_ptr<value>> height_map;
    mountain& find_by_name(const std::string& s) { return name_map[s]->data; }
    mountain& find_by_height(int height) { return height_map[height]->data; }
    void erase_by_name(const std::string& s) {
        value& v = *name_map[s];
        height_map.erase(v.height_iter);
        name_map.erase(v.name_iter); // note that this destroys v
    }
};
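A hypothetical insert helper for the sketch above (assuming the payload type mountain is default-constructible, and that name and height are not already present): create the shared node first, then wire the iterators from both maps back into it.

void insert(hybrid& h, const std::string& name, int height, mountain m) {
    auto p = std::make_shared<hybrid::value>();              // create the shared node
    p->data = std::move(m);
    p->name_iter = h.name_map.emplace(name, p).first;        // wire both maps back
    p->height_iter = h.height_map.emplace(height, p).first;
}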
However, in your case you can do even better than this O(log n), since you only need "the most recent" and "the next highest". To make pop-highest fast, you need a fast way to find the next highest, which means it has to be precalculated at insert. To find the "height" position relative to the rest, you need a map of some sort. To make pop-most-recent fast, you need a fast way to find the next most recent, but that's trivially calculated. I'd recommend a map or heap of nodes, where the keys are the values (for finding the max) and the values are pointers to the next most recent node. This gives you O(log n) insert, O(1) find-most-recent, O(1) or O(log n) find-max (depending on implementation), and ~O(log n) erasure by either index.
One more way to do this:
Create a max-heap of the elements. This way we can get the max element in O(1) and remove it in O(log n).
Along with this, maintain a pointer to the last pushed element. As far as I know, deleting a known position from a heap can also be done in O(log n).

Data structure that deletes all elements of a set less than or equal to x in O(1) time

I am self studying for an algorithms course, and I am trying to solve the following problem:
Describe a data structure to store a set of real numbers, which can perform each of the following operations in O(1) amortized time:
Insert(x) : Deletes all elements not greater than x, and adds x to the set.
FindMin() : Find minimum value of set.
I realize that FindMin kind of becomes trivial once you have Insert, and I see how, with a linked-list implementation, you could delete multiple elements simultaneously (i.e. in O(1)), but finding out which link to delete (aka where x goes) seems like an O(n) or O(log n) operation, not O(1). The problem gave the hint "consider using a stack", but I don't see how that helps.
Any help is appreciated.
Note that your goal is to get O(1) amortized time, not O(1) time. This means that you can do as much work as you'd like per operation as long as n operations don't take more than O(n) time.
Here's a simple solution. Store the elements in a stack in ascending order. To insert an element, keep popping the stack until it's empty or until the top element is greater than x, then push x onto the stack. To do a find-min, read the top of the stack.
Find-min clearly runs in time O(1). Let's now look at insert. Intuitively, each element is pushed and popped at most once, so we can spread the work of an expensive insert across cheaper inserts. More formally, let the potential be n, the number of elements on the stack. Each time you do an insert you do some number of pops (say, k), and the potential increases by 1 - k (one new element added, k removed). The amortized cost is then k + 1 + 1 - k, which is 2. Therefore insert is amortized O(1).
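A minimal sketch of this stack-based structure, assuming FindMin is never called on an empty set:

#include <vector>

struct MinSet {
    std::vector<double> stack;  // values strictly decrease from bottom to top

    void Insert(double x) {
        // delete every element not greater than x, then add x;
        // each element is pushed and popped at most once -> O(1) amortized
        while (!stack.empty() && stack.back() <= x) stack.pop_back();
        stack.push_back(x);
    }
    double FindMin() const { return stack.back(); }  // the top is the minimum
};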
Hope this helps!
double is the data structure! In the methods below, ds represents the data structure that the operation is being performed on.
void Insert(ref double ds, double x)
{
ds = x;
}
double FindMin(double ds)
{
return ds;
}
The only way to ever observe the state of the data structure is to query its minimum element (FindMin). The only way to modify the state of the data structure is to set its new minimum element (Insert). So the data structure is simply the minimum element of the set.

Data Structure that supports queue like operations and mode finding

This was an interview question asked to me almost 3 years back and I was pondering about this a while back.
Design a data structure that supports the following operations:
insert_back(), remove_front() and find_mode(). Best complexity
required.
The best solution I could think of was O(log n) for insertion and deletion and O(1) for mode. This is how I solved it: keep a queue DS to track which element is inserted and deleted.
Also keep an array that is max-heap-ordered, plus a hash table.
The hashtable maps an integer key to that element's index in the heap array. The heap array contains ordered pairs (count, element) and is ordered on the count property.
Insertion: Insert the element into the queue. Look up the element's heap-array index in the hashtable. If none exists, add the element to the heap, heapify upwards, and store the final location in the hashtable. Increment the count at that location and heapify upwards or downwards as needed to restore the heap property.
Deletion: Remove the element at the head of the queue. From the hash table, find its location in the heap array. Decrement the count in the heap and reheapify upwards or downwards as needed to restore the heap property.
Find Mode: The element at the head of the array heap (getMax()) gives us the mode.
Can someone please suggest something better. The only optimization I could think of was using a Fibonacci heap but I am not sure if that is a good fit in this problem.
I think there is a solution with O(1) for all operations.
You need a deque, and two hashtables.
The first one is a linked hashtable, where for each element you store its count, the next element in count order, and the previous element in count order. Then you can look up the next and previous elements' entries in that hashtable in constant time. For this hashtable you also keep and update the element with the largest count. (element -> (count, next_element, previous_element))
In the second hashtable, for each distinct count, you store the first and the last element with that count in the first hashtable's order. Note that the size of this hashtable will be less than n (it's O(sqrt(n)), I think). (count -> (first_element, last_element))
Basically, when you add an element to or remove an element from the deque, you can find its new position in the first hashtable by analyzing its next and previous elements, and the values for the old and new count in the second hashtable in constant time. You can remove and add elements in the first hashtable in constant time, using algorithms for linked lists. You can also update the second hashtable and the element with the maximum count in constant time as well.
I'll try writing pseudocode if needed, but it seems to be quite complex with many special cases.

Maximizing minimum on an array

There is probably an efficient solution for this, but I'm not seeing it.
I'm not sure how to explain my problem but here goes...
Let's say we have an array of n integers, for example {3,2,0,5,0,4,1,9,7,3}.
What we want to do is find the range of 5 consecutive elements with the "maximal minimum".
The solution in this example would be the subarray {4,1,9,7,3} (the last five elements), whose minimum, 1, is the maximal minimum.
It's easy to do in O(n^2), but there must be a better way of doing this. What is it?
If you mean literally five consecutive elements, then you just need to keep a sorted window of the source array.
Say you have:
{3,2,0,5,0,1,0,4,1,9,7,3}
First, you take five elements and sort them:
[3,2,0,5,0], 1,0,4,1,9,7,3 - the current window is bracketed
{0,0,2,3,5} - the window, sorted
Here the minimum is the first element of the sorted sequence.
Then you advance the window one step to the right: the new element is 1 and the old one is 3. You find the 3 in the sorted window, replace it with the 1, and restore sorted order. You don't actually need to run a sorting algorithm on it, since only one element is out of place (the 1 in this example); even bubble sort would fix it in linear time.
3, [2,0,5,0,1], 0,4,1,9,7,3
{0,0,1,2,5}
Then the new minimum is again the first element.
Then you advance again and again, each time comparing the first element of the sorted sequence to the best minimum so far and remembering it and its subsequence.
Time complexity is O(n) for a fixed window size (O(n*k) in general for window size k, since each step does a linear fix-up of the window).
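A short sketch of this sorted-window idea, with std::multiset standing in for the hand-maintained sorted array (my substitution): O(n log k) overall, effectively O(n) for fixed k = 5.

#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Assumes a.size() >= k and k >= 1.
int maximalMinimum(const std::vector<int>& a, std::size_t k) {
    std::multiset<int> window(a.begin(), a.begin() + k);  // sorted window
    int best = *window.begin();                           // min of the first window
    for (std::size_t i = k; i < a.size(); ++i) {
        window.erase(window.find(a[i - k]));  // drop the element leaving the window
        window.insert(a[i]);                  // add the one entering it
        best = std::max(best, *window.begin());
    }
    return best;
}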
Can't you use some circular buffer of 5 elements, run over the array and add the newest element to the buffer (thereby replacing the oldest element) and searching for the lowest number in the buffer? Keep a variable with the offset into the array that gave the highest minimum.
That would seem to be O(n * 5*log(5)) = O(n), I believe.
Edit: I see unkulunkulu proposed exactly the same as me in more detail :).
Using a balanced binary search tree instead of a linear buffer, it is trivial to get complexity O(n log m), where m is the window size.
You can do it in O(n) for an arbitrary window size k as well. Use a deque.
For each element x:
pop elements from the back of the deque that are larger than x
if the front of the deque is more than k positions old, discard it
push x at the end of the deque
at each step, the front of the deque will give you the minimum of your current k-element window. Compare it with your global maximum and update if needed.
Since each element gets pushed and popped from the deque at most once, this is O(n).
The deque data structure can either be implemented with an array the size of your initial sequence, obtaining O(n) memory usage, or with a linked list that actually deletes the needed elements from memory, obtaining O(k) memory usage.
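A sketch of this deque algorithm (function name mine), returning the maximal minimum over all k-element windows:

#include <algorithm>
#include <climits>
#include <deque>
#include <vector>

// The deque holds indices whose values increase from front to back, so the
// front is the minimum of the current window. Each index enters and leaves
// the deque at most once: O(n) total.
int maxOfWindowMinima(const std::vector<int>& a, int k) {
    std::deque<int> dq;
    int best = INT_MIN;
    for (int i = 0; i < (int)a.size(); ++i) {
        while (!dq.empty() && a[dq.back()] >= a[i])
            dq.pop_back();                      // drop back elements larger than a[i]
        dq.push_back(i);
        if (dq.front() <= i - k)
            dq.pop_front();                     // front fell out of the window
        if (i >= k - 1)                         // window is full: front is its min
            best = std::max(best, a[dq.front()]);
    }
    return best;
}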
