Structure with extremely fast insertion time - sorting

I'm looking for an ordered data structure which allows very fast insertion. That's the only property required. Data will only be accessed and deleted from the top element.
To be more precise, I need two structures:
1) The first structure should allow an ordered insertion using an int value. On completing the insertion, it shall report the rank of the inserted element.
2) The second structure should allow insertion at a specified rank.
The number of elements to be stored is likely to be in thousands, or tens of thousands.
[edit] I must amend the volume hypothesis: even though, at any moment, the size of the ordered structure is likely to be in the range of tens of thousands, the total number of insertions is likely to be in the tens of millions per run.
Insertion time in O(1) would be nice, although O(log(log(n))) is very acceptable too. Currently I've got some interesting candidates for the first structure only, but they are either O(log(n)) or lack the capability to report the insertion rank (which is mandatory).

What about a form of skip list, specifically the "indexed skip list" in the linked article? That should give O(log N) insert and lookup, and O(1) access to the first node, for both of your use cases.
--Edit--
When I think of O(1) algorithms, I think of radix-based methods. Here is an O(1) insert with rank returned. The idea is to break the key up into nibbles and keep a count of all the inserted items that have each prefix. Unfortunately, the constant is high (<=64 dereferences and additions), and the storage is O(2 x 2^INT_BITS), which is awful. This is the version for 16-bit ints; expanding it to 32 bits should be straightforward.
#include <stdlib.h>

int *p1; int *p2; int *p3; int *p4;
void **records;
unsigned int min = 0xFFFF;

int init(void) {
    p1 = (int*)calloc(16, sizeof(int));
    p2 = (int*)calloc(256, sizeof(int));
    p3 = (int*)calloc(4096, sizeof(int));
    p4 = (int*)calloc(65536, sizeof(int));
    records = (void**)calloc(65536, sizeof(void*));
    return 0;
}
//Records that we are storing one more item,
//counts the number of smaller existing items with the same prefix.
int Add1ReturnRank(int* p, int offset, int a) {
    int i, sum = 0;
    p += offset;
    for (i = 0; i < a; i++)
        sum += p[i];
    p[i]++;   /* i == a here: bump the count for this nibble value */
    return sum;
}
int insert(int key, void* data) {
    unsigned int i4 = (unsigned int)key;
    unsigned int i3 = (i4 >> 4);
    unsigned int i2 = (i3 >> 4);
    unsigned int i1 = (i2 >> 4);
    int rank = Add1ReturnRank(p1, 0, i1 & 0xF);
    rank += Add1ReturnRank(p2, i2 & 0xF0, i2 & 0xF);
    rank += Add1ReturnRank(p3, i3 & 0xFF0, i3 & 0xF);
    rank += Add1ReturnRank(p4, i4 & 0xFFF0, i4 & 0xF);
    if (min > (unsigned int)key) { min = key; }
    store(&records[i4], data);   /* store() is a placeholder for writing data into the slot
                                    (e.g. a per-key list if duplicate keys must be kept) */
    return rank;
}
This structure also supports O(1) GetMin and RemoveMin. (GetMin is instant, Remove has a constant similar to Insert.)
void* getMin(int* key) {
    *key = min;
    return records[min];
}
void* removeMin(int* key) {
    void* data = records[min];
    unsigned int i4 = min;
    unsigned int i3 = (i4 >> 4);
    unsigned int i2 = (i3 >> 4);
    unsigned int i1 = (i2 >> 4);
    p4[i4]--;
    p3[i3]--;
    p2[i2]--;
    p1[i1]--;
    *key = min;
    /* Walk the counters level by level to find the next non-empty slot. */
    while (!p1[i1]) {
        if (i1 == 15) { min = 0xFFFF; return NULL; }
        i2 = (++i1) << 4;
    }
    while (!p2[i2])
        i3 = (++i2) << 4;
    while (!p3[i3])
        i4 = (++i3) << 4;
    while (!p4[i4])
        ++i4;
    min = i4;
    return data;
}
If your data is sparse and well distributed, you could remove the p4 counter and instead do an insertion sort into the p3 level. That would reduce storage costs by a factor of 16, at the cost of a higher worst-case insert when there are many similar values.
Another idea to improve the storage would be to combine this idea with something like an extendible hash. Use the integer key as the hash value, and keep a count of the inserted nodes in the directory. Doing a sum over the relevant directory entries on an insert (as above) should still be O(1) with a large constant, but the storage would reduce to O(N).

An order-statistics tree seems to fit your need, at O(log N) time:
An order-statistics tree is an augmented (see AugmentedDataStructures) version of a
BinarySearchTree that supports the additional operations Rank(x), which returns the rank
of x (i.e., the number of elements with keys less than or equal to x) and FindByRank(k),
which returns the k-th smallest element of the tree.
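For concreteness, GCC's libstdc++ ships exactly this as a policy-based extension (__gnu_pbds). The sketch below is GCC-specific, and because the container is set-like it silently drops duplicate keys, so heavy use with repeated ints would need something like (key, sequence number) pairs:

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <initializer_list>
#include <iostream>

using namespace __gnu_pbds;
// Red-black tree augmented with subtree sizes: gives order_of_key / find_by_order.
typedef tree<int, null_type, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update> ordered_set;

int main() {
    ordered_set s;
    for (int x : {42, 7, 19, 3}) {
        s.insert(x);
        // order_of_key(x) = number of stored keys strictly less than x,
        // i.e. the 0-based rank of x right after insertion.
        std::cout << "inserted " << x << " at rank " << s.order_of_key(x) << '\n';
    }
    std::cout << "min = " << *s.find_by_order(0) << '\n';   // k-th smallest, k = 0
}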
If you only have tens of thousands of elements, the difference between O(log N) and O(1) asymptotic time complexity is not as significant as you might think. For example, with 100,000 elements the log N method is only about 17 times slower:
log(100,000) / log(2) = 16.6096405
In this case the difference in constant factors (implementation, overheads) may be the real target of optimization. Fancy data structures usually have a much higher overhead due to their inherent complexity (sometimes thousands of times slower), and their implementations tend to be less refined because they are less widely used.
You should benchmark (actually test) the different heap implementations to find the one with the best real performance.

You say you need an ordered data structure, which to me sounds like you need something that can yield all the contained elements in O(n) time.
But then you say you will only be accessing the top (least?) element, suggesting that you really just need something that can yield the minimum value repeatedly, opening the door to something with a partial ordering.
Which is it?

If I understand your question correctly, I would recommend using a dictionary whose keys are ranks and whose values are linked lists.
The keys give you ranks, and with linked lists as the values you get O(1) insertion; removal is O(1) as well. You can implement a stack or queue with a linked list, which is what you want.
Or you can just use a doubly linked list, which guarantees O(1) insertion and removal. For ranking, you can embed that information within the nodes.

Related

"Stable" k-largest elements algorithm

Related: priority queue with limited space: looking for a good algorithm
I am looking for an algorithm which returns the k-largest elements from a list, but does not change the order of the k-largest elements, e.g. for k=4 and given 5,9,1,3,7,2,8,4,6, the algorithm should return 9,7,8,6.
More background: my input data are approximately 200 pairs (distance, importance), ordered w.r.t. distance, and I need to select the 32 most important of them. Performance is crucial here, since I have to run this selection a few thousand times.
Up to now I have the following two ideas, but neither seems optimal.
Remove the minimum element iteratively until 32 elements are left (i.e. do selection sort)
Use quickselect or median-of-medians to search for the 32nd largest element. Afterwards, sort the remaining 31 elements again w.r.t. distance.
I need to implement this in C++, so if anybody wants to write some code and does not know which language to use, C++ would be an option.
Inspired by trincot's solution, I have come up with a slightly different variation, with a working implementation.
Algorithm
Use Floyd's algorithm to build the max heap, which is equivalent to constructing a priority_queue in C++ with the constructor that takes the entire array/vector at once instead of adding elements individually. The max heap is built in O(N) time.
Now, pop items from the max heap K-1 times, so that the top is the Kth max-importance item. Store it in the variable Kth_Max_Importance_Item.
Scan the original input and push every item whose importance is greater than that of Kth_Max_Importance_Item into the output vector.
Calculate how many items with importance equal to that of Kth_Max_Importance_Item are still required by subtracting the current size of the output vector from k. Store it in the variable left_Over_Count.
Scan the original input again and push the first left_Over_Count items whose importance is equal to that of Kth_Max_Importance_Item into the output vector.
NOTE: If importance values are not unique, this case is taken care of by steps 3 and 4 of the algorithm.
Time complexity: O(N + K*log(N)). Assuming K << N, this is ~ O(N).
Implementation:
#include <iostream>
#include <vector>
#include <queue>
#include <cmath>

typedef struct Item {
    int distance;
    double importance;
} Item;

struct itemsCompare {
    bool operator() (const Item& item1, const Item& item2) {
        return item1.importance < item2.importance;
    }
};

bool compareDouble(const double& a, const double& b) {
    return std::fabs(a - b) < 0.000001;
}

int main() {
    //Original input
    std::vector<Item> items{{10, 2.1}, {9, 2.3}, {8, 2.2}, {7, 2.2}, {6, 1.5}};
    int k = 4;

    //Max heap, built in O(N) by passing the whole vector to the constructor
    std::priority_queue<Item, std::vector<Item>, itemsCompare> maxHeap(items.begin(), items.end());

    //Checking that the order of the original input is intact
    /*for (size_t i = 0; i < items.size(); i++) {
        std::cout << items[i].distance << " " << items[i].importance << std::endl;
    }*/

    //Popping nodes until we reach the Kth max-importance node
    int count = 0;
    while (!maxHeap.empty()) {
        if (count == k - 1) {
            break;
        }
        maxHeap.pop();
        count++;
    }
    Item Kth_Max_Importance_Item = maxHeap.top();
    //std::cout << Kth_Max_Importance_Item.importance << std::endl;

    //Scanning all the nodes from the original input whose importance is greater than that of Kth_Max_Importance_Item
    std::vector<Item> output;
    for (size_t i = 0; i < items.size(); i++) {
        if (items[i].importance > Kth_Max_Importance_Item.importance) {
            output.push_back(items[i]);
        }
    }

    int left_Over_Count = k - (int)output.size();
    //std::cout << left_Over_Count << std::endl;

    //Adding left_Over_Count items whose importance is equal to that of Kth_Max_Importance_Item
    for (size_t i = 0; i < items.size(); i++) {
        if (compareDouble(items[i].importance, Kth_Max_Importance_Item.importance)) {
            output.push_back(items[i]);
            left_Over_Count--;
        }
        if (!left_Over_Count) {
            break;
        }
    }

    //Printing the output:
    for (size_t i = 0; i < output.size(); i++) {
        std::cout << output[i].distance << " " << output[i].importance << std::endl;
    }
    return 0;
}
Output:
9 2.3
8 2.2
7 2.2
10 2.1
Use the heap-based algorithm for finding the k largest values, i.e. use a min heap (not a max heap) that never exceeds a size of k. Once it exceeds that size, keep pulling the root from it to restore it to a size of k.
At the end the heap's root will be the k-th largest value. Let's call it m.
You could then scan the original input again to collect all values that are at least equal to m. This way you'll have them in their original order.
When that m is not unique, you could have collected too many values. So check the size of the result and determine how much longer it is than k. Go backwards through that list and mark the ones that have value m as deleted until you have reached the right size. Finally collect the non-deleted items.
All these scans are O(n). The most expensive step is the first one: O(n log k).
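For what it's worth, here is a minimal C++ sketch of this approach. The function name is made up, and ties on m are resolved by keeping the earliest occurrences, which is equivalent to deleting the surplus copies from the back as described above:

#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Returns the k largest values of `input`, in their original order.
std::vector<int> stableKLargest(const std::vector<int>& input, std::size_t k) {
    if (k == 0) return {};
    if (input.size() <= k) return input;             // nothing to drop
    // Min-heap that never grows beyond k elements.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int v : input) {
        heap.push(v);
        if (heap.size() > k) heap.pop();             // discard the smallest so far
    }
    const int m = heap.top();                        // k-th largest value
    std::size_t equalAllowed = k;
    for (int v : input)
        if (v > m) --equalAllowed;                   // slots taken by strictly larger values
    std::vector<int> result;
    for (int v : input) {
        if (v > m) result.push_back(v);
        else if (v == m && equalAllowed > 0) { result.push_back(v); --equalAllowed; }
    }
    return result;
}

int main() {
    for (int v : stableKLargest({5, 9, 1, 3, 7, 2, 8, 4, 6}, 4))
        std::cout << v << ' ';                       // prints: 9 7 8 6
    std::cout << '\n';
}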

Number of Binary Search Trees of a given Height

How can I find the number of BSTs up to a given height h, discarding all BSTs with height greater than h, for a given set of unique numbers?
I have worked out the code using a recursive approach
static int bst(int h, int n) {
    if (h == 0 && n == 0) return 1;
    else if (h == 0 && n == 1) return 1;
    else if (h == 0 && n > 1) return 0;
    else if (h > 0 && n == 0) return 1;
    else {
        int sum = 0;
        for (int i = 1; i <= n; i++)
            sum += bst(h - 1, i - 1) * bst(h - 1, n - i);
        return sum;
    }
}
You can speed it up by adding memoization as #DavidEisenstat suggested in the comments.
You create a memoization table to store the values of already computed results.
In the example, -1 indicates the value has not been computed yet.
Example in C++:
#include <cstring>
const int MAX_H = 128;  // illustrative bounds; keep h < MAX_H and n < MAX_N
const int MAX_N = 128;
long long memo[MAX_H][MAX_N];

long long bst(int h, int n) {
    if (memo[h][n] == -1) {
        // same recursion as above; each (h, n) pair is computed only once
        if (n <= 1)      memo[h][n] = 1;
        else if (h == 0) memo[h][n] = 0;
        else {
            long long sum = 0;
            for (int i = 1; i <= n; i++)
                sum += bst(h - 1, i - 1) * bst(h - 1, n - i);
            memo[h][n] = sum;
        }
    }
    return memo[h][n];
}

int main() {
    memset(memo, -1, sizeof memo);
    bst(102, 89);
}
With this, bst is computed only once for each possible pair of h and n, so there are only O(h*n) memoized states (each one sums over up to n splits, for O(h*n^2) total work). Another advantage of this technique is that once the table is filled up, bst responds in O(1) (for values in the range of the table).
Be careful not to call the function with values above MAX_H and MAX_N. Also keep in mind that memoization is a memory-time tradeoff, meaning your program will run faster but will use more memory too.
More info: https://en.wikipedia.org/wiki/Memoization

You will be given a stream of integers

You will be given a stream of integers, and an integer k for the window size; you will only receive the stream's integers one by one. Whenever you receive an integer, you have to return the maximum number among the last k integers, inclusive of the current entry.
The interviewer was expecting an O(N) time + O(1) average-case space solution for N tasks. The integers are not given in an array; each time, only one integer is passed as input to your method.
I tried solving it but couldn't come up with an O(N) solution. Can anybody tell me how we can do it?
Assuming k is small and not part of the scaling parameter N (the question says N is the number of tasks, but it's not quite clear what that means):
Implement a FIFO, with insertion and deletion costing O(1) in time, and O(k) in memory.
Also implement a max variable.
Pop the oldest value and see if it is equal to max; if it is (which is unlikely for most inputs), run over the k elements of the FIFO and recalculate max, otherwise don't. Amortized, this is O(1).
Compare the new value with max and update max if necessary.
Push the new value into the FIFO.
Then time is O(1) per integer and memory is O(k+1). I don't see how you could avoid storage requirements of at least O(k). The time to process N integers is then O(N).
O(N) time is easy, but O(1) average space is impossible.
Here's what we absolutely need to store. For any number x we've seen in the last k inputs, if we've seen a bigger number since x, we can forget about x, since we'll never need to return it or compare it to anything again. If we haven't seen a bigger number since x, we need to store x, since we might have to return it at some point. Thus, we need to store the biggest number in the last k items, and the biggest after that, and the biggest after that, all the way up to the current input. In the worst case, the input is descending, and we always need to store all of the last k inputs. In the average case, at any time, we'll need to keep track of O(log(k)) items; however, the peak memory usage will be greater than this.
The algorithm we use is to simply keep track of a deque of all the numbers we just said we need to store, in their natural, descending order, along with when we saw them*. When we receive an input, we pop everything lower than it from the right of the deque and push the input on the right of the deque. We peek-left, and if we see that the item on the left is older than the window size, we pop-left. Finally, we peek-left, and the number we see is the sliding window maximum.
This algorithm processes each input in amortized constant time. The only part of processing an input that isn't constant time is the part where we pop-right potentially all of the deque, but since each input is only popped once over the course of the algorithm, that's still amortized constant. Thus, the algorithm takes O(N) time to process all input.
*If N is ridiculously huge, we can keep track of the indices at which we saw things mod k to avoid overflow problems.
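Here is a minimal C++ sketch of the deque approach described above (the class and method names are made up; equal values are handled by keeping the newer copy, which does not change the reported maxima, and k is assumed to be at least 1):

#include <cstddef>
#include <deque>
#include <initializer_list>
#include <iostream>
#include <utility>

// The deque holds (value, index) pairs in decreasing value order;
// the front is always the maximum of the current window.
class SlidingWindowMax {
public:
    explicit SlidingWindowMax(std::size_t k) : k_(k) {}

    // Feed one integer from the stream; returns the max of the last k entries
    // (or of everything seen so far, while fewer than k have arrived).
    int push(int x) {
        while (!dq_.empty() && dq_.back().first <= x)
            dq_.pop_back();                      // x makes these entries obsolete
        dq_.emplace_back(x, n_);
        while (dq_.front().second + k_ <= n_)    // front has fallen out of the window
            dq_.pop_front();
        ++n_;
        return dq_.front().first;
    }

private:
    std::size_t k_;
    std::size_t n_ = 0;                          // index of the next incoming item
    std::deque<std::pair<int, std::size_t>> dq_;
};

int main() {
    SlidingWindowMax w(3);
    for (int x : {1, 3, 2, 5, 4, 1, 1, 1})
        std::cout << w.push(x) << ' ';           // prints: 1 3 3 5 5 5 4 1
    std::cout << '\n';
}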
First of all, most interviewers will consider memory O(k) to be O(1).
Given that little detail, you can just implement a ring buffer:
int numbers[k]; // O(1) because k is constant
int i = 0, next_number;
while ((next_number = new_number())) {   // new_number() and yield() are placeholders
    numbers[i % k] = next_number;
    i++;
    if (i >= k) {
        int max = INT_MIN;               // from <limits.h>
        for (int j = 0; j < k; j++) {    // O(1) because k is constant
            if (numbers[j] > max) max = numbers[j];
        }
        yield(max);
    }
}
Of course, if they don't consider O(k) to be O(1), there is no solution to the problem and they either screwed up their question or were hoping for you to say that the question is wrong.
A FIFO/deque is usually faster here (for k above some threshold); I'm just demonstrating the simplest answer to a dumb question.
This is the answer that comes to mind (C++):
#include <iostream>
#include <limits>
using namespace std;

int main()
{
    cout << "Enter K: " << endl;
    int K;
    cin >> K;
    cout << "Enter stream." << endl;
    int n, counter = 1, max = std::numeric_limits<int>::min();
    while (cin >> n) {
        if (counter % K == 0)
            max = std::numeric_limits<int>::min();
        if (n > max)
            max = n;
        cout << max << endl;
        ++counter;
    }
}

which algorithm can do a stable in-place binary partition with only O(N) moves?

I'm trying to understand this paper: Stable minimum space partitioning in linear time.
It seems that a critical part of the claim is that
Algorithm B sorts stably a bit-array of size n in O(n log₂ n) time and constant extra space, but makes only O(n) moves.
However, the paper doesn't describe the algorithm, but only references another paper which I don't have access to. I can find several ways to do the sort within the time bounds, but I'm having trouble finding one that guarantees O(N) moves without also requiring more than constant space.
What is this Algorithm B? In other words, given
bool Predicate(Item* a); //returns the result of testing *a for some condition
is there a function B(Item* a, size_t N); which stably sorts a using Predicate as the sort key, with fewer than n log₂ n calls to Predicate and only O(N) writes to a?
I'm tempted to say that it isn't possible. Anytime you're computing O(n log n) amount of information but have (1) nowhere to stash it (constant space), and (2) nowhere to immediately use it (O(n) moves), there is something weird going on, possibly involving heavy use of the stack (which may not be included in the space analysis, although it should be).
It might be possible if you store temporary information inside many bits of just one integer, or something squirrelly like that. (So O(1) in practice, but O(log n) in theory.)
Radix sorting wouldn't quite do it because you'd have to call the predicate to create the digits, and if you don't memoize the transitivity of comparison then you'll call it O(n^2) times. (But to memoize it takes O(log n) amortized space per item, I think.)
QED - Proof by lack of imagination :)
Here's what I have so far. A version of cycle sort which uses a bit array to hold the result of the partition tests and calculates the destinations on the fly. It performs a stable binary partition with N compares, < N swaps, and exactly 2N bits of allocated storage.
int getDest(int i, BitArray p, int nz)
{
    bool b = BitArrayGet(p, i);
    int below = BitArrayCount1sBelow(p, i);  //1s below i
    return (b) ? (nz + below) : (i - below);
}

int BinaryCycleSort(Item* a, int n, BitArray p)
{
    int i, numZeros = n - BitArrayCount1sBelow(p, n);
    BitArray final = BitArrayNew(n);
    for (i = 0; i < numZeros; i++)
        if (!BitArrayGet(final, i))
        {
            int dest = getDest(i, p, numZeros);
            while (dest != i)
            {
                SwapItem(a + i, a + dest);
                BitArraySet(final, dest);
                dest = getDest(dest, p, numZeros);
            }
            BitArraySet(final, dest);
        }
    return numZeros;
}

int BinaryPartition(Item* a, int n, Predicate pPred)
{
    int i;
    BitArray p = BitArrayNew(n);
    for (i = 0; i < n; i++)
        if (pPred(a + i)) BitArraySet(p, i);
    return BinaryCycleSort(a, n, p);
}
using these helpers:
typedef uint32_t BitStore;
typedef BitStore* BitArray;
BitArray BitArrayNew(int N);                   //returns array of N bits, all cleared
void BitArraySet(BitArray ba, int i);          //sets ba[i] to 1
bool BitArrayGet(BitArray ba, int i);          //returns ba[i]
int BitArrayCount1sBelow(BitArray ba, int i);  //counts 1s in ba[0..i)
Obviously this is not constant space. But I think this might be used as a building block to the ultimate goal. The whole array can be partitioned into N/B blocks using a fixed-size BitArray of B bits. Is there some way to re-use those same bits while performing a stable merge?
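For completeness, here is one possible (and deliberately naive) implementation of the helpers declared above, assuming 32-bit words and GCC/Clang's __builtin_popcount. Note that this version makes BitArrayCount1sBelow cost O(N/32) per call, so a real version would maintain running block counts to keep getDest cheap:

#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BitStore;
typedef BitStore* BitArray;

BitArray BitArrayNew(int N) {                       // N bits, all cleared
    return (BitArray)calloc((N + 31) / 32, sizeof(BitStore));
}
void BitArraySet(BitArray ba, int i)  { ba[i / 32] |= (BitStore)1 << (i % 32); }
bool BitArrayGet(BitArray ba, int i)  { return ((ba[i / 32] >> (i % 32)) & 1) != 0; }

int BitArrayCount1sBelow(BitArray ba, int i) {      // counts 1s in ba[0..i)
    int count = 0, w;
    for (w = 0; w < i / 32; w++)
        count += __builtin_popcount(ba[w]);         // whole words below i
    if (i % 32)
        count += __builtin_popcount(ba[w] & (((BitStore)1 << (i % 32)) - 1));
    return count;
}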
Isn't this RadixSort?
O(kN)

Computing the mode (most frequent element) of a set in linear time?

In the book "The Algorithm Design Manual" by Skiena, computing the mode (most frequent element) of a set, is said to have a Ω(n log n) lower bound (this puzzles me), but also (correctly i guess) that no faster worst-case algorithm exists for computing the mode. I'm only puzzled by the lower bound being Ω(n log n).
See the page of the book on Google Books
But surely this could in some cases be computed in linear time (best case), e.g. by Java code like the below (which finds the most frequent character in a string), the "trick" being to count occurrences using a hash table. This seems obvious.
So, what am I missing in my understanding of the problem?
EDIT: (Mystery solved) As StriplingWarrior points out, the lower bound holds if only comparisons are used, i.e. no indexing of memory, see also: http://en.wikipedia.org/wiki/Element_distinctness_problem
// Linear time
char computeMode(String input) {
    // initialize currentMode to first char (assumes input is non-empty)
    char[] chars = input.toCharArray();
    char currentMode = chars[0];
    int currentModeCount = 0;
    HashMap<Character, Integer> counts = new HashMap<Character, Integer>();
    for (char character : chars) {
        int count = putget(counts, character); // occurrences so far
        // test whether character should be the new currentMode
        if (count > currentModeCount) {
            currentMode = character;
            currentModeCount = count; // also save the count
        }
    }
    return currentMode;
}

// Constant time
int putget(HashMap<Character, Integer> map, char character) {
    if (!map.containsKey(character)) {
        // if character not seen before, initialize to zero
        map.put(character, 0);
    }
    // increment
    int newValue = map.get(character) + 1;
    map.put(character, newValue);
    return newValue;
}
The author seems to be basing his logic on the assumption that comparison is the only operation available to you. Using a hash-based data structure sort of gets around this: in most cases it reduces the number of comparisons needed to the point where you can basically do each lookup in constant time.
However, if the numbers were hand-picked to always produce hash collisions, you would end up effectively turning your hash set into a list, which would make your algorithm O(n²). As the author points out, simply sorting the values into a list first provides the best guaranteed algorithm, even though in most cases a hash set would be preferable.
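For comparison, a minimal sketch of that guaranteed O(n log n) route (sort a copy, then count runs of equal values; assumes a non-empty input):

#include <algorithm>
#include <vector>

// Returns the most frequent value by sorting, then scanning run lengths.
int modeBySorting(std::vector<int> values) {        // taken by value: we sort a copy
    std::sort(values.begin(), values.end());
    int best = values[0], bestCount = 0;
    int current = values[0], count = 0;
    for (int v : values) {
        if (v == current) ++count;                  // extend the current run
        else { current = v; count = 1; }            // start a new run
        if (count > bestCount) { best = current; bestCount = count; }
    }
    return best;
}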
So, what am I missing in my understanding of the problem?
In many particular cases, an array or hash table suffices. In "the general case" it does not, because hash table access is not always constant time.
In order to guarantee constant time access, you must be able to guarantee that the number of keys that can possibly end up in each bin is bounded by some constant. For characters this is fairly easy, but if the set elements were, say, doubles or strings, it would not be (except in the purely academic sense that there are, e.g., a finite number of double values).
Hash table lookups are amortized constant time, i.e., in general, the overall cost of looking up n random keys is O(n). In the worst case, they can be linear. Therefore, while in general they could reduce the order of mode calculation to O(n), in the worst case it would increase the order of mode calculation to O(n^2).
