Ordered list with O(1) random access and removal - algorithm

Does there exist a data structure with the following properties:
Elements are stored in some order
Accessing the element at a given index takes O(1) time (possibly amortized)
Removing an element takes amortized O(1) time, and changes the indices appropriately (so if element 0 is removed, the next access to element 0 should return the old element 1)
For context, I reduced an algorithm question from a programming competition to:
Over m queries, return the kth smallest positive number that hasn't been returned yet. You can assume the returned number is less than some constant n.
If the data structure above exists, then you can do this in O(m) time, by creating a list of numbers 1 to n. Then, for each query, find the element at index k and remove it. During the contest itself, my solution ended up being O(m^2) on certain inputs.
I'm pretty sure you can do this in O(m log m) with binary search trees, but I'm wondering if the ideal O(m) is reachable. Stuff I've found online tends to be close, but not quite there - the tricky part is that the elements you remove can be from anywhere in the list.

Well, O(1) removal is possible with a linked list.
Each element has pointers to the next and previous elements, so removal just unlinks the element and repoints its neighbors, like:
element[ix-1].next=element[ix+1]
element[ix+1].prev=element[ix-1]
Accessing ordered elements by index in O(1) can be done with indexed arrays.
So you have an unordered array like dat[] and an index array like idx[], and accessing element ix is just:
dat[idx[ix]]
Now the problem is to have these properties at once.
You can try a linked list with an index array, but removal then needs to update the index table, which is O(N) in the worst case.
If you have just an index array, then removal is also O(N).
If you keep the index in some form of tree structure, then removal can be close to O(log(N)), but access will also be about O(log(N)).
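As a rough illustration of the tree-based O(log N) trade-off, here is a minimal sketch that uses a Fenwick (binary indexed) tree instead of a balanced search tree. This is a swapped-in technique, not something the posts above spell out; the class name KthUnused and the method pop_kth are made up for this example, and k is assumed to be 1-based.

```python
class KthUnused:
    """Returns and removes the k-th smallest unused number in 1..n (hypothetical helper)."""

    def __init__(self, n):
        self.n = n
        # tree[i] counts the unused numbers in the Fenwick range ending at i
        self.tree = [0] * (n + 1)
        for i in range(1, n + 1):
            self.tree[i] += 1
            parent = i + (i & -i)
            if parent <= n:
                self.tree[parent] += self.tree[i]

    def pop_kth(self, k):
        # Descend by powers of two to find the largest prefix
        # that holds fewer than k unused numbers.
        pos = 0
        remaining = k
        for step in reversed(range(self.n.bit_length())):
            nxt = pos + (1 << step)
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos = nxt
                remaining -= self.tree[nxt]
        value = pos + 1
        if k < 1 or value > self.n:
            raise IndexError("fewer than k unused numbers remain")
        # Mark the value as used.
        i = value
        while i <= self.n:
            self.tree[i] -= 1
            i += i & -i
        return value
```

Each query costs O(log n), so m queries take O(m log n) after an O(n) build. That matches the tree-based bound discussed above, not the O(m) the question hopes for.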

I believe there is a structure that would do both of these in O(n) time, where n is the number of elements that have been removed, not the total size. So if the number of elements you're removing is small compared to the size of the array, it's close to O(1).
Basically, all the data is stored in an array. There is also a priority queue for deleted elements. Initialise like so:
Data = [0, 1, 2, ..., m]
removed = new list
Then, to remove an element, you add its original index (see below for how to get this) to the priority queue (which is sorted by element value, with the smallest at the front), and leave the array as is. So removing the 3rd element:
Data = [0, 1, 2, 3,..., m]
removed = 2
Then what's now the 4th and was the 5th:
Data = [0, 1, 2, 3,..., m]
removed = 2 -> 4
Then what's now the 3rd and was the 4th:
Data = [0, 1, 2, 3,..., m]
removed = 2 -> 3 -> 4
Now to access an element, you start with its index. You then iterate along the removed list, increasing the index by one each time, until you reach an element which is larger than the increased value of the index. This will give you the original index (i.e. the position in Data) of the element you're looking for, and is the index you need for removal.
This operation of iterating along the queue effectively increases the index by the number of elements before it that were removed.
Sorry if I haven't explained very well, it was clear in my head but hard to write down.
Comments:
Access is O(n), with n number of removed items
Removal is approximately twice the time of access, but still O(n)
A disadvantage is that memory use doesn't shrink with removal.
Could potentially 're-initialise' when removed list is large to reset memory use and access and removal times. This operation takes O(N), with N total array size.
So it's not quite what OP was looking for but in the right situation could be close.
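A minimal sketch of this idea, assuming the removed indices are simply kept in a sorted Python list (the names LazyRemovalList, get and remove are invented for the example):

```python
import bisect


class LazyRemovalList:
    """Array that never shrinks, plus a sorted list of removed original indices."""

    def __init__(self, items):
        self.data = list(items)
        self.removed = []  # original indices of removed elements, kept sorted

    def _original_index(self, index):
        # Every removed slot at or before the running index pushes it right by one.
        for r in self.removed:
            if r > index:
                break
            index += 1
        return index

    def get(self, index):
        return self.data[self._original_index(index)]

    def remove(self, index):
        original = self._original_index(index)
        bisect.insort(self.removed, original)
        return self.data[original]


lst = LazyRemovalList(range(10))
lst.remove(2)  # removes the 3rd element; removed == [2]
lst.remove(3)  # what is now the 4th and was the 5th; removed == [2, 4]
lst.remove(2)  # what is now the 3rd and was the 4th; removed == [2, 3, 4]
```

Both get and remove walk the removed list, so they cost O(n) in the number of removed elements, as noted in the comments above.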

Related

Data structure that supports random access by index and key, insertion, deletion in logarithmic time with order maintained

I'm looking for the data structure that stores an ordered list of E = (K, V) elements and supports the following operations in at most O(log(N)) time where N is the number of elements. Memory usage is not a problem.
E get(index) // get element by index
int find(K) // find the index of the element whose K matches
delete(index) // delete element at index, the following elements have their indexes decreased by 1
insert(index, E) // insert element at index, the following elements have their indexes increased by 1
I have considered the following incorrect solutions:
Use array: find, delete, and insert will still cost O(N)
Use array + map of K to index: delete and insert will still cost O(N) for shifting elements and updating map
Use linked list + map of K to element address: get and find will still cost O(N)
In my imagination, the last solution is the closest, but instead of linked list, a self-balancing tree where each node stores the number of elements on the left of it will make it possible for us to do get in O(log(N)).
However I'm not sure if I'm correct, so I want to ask whether my imagination is correct and whether there is a name for this kind of data structure so I can look for off-the-shelf solution.
The closest data structure I could think of is a treap.
An implicit treap is a simple modification of the regular treap, which is a very powerful data structure. In fact, an implicit treap can be considered an array with the following procedures implemented (all in O(log N) in the online mode):
Inserting an element in the array in any location
Removal of an arbitrary element
Finding sum, minimum / maximum element etc. on an arbitrary interval
Addition, painting on an arbitrary interval
Reversing elements on an arbitrary interval
Using the modification with implicit keys allows you to do all operations except the second one (find the index of the element whose K matches). I'll edit this answer if I come up with a better idea :)
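For illustration, a minimal implicit treap sketch covering get, insert and delete by index. All function names here are made up for the example, priorities are assumed to be random floats, and find(K) is deliberately left out, as the answer says:

```python
import random


class Node:
    __slots__ = ("value", "priority", "size", "left", "right")

    def __init__(self, value):
        self.value = value
        self.priority = random.random()  # random heap priority
        self.size = 1                    # number of nodes in this subtree
        self.left = None
        self.right = None


def size(node):
    return node.size if node else 0


def update(node):
    node.size = 1 + size(node.left) + size(node.right)


def split(node, k):
    """Split into (first k elements, remaining elements)."""
    if node is None:
        return None, None
    if size(node.left) >= k:
        left, node.left = split(node.left, k)
        update(node)
        return left, node
    node.right, right = split(node.right, k - size(node.left) - 1)
    update(node)
    return node, right


def merge(a, b):
    """Concatenate two treaps, keeping the larger priority on top."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b


def insert(root, index, value):
    left, right = split(root, index)
    return merge(merge(left, Node(value)), right)


def delete(root, index):
    left, rest = split(root, index)
    _, right = split(rest, 1)
    return merge(left, right)


def get(root, index):
    node = root
    while node:
        left_size = size(node.left)
        if index < left_size:
            node = node.left
        elif index == left_size:
            return node.value
        else:
            index -= left_size + 1
            node = node.right
    raise IndexError("index out of range")
```

The subtree sizes play the role of the "number of elements on the left" counter from the question. Each operation is O(log N) in expectation.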

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution of using the XOR operator on the binary representation of the integers as explained Here, and at the end, the binary string will represent the element which appears only once because duplicates will cancel out. But it takes O(n) time. How can we do better than that?
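For reference, the O(n) XOR idea can be sketched as follows, assuming the elements are integers (the function name single_by_xor is made up for this example):

```python
from functools import reduce
from operator import xor


def single_by_xor(nums):
    # x ^ x == 0, so every duplicated value cancels and only the singleton remains
    return reduce(xor, nums, 0)
```

It works on unsorted input, but it touches every element once.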
Using Haroon S' comment, this is the solution which I think is correct, given the time constraints.
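from typing import List

class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        # low and high always stay on even indices
        low = 0
        high = len(nums) - 1
        while low < high:
            mid = (low + high) // 2
            # force mid onto an odd index so that (mid-1, mid) is an aligned pair
            if mid % 2 == 0:
                mid += 1
            if nums[mid] == nums[mid + 1]:
                # pairing is shifted here, so the answer is in the first half
                high = mid - 1
            elif nums[mid] == nums[mid - 1]:
                # pairs are still aligned, so the answer is in the second half
                low = mid + 1
        return nums[low]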
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that, "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half". (i.e., with the exception of the last index on the left half-array being even; but you can overcome that case with various O(1) time operations)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For a demonstration of the index notion I mentioned above, see your own example. To the left of the unique element (i.e. 3), all index pairs (2i, 2i+1) hold the same value. From the unique element onwards, every index pair (2i, 2i+1) corresponds to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. This means the missing element is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j, representing sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k] x[k:l], x[l:j] where all elements in the first part are smaller than the pivot (whatever it is) all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(You might be able to use a version that only partitions in two, or explicitly count the number of pivots, but this version is easier to work with here.)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
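A minimal sketch of this partition-and-recurse idea. For brevity it builds new lists and does not partition x[i:j] in place, which is a simplification of what is described above. It assumes the input really contains exactly one singleton and every other value exactly twice; the name find_singleton is made up for the example.

```python
import random


def find_singleton(items):
    items = list(items)
    while True:
        pivot = random.choice(items)
        # Three-way partition around the pivot.
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        larger = [x for x in items if x > pivot]
        if equal_count == 1:
            return pivot  # the pivot is the singleton
        # Exactly one side has odd length; the singleton is in that side.
        items = smaller if len(smaller) % 2 == 1 else larger
```

Only comparisons are used, so it also works for strings or other ordered values, for example find_singleton(["b", "a", "b"]).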
This solution will find the element in the array that appeared only once but there should not be more than one element of that type and the array should be sorted. This is Binary Search and will return the element in O(log n) time.
var singleNonDuplicate = function(nums) {
    let s = 0, e = nums.length - 1
    while (s < e) {
        let mid = Math.trunc(s + (e - s) / 2)
        if ((mid % 2 == 0 && nums[mid] == nums[mid + 1]) || (mid % 2 == 1 && nums[mid] == nums[mid - 1])) {
            s = mid + 1
        }
        else {
            e = mid
        }
    }
    return nums[s] // can return nums[e] also
};
I don't believe there is a O(log n) solution for that. The reason is that in order to find which element is appearing only once, you at least need to iterate over the elements of that array once.

Ruby- delete a value from sorted (unique) array at O(log n) runtime

I have a sorted array (unique values, not duplicated).
I know I can use Array#bsearch, but it's used to find values, not delete them.
Can I delete a value at O(log n) as well? How?
Let's say I have this array:
arr = [-3, 4, 7, 12, 15, 20] #very long array
And I would like to delete the value 7.
So far I have this:
arr.delete(7) #I'm quite sure it's O(n)
Assuming Array#delete_at works at O(1),
I could do arr.delete_at(value_index).
Now I just need to get the value's index.
Binary search can do it, since the array is already sorted.
But the only method utilizing the sorted attribute (that I know of) is binary search, which returns values; nothing about deleting or returning indexes.
To sum it up:
1) How to delete a value from a sorted, duplicate-free array at O(log n)?
Or
2) Assuming Array#delete_at works at O(1) (does it?), how can I get the value's index at O(log n)? (I mean, the array is already sorted; must I implement it myself?)
Thank you.
The standard Array implementation has no constraint on sorting or duplicate. Therefore, the default implementation has to trade performance with flexibility.
Array#delete deletes an element in O(n). Here's the C implementation. Notice the loop
for (i1 = i2 = 0; i1 < RARRAY_LEN(ary); i1++) {
...
}
The cost is justified by the fact Ruby has to scan all the items matching given value (note delete deletes all the entries matching a value, not just the first), then shift the next items to compact the array.
delete_at has the same cost. In fact, it deletes the element by given index, but then it uses memmove to shift the remaining entries one index less on the array.
Using a binary search will not change the cost. The search will cost you O(log n), but you will need to delete the element at given key. In the worst case, when the element is in position [0], the cost to shift all the other items in memory by 1 position will be O(n).
In all cases, the cost is O(n). This is not unexpected. The default Array implementation in Ruby is backed by a plain contiguous array. And that's because, as said before, there are no specific constraints that could be used to optimize operations. Easy iteration and manipulation of the collection is the priority.
Array, sorted array, list and sorted list: all these data structures are flexible, but you pay the cost in some specific operations.
Back to your question, if you care about performance and your array is sorted and unique, you can definitely take advantage of it. If your primary goal is finding and deleting items from your array, there are better data structures. For instance, you can create a custom class that stores your array internally using a d-heap where the delete() costs O(log_d n), same applies if you use a binomial heap.

Maintaining sort while changing random elements

I have come across this problem where I need to efficiently remove the smallest element in a list/array. That would be fairly trivial to solve - a heap would be sufficient.
However, the issue now is that when I remove the smallest element, it would cause changes in other elements in the data structure, which may result in the ordering being changed. An example is this:
I have an array of elements:
[1,3,5,7,9,11,12,15,20,33]
When I remove "1" from the array "5" and "12" get changed to "4" and "17" respectively.
[3,4,7,9,11,17,15,20,33]
And hence the ordering is not maintained.
However, the element that is removed will have pointers to all elements that will be changed, but there is no knowing how many elements will be changed and by how much.
So my question is:
What is the best way to store these elements to maximize performance when removing the smallest element from the data structure while maintaining sort? Or should I just leave it unsorted?
My current implementation is just storing them unsorted in a vector, so the time complexity is O(N^2), O(N) for finding the smallest element, and N removals.
A.
If you have the list M of all changed elements of the ordered list L,
go through M, and for every element
If it is still ordered with its neighbours in M, leave it be.
If it is not in order with neighbours, exclude it from the M.
Such excluded elements will create a list N
Order N
Use some algorithm for merging ordered lists. http://en.wikipedia.org/wiki/Merge_algorithm
B.
If you are sure that new elements are few and not strongly changed, simply use the bubble sort.
I would still go with a heap, backed by an array.
In case only a few elements change after each pop: after you perform the pop operation, perform a heapify up/down for any item that reduces in value. It will still be in the order of O(n log k) operations, where k is the size of your array and n is the number of elements that have reduced in size.
If a lot of items change in size , then you can consider this as a case where you have an unsorted array and you just create a heap from the array.
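A minimal sketch of the heap approach using Python's heapq. Since heapq has no public sift-up/down for an arbitrary item, this sketch swaps in lazy invalidation: a changed item is pushed again and its old entry is skipped when popped. The class name ChangingMinHeap and the use of list positions as item ids are assumptions made for the example.

```python
import heapq


class ChangingMinHeap:
    def __init__(self, values):
        # id -> current value; the ids are just the original positions
        self.current = dict(enumerate(values))
        self.heap = [(value, item_id) for item_id, value in self.current.items()]
        heapq.heapify(self.heap)

    def update(self, item_id, new_value):
        # Push a fresh entry; the old one becomes stale and is skipped later.
        self.current[item_id] = new_value
        heapq.heappush(self.heap, (new_value, item_id))

    def pop_min(self):
        while self.heap:
            value, item_id = heapq.heappop(self.heap)
            if self.current.get(item_id) == value:
                del self.current[item_id]
                return item_id, value
        raise IndexError("pop from an empty heap")


heap = ChangingMinHeap([1, 3, 5, 7, 9, 11, 12, 15, 20, 33])
heap.pop_min()       # removes the 1
heap.update(2, 4)    # the 5 becomes 4
heap.update(6, 17)   # the 12 becomes 17
```

Each change costs O(log k) for the push, at the price of stale entries staying in the heap until they reach the top.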

Data Structure for fast position lookup

Looking for a datastructure that logically represents a sequence of elements keyed by unique ids (for the purpose of simplicity let's consider them to be strings, or at least hashable objects). Each element can appear only once, there are no gaps, and the first position is 0.
The following operations should be supported (demonstrated with single-letter strings):
insert(id, position) - add the element keyed by id into the sequence at offset position. Naturally, the position of each element later in the sequence is now incremented by one. Example: [S E L F].insert(H, 1) -> [S H E L F]
remove(position) - remove the element at offset position. Decrements the position of each element later in the sequence by one. Example: [S H E L F].remove(2) -> [S H L F]
lookup(id) - find the position of element keyed by id. [S H L F].lookup(H) -> 1
The naïve implementation would be either a linked list or an array. Both would give O(n) lookup, remove, and insert.
In practice, lookup is likely to be used the most, with insert and remove happening frequently enough that it would be nice not to be linear (which a simple combination of hashmap + array/list would get you).
In a perfect world it would be O(1) lookup, O(log n) insert/remove, but I actually suspect that wouldn't work from a purely information-theoretic perspective (though I haven't tried it), so O(log n) lookup would still be nice.
A combination of trie and hash map allows O(log n) lookup/insert/remove.
Each node of trie contains id as well as counter of valid elements, rooted by this node and up to two child pointers. A bit string, determined by left (0) or right (1) turns while traversing the trie from its root to given node, is part of the value, stored in the hash map for corresponding id.
Remove operation marks trie node as invalid and updates all counters of valid elements on the path from deleted node to the root. Also it deletes corresponding hash map entry.
Insert operation should use the position parameter and counters of valid elements in each trie node to search for new node's predecessor and successor nodes. If in-order traversal from predecessor to successor contains any deleted nodes, choose one with lowest rank and reuse it. Otherwise choose either predecessor or successor, and add a new child node to it (right child for predecessor or left one for successor). Then update all counters of valid elements on the path from this node to the root and add corresponding hash map entry.
Lookup operation gets a bit string from the hash map and uses it to go from trie root to corresponding node while summing all the counters of valid elements to the left of this path.
All this allows O(log n) expected time for each operation if the sequence of inserts/removes is random enough. If not, the worst case complexity of each operation is O(n). To get it back to O(log n) amortized complexity, watch for sparsity and balancing factors of the tree and if there are too many deleted nodes, re-create a new perfectly balanced and dense tree; if the tree is too imbalanced, rebuild the most imbalanced subtree.
Instead of hash map it is possible to use some binary search tree or any dictionary data structure. Instead of bit string, used to identify path in the trie, hash map may store pointer to corresponding node in trie.
Another alternative to using a trie in this data structure is an indexable skip list.
O(log N) time for each operation is acceptable, but not perfect. It is possible, as explained by Kevin, to use an algorithm with O(1) lookup complexity in exchange for larger complexity of other operations: O(sqrt(N)). But this can be improved.
If you choose some number of memory accesses (M) for each lookup operation, other operations may be done in O(M*N^(1/M)) time. The idea of such algorithm is presented in this answer to related question. Trie structure, described there, allows easily converting the position to the array index and back. Each non-empty element of this array contains id and each element of hash map maps this id back to the array index.
To make it possible to insert element to this data structure, each block of contiguous array elements should be interleaved with some empty space. When one of the blocks exhausts all available empty space, we should rebuild the smallest group of blocks, related to some element of the trie, that has more than 50% empty space. When total number of empty space is less than 50% or more than 75%, we should rebuild the whole structure.
This rebalancing scheme gives O(M*N^(1/M)) amortized complexity only for random and evenly distributed insertions/removals. Worst case complexity (for example, if we always insert at leftmost position) is much larger for M > 2. To guarantee O(M*N^(1/M)) worst case we need to reserve more memory and to change rebalancing scheme so that it maintains invariant like this: keep empty space reserved for whole structure at least 50%, keep empty space reserved for all data related to the top trie nodes at least 75%, for next level trie nodes - 87.5%, etc.
With M=2, we have O(1) time for lookup and O(sqrt(N)) time for other operations.
With M=log(N), we have O(log(N)) time for every operation.
But in practice small values of M (like 2 .. 5) are preferable. This may be treated as O(1) lookup time and allows this structure (while performing typical insert/remove operation) to work with up to 5 relatively small contiguous blocks of memory in a cache-friendly way with good vectorization possibilities. Also this limits memory requirements if we require good worst case complexity.
You can achieve everything in O(sqrt(n)) time, but I'll warn you that it's going to take some work.
Start by having a look at a blog post I wrote on ThriftyList. ThriftyList is my implementation of the data structure described in Resizable Arrays in Optimal Time and Space along with some customizations to maintain O(sqrt(n)) circular sublists, each of size O(sqrt(n)). With circular sublists, one can achieve O(sqrt(n)) time insertion/removal by the standard insert/remove-then-shift in the containing sublist followed by a series of push/pop operations across the circular sublists themselves.
Now, to get the index at which a query value falls, you'll need to maintain a map from value to sublist/absolute-index. That is to say, a given value maps to the sublist containing the value, plus the absolute index at which the value falls (the index at which the item would fall were the list non-circular). From these data, you can compute the relative index of the value by taking the offset from the head of the circular sublist and summing with the number of elements which fall behind the containing sublist. To maintain this map requires O(sqrt(n)) operations per insert/delete.
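To make the O(sqrt(n)) flavour concrete, here is a much simpler sketch than ThriftyList: plain square-root decomposition into small blocks plus a map from id to the block holding it. It is not the structure described above. The class name BlockSequence and the fixed block_size parameter are assumptions for the example, and the costs are only O(sqrt(n)) when block_size is chosen near sqrt(n).

```python
class BlockSequence:
    def __init__(self, block_size=64):
        self.block_size = block_size
        self.blocks = [[]]   # always holds at least one block
        self.block_of = {}   # id -> the block (list object) that holds it

    def insert(self, item_id, position):
        for b, block in enumerate(self.blocks):
            if position <= len(block):
                break
            position -= len(block)
        else:
            raise IndexError("position out of range")
        block.insert(position, item_id)
        self.block_of[item_id] = block
        if len(block) > 2 * self.block_size:
            # Split an oversized block and re-map the ids that moved.
            half = block[self.block_size:]
            del block[self.block_size:]
            self.blocks.insert(b + 1, half)
            for moved in half:
                self.block_of[moved] = half

    def remove(self, position):
        for b, block in enumerate(self.blocks):
            if position < len(block):
                break
            position -= len(block)
        else:
            raise IndexError("position out of range")
        item_id = block.pop(position)
        del self.block_of[item_id]
        if not block and len(self.blocks) > 1:
            del self.blocks[b]
        return item_id

    def lookup(self, item_id):
        target = self.block_of[item_id]
        offset = 0
        for block in self.blocks:
            if block is target:
                return offset + block.index(item_id)
            offset += len(block)
        raise KeyError(item_id)
```

Here lookup scans the block sizes and then one block, so it is not the O(1) lookup discussed earlier; it only shows how the id-to-sublist map and the per-block offsets fit together.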
Sounds roughly like Clojure's persistent vectors - they provide O(log_32 n) cost for lookup and update. For smallish values of n, O(log_32 n) is as good as constant....
Basically they are array mapped tries.
Not quite sure on the time complexity for remove and insert - but I'm pretty sure that you could get a variant of this data structure with O(log n) removes and inserts as well.
See this presentation/video: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Source code (Java): https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java
