What data structure should I use for these operations?

I need a data structure that stores a subset S of {1, ..., n} (n given initially)
and supports just these operations:
• Initially: n is given and S = {1, ..., n}.
• delete(i): Delete i from S. If i isn't in S, this has no effect.
• pred(i): Return the predecessor of i in S, i.e. max{j ∈ S | j < i}, the greatest element of S
that is strictly less than i. If there is none, return 0. The parameter i is guaranteed to be in {1, ..., n},
but may or may not be in S.
For example, if n = 7 and S = {1, 3, 6, 7}, then pred(1) returns 0, pred(2) and pred(3) return 1.
I need to figure out:
a data structure that represents S
an algorithm for initialization (O(n) time)
an algorithm for delete (O(α(n)) amortized time)
an algorithm for pred (O(α(n)) amortized time)
Would appreciate any help (I don't need code - just the algorithms).

You can use a disjoint-set (union-find) data structure.
Let's represent our subset as a disjoint set. Each component consists of one subset element i (with zero always present), together with all the absent elements that are greater than i and less than the next subset element.
Example:
n = 10
s = [1, 4, 7, 8], disjoint-set = [{0}, {1,2,3}, {4,5,6}, {7}, {8, 9, 10}]
s = [3, 5, 6, 10], disjoint-set = [{0, 1, 2}, {3, 4}, {5}, {6, 7, 8, 9}, {10}]
Initially, we have the full set, represented by n+1 disjoint-set components (zero included). As usual, every component is a rooted tree, and for every tree root we store the leftmost number in its component.
Let leftmost(i) be the leftmost value of the disjoint-set component that contains i.
The leftmost(i) operation is analogous to the Find operation of a disjoint set: we walk from i to the root of its tree and return the leftmost number stored at the root. Complexity: O(α(n)).
We can check whether i is in the subset by comparing i with leftmost(i): if they are equal (and i > 0), then i is in the subset.
pred(i) equals leftmost(i) if i is not in the subset, and leftmost(i-1) if i is. Complexity: O(α(n)).
On every delete(i) operation we first check whether i is in the subset. If it is, we union the component containing i with its left neighbour (the component containing i-1); this is the Union operation of a disjoint set. The leftmost number of the resulting tree equals leftmost(i-1). Complexity: O(α(n)).
Edit: I've just noticed "strictly less than i" in the question, changed description a bit.
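A minimal Java sketch of the scheme above (class and method names are my own; for brevity it uses path compression only, which gives O(log n) amortized per operation, while adding union by rank, with the leftmost value carried to the surviving root, restores the full O(α(n)) bound):

```java
// Sketch of the union-find predecessor structure described above (my naming).
public class PredSet {
    private final int[] parent;
    private final int[] leftmost; // meaningful only at component roots

    public PredSet(int n) {               // O(n) initialization
        parent = new int[n + 1];
        leftmost = new int[n + 1];
        for (int i = 0; i <= n; i++) { parent[i] = i; leftmost[i] = i; }
    }

    private int find(int i) {             // Find with path halving
        while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    private int leftmost(int i) { return leftmost[find(i)]; }

    // Greatest element of S strictly less than i, or 0 if none.
    public int pred(int i) {
        int l = leftmost(i);
        return (l == i) ? leftmost(i - 1) : l;
    }

    // Merge i's component into the left neighbour's component.
    public void delete(int i) {
        if (i == 0 || leftmost(i) != i) return;   // i not in S: no effect
        parent[find(i)] = find(i - 1);            // leftmost(i-1) already sits at that root
    }

    public static void main(String[] args) {
        PredSet s = new PredSet(7);
        s.delete(2); s.delete(4); s.delete(5);    // S = {1, 3, 6, 7}
        System.out.println(s.pred(1) + " " + s.pred(2) + " " + s.pred(6)); // 0 1 3
    }
}
```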

I'm not sure whether there is a data structure that can guarantee all these properties in O(α(n)) time, but a good start would be predecessor data structures like van Emde Boas trees or y-fast tries.
The vEB tree is defined recursively based on the binary representation of the element indices. Let's assume that n = 2^b for some b = 2^k.
If we have only two elements, store the minimum and maximum
Otherwise, we divide the binary representation of all the elements into the upper and lower b/2 bits.
We build a vEB tree ('summary') for the upper bits of all elements and √n vEB trees for the lower bits (one for every choice of the upper bits). Additionally, we store the minimum and maximum element.
This gives you O(n) space usage and O(log log n) = O(k) time for search, insertion and deletion.
Note, however, that the constant factors involved might be very large. If your n is representable in 32 bits, I know of this report by Dementiev et al. that breaks the recursion when the problem sizes become solvable more easily with other techniques.
The idea of y-fast tries builds on x-fast tries:
They are most simply described as a trie based on the binary representation of its elements, combined with a hash table for every level and some additional pointers.
y-fast tries reduce the space usage by splitting the elements in nearly equally-sized partitions and choosing representatives (maximum) from them, over which an x-fast trie is built. Searches within the partitions are then realized using normal balanced search trees.
The space usage and time complexity are comparable to the vEBs. I'm guessing the constant factors will be a bit smaller than a naïve implementation of vEBs, but the claim is only based on intuition.
A last note: always keep in mind that log log n < 6 for any n below 2^64, which will probably not change in the near future.

Providing O(α(n)) time is where it really becomes tricky. Here is my idea for approaching this:
Since we know the range of i, which is from 1 to n, we can first form a self-balancing BST such as an AVL tree. The nodes of this AVL tree will be objects of a DataNode class. Here is how it might look:
public class DataNode {
    int value;
    boolean type; // true if value is present in S

    DataNode(int value, boolean type) {
        this.value = value;
        this.type = type;
    }
}
The value field simply takes each value in the range 1 to n. The type field is set to true if the item we are inserting into the tree is present in the set S, and to false otherwise.
Building the tree takes O(n) time. Deletion can be done in O(log n) time.
For pred(i), we can achieve an average-case time complexity of around O(log n), if I am correct. The algorithm for pred(i) would be something like this:
Locate the element i in the tree. If its type is true, return the in-order predecessor of i, provided that predecessor's type is also true.
If that type is false, recur to the next predecessor (i.e. the predecessor of i-1) until we find an element whose type is true.
If no predecessor with type = true is found, return 0.
I hope this approach can be optimized further to reach O(α(n)) for pred(i).
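As a side note, the balanced-BST behaviour described above is available out of the box in the JDK: java.util.TreeSet is a red-black tree whose lower(i) method returns exactly max{j ∈ S | j < i}. A small sketch, reusing the S = {1, 3, 6, 7} example from the question:

```java
import java.util.TreeSet;

// Predecessor queries via the red-black tree behind java.util.TreeSet.
public class TreeSetPred {
    public static void main(String[] args) {
        int n = 7;
        TreeSet<Integer> s = new TreeSet<>();
        for (int i = 1; i <= n; i++) s.add(i);   // initialize S = {1, ..., n}

        s.remove(2); s.remove(4); s.remove(5);   // S = {1, 3, 6, 7}

        // lower(i) = greatest element strictly less than i, or null if none.
        Integer p = s.lower(2);
        System.out.println(p == null ? 0 : p);   // prints 1
    }
}
```

Each operation is O(log n) worst case rather than O(α(n)) amortized, so this does not meet the asked-for bound, but it is hard to beat for simplicity.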

Related

Dynamic sets using RB trees

Let us say we have a dynamic set S of integers and an index i; we wish to find the i-th smallest negative number in S written in increasing order, if any.
Example:
S = {-5, -2, -1, 2, 5}: the answer is -1 for i = 3 and is undefined for i = 4.
The objective is to use a red-black tree as the underlying data structure and to define an additional attribute that allows solving the problem in O(lg n) time. Any guidance on the algorithm that should be used to solve such a question?
It's called Order Statistic Tree (https://en.wikipedia.org/wiki/Order_statistic_tree).
In general, you extend your tree node with an extra attribute: the size of its subtree. For a leaf it's 1; for an inner node it's
size(left_subtree) + size(right_subtree) + 1
Wiki has a clear explanation and pseudocode. It works with any kind of balanced tree (RB/AVL/Treap/etc.); you just need to maintain the subtree sizes during rotations (or any other tree modification).
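For illustration, here is a minimal sketch of the select logic on a size-augmented BST (balancing is omitted for brevity; in a real RB/AVL tree you would update size during rotations as the Wiki pseudocode shows). Finding the i-th smallest negative number then reduces to select(i) after checking i against the count of negatives, which is obtainable as the rank of 0:

```java
// select(i) on a BST whose nodes carry subtree sizes (unbalanced sketch).
public class OrderStatistic {
    static class Node {
        int key, size = 1;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int size(Node t) { return t == null ? 0 : t.size; }

    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else t.right = insert(t.right, key);
        t.size = 1 + size(t.left) + size(t.right); // maintain the augmentation
        return t;
    }

    // i-th smallest key (1-based), guided by subtree sizes: O(height).
    static int select(Node t, int i) {
        int r = size(t.left) + 1;  // rank of the root within this subtree
        if (i == r) return t.key;
        return i < r ? select(t.left, i) : select(t.right, i - r);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int x : new int[]{-5, -2, -1, 2, 5}) root = insert(root, x);
        System.out.println(select(root, 3)); // -1, matching the example
    }
}
```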

Find number of items with weight k in a range (with updates and queries)

I am trying to solve the following problem:
Given an array of items with integer weights (arbitrary order), we can have 2 possible operations:
Query: Output the number of items that are of weight k in the range x to y.
Update: Change the weight of the item at a certain index to v.
Example:
Given the array: [1,2,3,2,5,6,7,3]
If we query for the number of items with weight 2 from index 1 to 3, then the answer would be 2.
If we then modify the element at index 2 to have a weight of 2 and make the same query again, the answer would be 3.
This is certainly a segment tree problem (using point updates). However, I am facing a problem here - each segment will only hold the answer for 1 index. Hence, it seems that I must use vectors in my segment tree. But this would overcomplicate things. Furthermore, I am not sure how to do that either.
Is anyone able to advise me of a better solution?
Instead of a segment tree, you should use a binary search tree (BST) such as AVL, Treap, Splay, etc.
First, store the indexes of each appearing value in separate BSTs, one per weight. In your example [1,2,3,2,5,6,7,3], there would be six BSTs:
BST 1: [0],
BST 2: [1,3],
BST 3: [2,7],
BST 5: [4],
BST 6: [5],
BST 7: [6]
For each query (x, y, k), count the number of elements that fall in the range [x, y] in BST k.
For each update weight[x] = v, remove x from BST weight[x] and add x to BST v.
Time complexity: O(n log n + m log n), where n is the length of the data and m is the number of operations.
Space complexity: O(n)
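A sketch of this scheme in Java, using java.util.TreeSet as the per-weight BST. One caveat: subSet(...).size() in the JDK walks the range, so for the stated O(log n) count per query you would need a size-augmented BST as in an order statistic tree; the structure of the solution is otherwise the same:

```java
import java.util.*;

// Per-weight BSTs of indices, as described above (names are my own).
public class WeightQuery {
    private final Map<Integer, TreeSet<Integer>> byWeight = new HashMap<>();
    private final int[] weight;

    public WeightQuery(int[] a) {
        weight = a.clone();
        for (int i = 0; i < a.length; i++)
            byWeight.computeIfAbsent(a[i], w -> new TreeSet<>()).add(i);
    }

    // Number of indices in [x, y] whose weight equals k.
    public int query(int x, int y, int k) {
        TreeSet<Integer> t = byWeight.get(k);
        return t == null ? 0 : t.subSet(x, true, y, true).size();
    }

    // Move index x from its old weight's BST to weight v's BST.
    public void update(int x, int v) {
        byWeight.get(weight[x]).remove(x);
        byWeight.computeIfAbsent(v, w -> new TreeSet<>()).add(x);
        weight[x] = v;
    }

    public static void main(String[] args) {
        WeightQuery wq = new WeightQuery(new int[]{1, 2, 3, 2, 5, 6, 7, 3});
        System.out.println(wq.query(1, 3, 2)); // 2
        wq.update(2, 2);
        System.out.println(wq.query(1, 3, 2)); // 3
    }
}
```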

Time complexity with insertion sort for 2^N array?

Consider an array of integers of size 2^N, where the element at index X (0 ≤ X < 2^N) is X xor 3 (that is, X with its two least significant bits flipped). What is the running time of insertion sort on this array?
Examine the structure of what the list looks like:
For n = 2:
{3, 2, 1, 0}
For n = 3 :
{3, 2, 1, 0, 7, 6, 5, 4}
For insertion sort, you maintain the invariant that the list from the start up to your current index is sorted, so your task at each step is to place the current element into its correct place among the sorted elements before it. In the worst case, you have to traverse all previous indices before you can insert the current value (think of a list in reverse sorted order). But it's clear from the structure above that for a list where each value equals its index ^ 3, the furthest back you would ever have to go from any given index is 3. So the O(n) work that the insertion step might cost is reduced to a constant, while you still do O(n) work examining each element of the list. For this particular input, the running time of insertion sort is therefore linear in the size of the input, whereas in the worst case it is quadratic.
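A quick empirical check of this argument: plain insertion sort that counts how far each element is shifted. For a[x] = x ^ 3, the maximum shift distance should be 3, independent of n:

```java
// Insertion sort on a[x] = x ^ 3, measuring the maximum shift distance.
public class XorThreeSort {
    public static void main(String[] args) {
        int n = 1 << 10;
        int[] a = new int[n];
        for (int x = 0; x < n; x++) a[x] = x ^ 3;

        int maxShift = 0;
        for (int i = 1; i < n; i++) {
            int key = a[i], j = i - 1, shifts = 0;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; shifts++; }
            a[j + 1] = key;
            maxShift = Math.max(maxShift, shifts);
        }
        System.out.println(maxShift); // 3
    }
}
```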

Find the first number greater than a given one in an unsorted sequence

Given a sequence of positive integers and an integer M, return the first number in the sequence which is greater than M (or -1 if it doesn't exist).
Example: sequence = [2, 50, 8, 9, 1], M = 3 -> return = 50
O(log n) per query is required (after preprocessing).
I've thought of BSTs, but given an ascending sequence I'd just get a long path, which wouldn't give me O(log n) time...
EDIT: The structure should also be easy to modify, i.e. it should be possible to replace the found element with the given one and repeat the search for another M (everything, apart from preprocessing, in O(log n)). And of course it would be nice if I could change 'first greater' to 'first smaller' without having to change too much in the algorithm. If it helps, we may assume all numbers are positive and there are no repetitions.
Create a second array (call it aux) where for each index i: aux[i] = max{arr[0], arr[1], ..., arr[i]} (the maximum of all elements with index j ≤ i in the original array).
This array is non-decreasing by construction, and a simple binary search on aux yields the needed index. (With binary search it is easy to find the first element that is greater than the requested one, if such an element exists.)
Time complexity is O(n) preprocessing (done only once) and O(log n) per query.
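A sketch of this in Java (class and method names are my own). Since aux is non-decreasing, the binary search finds the first index whose prefix maximum exceeds M, and at that index the array element itself is the new maximum:

```java
// Prefix-maximum array plus binary search for the first element greater than M.
public class FirstGreater {
    private final int[] aux; // aux[i] = max(arr[0..i]); non-decreasing
    private final int[] arr;

    public FirstGreater(int[] a) {            // O(n) preprocessing
        arr = a.clone();
        aux = new int[a.length];
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) aux[i] = max = Math.max(max, a[i]);
    }

    // First element of the sequence greater than m, or -1 if none: O(log n).
    public int firstGreater(int m) {
        int lo = 0, hi = aux.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (aux[mid] > m) { ans = mid; hi = mid - 1; }
            else lo = mid + 1;
        }
        return ans == -1 ? -1 : arr[ans];     // arr[ans] == aux[ans] at the first crossing
    }

    public static void main(String[] args) {
        FirstGreater fg = new FirstGreater(new int[]{2, 50, 8, 9, 1});
        System.out.println(fg.firstGreater(3)); // 50, matching the example
    }
}
```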

Binary Tree array list representation

I have been doing some research on binary trees and their array-list representation. I am struggling to understand why the worst-case space complexity is O(2^n). Specifically, the book states that the space usage is O(N) (N = array size), which is O(2^n) in the worst case. I would have thought it would be 2n in the worst case, since each node has two children (indexes), not O(2^n), where n = number of elements.
For example, if I had a binary tree with 7 nodes, then the space would be 2n = 14, not 2^n = 128.
This is the heap implementation on an array, where
A[1..n]
left_child(i) = A[2*i]
right_child(i) = A[2*i+1]
parent(i) = A[floor(i/2)]
Now, consider the space. Think intuitively:
when you insert first element n=1, location=A[1], similarly,
n=2 #A[2] left_child(1)
n=3 #A[3] right_child(1)
n=4 #A[4] left_child(2)
n=5 #A[5] right_child(2)
You see, the nth element goes into A[n], so the space complexity is O(n).
When you code it, you just place the element to be inserted at the end, say at A[n+1], and note that it's a child of floor((n+1)/2).
Refer: http://en.wikipedia.org/wiki/Binary_heap#Heap_implementation
A heap is a nearly complete tree, so the total number of elements in the tree satisfies 2^h - 1 < n ≤ 2^(h+1) - 1 (h being the height), and this is the length of array you will need. Refer: this
The worst-case space complexity of a binary tree is O(n) (not the O(2^n) in your question), but using an array to represent a binary tree can save the space of pointers if it's a nearly complete binary tree.
See http://en.wikipedia.org/wiki/Binary_tree#Arrays
I think this refers to storing arbitrary binary trees in an array representation, which is normally used for complete or nearly complete binary trees, notably in the implementation of heaps.
In this representation, the root is stored at index 0 in the array, and for any node with index n, its left and right children are stored at indices 2n+1 and 2n+2, respectively.
If you have a degenerate tree where no node has a right child (the tree is effectively a linked list), then the items will be stored at indices 0, 1, 3, 7, 15, 31, .... In general, the nth item of this list (counting from 0) will be stored at index 2^n - 1, so in this case the array representation requires Θ(2^n) space.
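The index arithmetic is easy to check with the 2n+1 left-child rule (a tiny sketch; the same growth occurs for a right-only chain via 2n+2):

```java
// Walking a left-only chain in the array layout: indices grow as 2^k - 1.
public class DegenerateChain {
    public static void main(String[] args) {
        int idx = 0;                      // the root sits at index 0
        for (int k = 0; k < 6; k++) {
            System.out.print(idx + " "); // 0 1 3 7 15 31
            idx = 2 * idx + 1;            // step to the left child
        }
        System.out.println();
    }
}
```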
