Writing a data structure with O(n) initialization and O(1) lookup

Given an array a1, a2, a3, ..., aN,
I want to create a data structure that supports the following requirements:
Initialize the data structure in O(n)
Get the (4n/5)-th smallest value in O(1)
Get the (n/5)-th smallest value in O(1)
I tried to build the data structure with 3 max-heaps, but I couldn't initialize the heaps in O(n).
How can I work this out?

The so-called "median of medians" algorithm
can find the kth smallest element in an unordered set in O(n) time.
You want to apply this for k = n/5 and k = 4n/5.
The result of the algorithm is a partially ordered array in which the desired kth element is in location k. After putting the (n/5)-th element in its place, you can put the (4n/5)-th element in its place without redoing the entire algorithm: the partition guarantees that everything to the right of location n/5 is at least as large, so the second selection only has to run on that right-hand portion. In any case it is still O(n).
Assuming you have O(1) random-access lookup into the array, your O(1) lookup requirement is then satisfied.
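
The C++ standard library already ships a selection routine in this spirit: std::nth_element does exactly this partition-based selection (typically implemented as introselect, so linear on average rather than the worst-case linear of median-of-medians). A minimal sketch of the two-selection idea, assuming int values and n >= 2; the class and method names here are illustrative, not from the original answer:

```cpp
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// O(n)-average preprocessing, O(1) lookups. Assumes a.size() >= 2.
struct QuantileLookup {
    std::vector<int> a;
    explicit QuantileLookup(std::vector<int> data) : a(std::move(data)) {
        size_t lo = a.size() / 5;
        size_t hi = 4 * a.size() / 5;
        // Place the (n/5)-th smallest value at index lo...
        std::nth_element(a.begin(), a.begin() + lo, a.end());
        // ...then only the right-hand part needs work for the (4n/5)-th value,
        // since everything right of index lo is already >= a[lo].
        std::nth_element(a.begin() + lo + 1, a.begin() + hi, a.end());
    }
    int lower() const { return a[a.size() / 5]; }      // n/5 value, O(1)
    int upper() const { return a[4 * a.size() / 5]; }  // 4n/5 value, O(1)
};

int main() {
    QuantileLookup q({7, 2, 9, 4, 1, 8, 3, 6, 5, 0});
    std::printf("%d %d\n", q.lower(), q.upper());  // prints "2 8"
}
```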


Finding k largest elements in an array in O(1) time

Is it possible to achieve O(1) time complexity for finding the k largest or smallest numbers in an array by making a stack class with an auxiliary data structure that tracks the k largest/smallest on every push() and pop()? Since retrieval from the auxiliary structure is O(1), a get method could return the k elements.
Yes, you can find the kth largest or smallest element in O(1) only if your array is already sorted.
If you are looking for a well-known data structure, you may find a max-heap or min-heap useful.
Update
As you have updated your question from max and min to the k largest: you can preprocess your data into a sorted array and then insert new values using the insertion-sort strategy. You can then report the k largest values in O(k) (and if k is a constant, that is O(1)). A sketch of this follows the link below.
The link below discusses finding the k largest (or smallest) elements in an array:
https://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/
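
A minimal sketch of the update above, assuming int data; the SortedArray name and method names are illustrative. Keep the elements in a sorted vector, insert each new value at its position (the insertion-sort strategy, O(n) per insert because of the shift), and report the k largest in O(k) by reading from the back:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct SortedArray {
    std::vector<int> v;  // kept in ascending order

    void add(int x) {
        // binary search for the position, then an O(n) shifting insert
        v.insert(std::upper_bound(v.begin(), v.end(), x), x);
    }

    std::vector<int> kLargest(int k) const {  // O(k); assumes 1 <= k <= size
        return std::vector<int>(v.end() - k, v.end());
    }
};

int main() {
    SortedArray s;
    int xs[] = {5, 1, 9, 3, 7};
    for (int x : xs) s.add(x);
    for (int x : s.kLargest(2)) std::printf("%d ", x);  // prints "7 9 "
    std::printf("\n");
}
```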

Existence of a certain data structure

I'm wondering: can there exist a data structure meeting the following criteria and time bounds (it might be complicated)?
Given an unsorted list L, we build a structure out of it like this:
Build(L) - in O(n) time, we build the structure S from an unsorted list of n elements
Insert(y, S) - in O(lg n), we insert y into the structure S
DEL-MIN(S) - in O(lg n), we delete the minimal element from S
DEL-MAX(S) - in O(lg n), we delete the maximal element from S
DEL-MID(S) - in O(lg n), we delete the upper median (ceiling function) element from S
The problem is that the list L is unsorted. Can such a data structure exist?
DEL-MIN and DEL-MAX are easy: keep both a min-heap and a max-heap of all the elements. The only trick is that you have to keep cross-indices between the heaps so that when (for example) you remove the max, you can also find it and remove it in the min-heap.
For DEL-MID, you can keep a max-heap of the elements less than the median and a min-heap of the elements greater than or equal to the median. The full description is in this answer: Data structure to find median. Note that that answer returns the floor-median, but that's easily fixed. Again, you need the cross-indexing trick to refer to the other data structures, as in the first part. You will also need to think about how this handles repeated elements, if that's possible in your problem formulation. (If necessary, you can store repeated elements as (count, value) pairs in your heaps, but this complicates rebalancing the heaps on insert/remove a little.)
Can this all be built in O(n)? Yes: you can find the median of n things in O(n) time (using the median-of-medians algorithm), and heaps can be built in O(n) time.
So overall, the data structure is four heaps (a min-heap of all the elements, a max-heap of all the elements, a max-heap of the floor(n/2) smallest elements, and a min-heap of the ceil(n/2) largest elements), all with cross-indices into each other.
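
A minimal sketch of the two-heap median component described above, assuming int elements: a max-heap of the smaller half and a min-heap of the larger half give O(lg n) INSERT and DEL-MID. The DEL-MIN/DEL-MAX heaps and the cross-indexing are omitted for brevity, and the structure is built here by repeated insertion (O(n lg n)) rather than the O(n) build the answer describes; the class and method names are illustrative.

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

class MedianHeap {
    std::priority_queue<int> lo;  // max-heap: the smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> hi;  // min-heap: the larger half

    // Keep hi.size() == lo.size() or lo.size() + 1, so hi.top() is always
    // the upper (ceiling) median.
    void rebalance() {
        if (hi.size() > lo.size() + 1) { lo.push(hi.top()); hi.pop(); }
        else if (lo.size() > hi.size()) { hi.push(lo.top()); lo.pop(); }
    }

public:
    void insert(int x) {
        if (hi.empty() || x >= hi.top()) hi.push(x); else lo.push(x);
        rebalance();
    }

    int delMid() {  // removes and returns the upper median; assumes non-empty
        int m = hi.top();
        hi.pop();
        rebalance();
        return m;
    }
};

int main() {
    MedianHeap h;
    int xs[] = {5, 1, 9, 3, 7};
    for (int x : xs) h.insert(x);
    std::printf("%d\n", h.delMid());  // 5: upper median of {1,3,5,7,9}
    std::printf("%d\n", h.delMid());  // 7: upper median of {1,3,7,9}
}
```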

Algorithm runtime complexity

On the CareerCup site, there was this problem (https://careercup.com/question?id=6270877443293184):
Given an array, find the number of tuples such that A[i] + A[j] + A[k] = A[l], where i < j < k < l.
The proposed solution (below) works, but the claimed runtime complexity is O(n^2). After analyzing the code, I don't think it can be done in less than O(n^2 log n). My rationale is that it iterates through all elements of the 2D array (which is n^2) and, for each one, checks a list that contains the matching indices, which is O(n). Even using a TreeMap and doing a binary search can only reduce that to O(log n), not to constant time. Can someone confirm whether this can be done in O(n^2) and explain what is incorrect in my logic?
Proposed solution:
Fill two 2D arrays, for j > i:
arr1[i][j] = a[i] + a[j]
arr2[i][j] = a[j] - a[i]
In a map<int, list<int>>, do map[arr1[i][j]].push_back(j).
For each element in arr2, search the map.
Count all hits where j < k.
It's pretty easy to insert j in increasing order into map[arr1[i][j]].
If you enumerate the elements of arr2 in increasing k order, you don't have to do a binary search; you can just enumerate all the j.
Because you are going in increasing k order, for each list in the map you just have to remember the last position you looked at, so the map should rather be a map<int, pair<int, list<int>>>.
Since you only touch each j once, the complexity is O(n^2).
You are right, in the worst case this is not O(n^2).
The author evidently assumed that each list in map<int, list<int>> would contain only a few members; this is similar to the assumption we make when we state that the find operation of a hash table is O(1). Recall that a hash table whose collision resolution is based on separate chaining has constant find complexity on average, but when many elements hash to the same value it can degrade to linear complexity.
Implementation-wise, note that the map map<int, list<int>> needs to be a hash table (i.e., std::unordered_map in C++, HashMap in Java), not std::map (or TreeMap in Java), because with std::map the find operation alone is already O(log n).
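
For concreteness, here is one way to realize the O(n^2)-average-time idea with std::unordered_map; the function name countTuples is ours. Rather than materializing the arr1/arr2 tables, it rewrites a[i] + a[j] + a[k] = a[l] as a[i] + a[j] = a[l] - a[k] and sweeps k in increasing order, inserting each pair-sum exactly once before it can be queried. Storing counts instead of lists of indices sidesteps the list-scanning cost discussed above (the j < k constraint is enforced by only inserting pairs with j < k), leaving hash collisions as the sole source of degradation.

```cpp
#include <cstdio>
#include <unordered_map>
#include <vector>

long long countTuples(const std::vector<int>& a) {
    int n = (int)a.size();
    std::unordered_map<long long, long long> pairSums;  // sum -> #pairs (i,j) with j < k
    long long total = 0;
    for (int k = 2; k + 1 < n; ++k) {
        // Add all pairs (i, k-1), so pairSums now covers every j < k.
        for (int i = 0; i < k - 1; ++i)
            ++pairSums[(long long)a[i] + a[k - 1]];
        // For each l > k, count pairs with a[i] + a[j] == a[l] - a[k].
        for (int l = k + 1; l < n; ++l) {
            auto it = pairSums.find((long long)a[l] - a[k]);
            if (it != pairSums.end()) total += it->second;
        }
    }
    return total;
}

int main() {
    std::vector<int> a = {1, 2, 3, 6};               // 1 + 2 + 3 == 6: one tuple
    std::printf("%lld\n", countTuples(a));           // prints 1
}
```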

Find sum of n largest elements online

I am solving a problem but I got stuck on this part.
There are 3 types of query: add an element (an integer), remove an element, and get the sum of the n largest elements (n can be any integer). How can I do this efficiently? I currently use this solution: add an element and remove an element with binary search, O(lg n); getSum naively, O(n).
A segment tree is commonly used to find the sum of a given range. Building that on top of a binary search tree should give you the data structure you are looking for, with O(log N) add, remove, and range-sum operations. By querying the sum over the range where the k largest elements sit (roughly N-k to N), you can get the sum of the k largest elements in O(log N). The result is a mutable, ordered segment tree rather than the standard immutable (static) unordered one.
Basically, you just add to each parent node variables holding the number of elements and the sum of the values in its subtree, and use that information to find the sum via O(log N) additions and/or subtractions.
If k is fixed, you can use the same approach that allows O(1) find-min/max in heaps to allow an O(1) sum of the k largest elements: simply keep a variable holding that sum and update it during each O(log N) add/remove.
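
A minimal sketch of the augmented-tree idea above, assuming integer elements: each node stores the size and value-sum of its subtree, so the sum of the k largest is found by walking down from the root. Balancing rotations and remove() are omitted for brevity, so the bounds here are O(log N) only when the tree happens to stay balanced; the names Node, sumKLargest, etc. are illustrative.

```cpp
#include <cstdio>

struct Node {
    long long val, sum;  // value, and sum of all values in this subtree
    int cnt;             // number of nodes in this subtree
    Node *left = nullptr, *right = nullptr;
    Node(long long v) : val(v), sum(v), cnt(1) {}
};

int cnt(Node* t) { return t ? t->cnt : 0; }
long long sum(Node* t) { return t ? t->sum : 0; }

Node* insert(Node* t, long long v) {
    if (!t) return new Node(v);
    if (v < t->val) t->left = insert(t->left, v);
    else            t->right = insert(t->right, v);
    t->cnt = 1 + cnt(t->left) + cnt(t->right);  // maintain the augmentation
    t->sum = t->val + sum(t->left) + sum(t->right);
    return t;
}

long long sumKLargest(Node* t, int k) {
    if (!t || k <= 0) return 0;
    if (k <= cnt(t->right)) return sumKLargest(t->right, k);
    // take the whole right subtree and this node, then recurse left
    return sum(t->right) + t->val + sumKLargest(t->left, k - cnt(t->right) - 1);
}

int main() {
    Node* root = nullptr;
    long long xs[] = {5, 1, 9, 3, 7};
    for (long long v : xs) root = insert(root, v);
    std::printf("%lld\n", sumKLargest(root, 2));  // 9 + 7 = 16
}
```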
A lot depends on the relative frequency of the queries but if we assume a typical situation where the sum query will be much more frequent than the add-remove requests (and add is more frequent than remove), the solution is to store a tuple of the sums and the numbers.
So the first element will be (a1, a1), the second element in your list will be (a2, a1+a2), and so on. (Note that when you insert a new element in the k-th position, you still don't need to recompute the whole sum; just add the new number to the preceding element's sum, and fix up the sums that follow.)
Removals will be quite expensive, though, but that's the trade-off for an O(1) sum query.
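
A minimal sketch of this (value, running-sum) list, assuming int values; the KLargestSums name is ours. The vector is kept sorted in descending order, so the sum of the k largest elements is just the running sum stored with the k-th entry, an O(1) query. Insertion (and removal, not shown) costs O(n) because the later running sums must be fixed up, which is the trade-off the answer describes.

```cpp
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

struct KLargestSums {
    std::vector<std::pair<long long, long long>> v;  // (value, running sum)

    void add(long long x) {
        // find the first position with a smaller value (descending order)
        auto it = std::lower_bound(v.begin(), v.end(), x,
            [](const std::pair<long long, long long>& p, long long key) {
                return p.first > key;
            });
        size_t pos = it - v.begin();
        long long before = pos ? v[pos - 1].second : 0;
        v.insert(it, {x, before + x});
        for (size_t i = pos + 1; i < v.size(); ++i)  // fix later running sums
            v[i].second = v[i - 1].second + v[i].first;
    }

    long long sumKLargest(int k) const {  // O(1); assumes 1 <= k <= size
        return v[k - 1].second;
    }
};

int main() {
    KLargestSums s;
    long long xs[] = {5, 1, 9, 3, 7};
    for (long long x : xs) s.add(x);
    std::printf("%lld\n", s.sumKLargest(2));  // 9 + 7 = 16
}
```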

Efficiently find order statistics of unsorted list prefixes?

A is an array of the integers from 1 to n in random order.
I need random access to the ith largest element of the first j elements, in logarithmic time or better.
What I've come up with so far is an n x n matrix M, where the element in the (i, j) position is the ith largest of the first j. This gives me constant-time random access, but requires n^2 storage.
By construction, M is sorted by row and column. Further, each column differs from its neighbors by a single value.
Can anyone suggest a way to compress M down to n log(n) space or better, with log(n) or better random access time?
I believe you can perform the access in O(log(N)) time, given O(N log(N)) preprocessing time and O(N log(N)) extra space. Here's how.
You can augment a red-black tree to support a select(i) operation, which retrieves the element at rank i in O(log(N)) time; see, for example, the appropriate chapter of Introduction to Algorithms.
You can implement a red-black tree (even one augmented to support select(i)) in a functional manner, such that the insert operation returns a new tree which shares all but O(log(N)) nodes with the old tree. See for example Purely Functional Data Structures by Chris Okasaki.
We will build an array T of purely functional augmented red-black trees, such that the tree T[j] stores the indexes 0 ... j-1 of the first j elements of A sorted largest to smallest.
Base case: At T[0] create an augmented red-black tree with just one node, whose data is the number 0, which is the index of the 0th largest element in the first 1 elements of your array A.
Inductive step: For each j from 1 to N-1, at T[j] create an augmented red-black tree by purely functionally inserting a new node with index j into the tree T[j-1]. This creates at most O(log(j)) new nodes; the remaining nodes are shared with T[j-1]. This takes O(log(j)) time.
The total time to construct the array T is O(N log(N)) and the total space used is also O(N log(N)).
Once T[j-1] is created, you can access the ith largest element of the first j elements of A by performing T[j-1].select(i). This takes O(log(j)) time. Note that you can create T[j-1] lazily the first time it is needed. If A is very large and j is always relatively small, this will save a lot of time and space.
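
A minimal path-copying sketch of this construction, with two simplifications against the answer: it stores values rather than indices, and it uses a plain (unbalanced) BST with a size augmentation instead of a red-black tree. Since the question's array is a random permutation, the expected depth is O(log N); a balanced tree gives the worst-case bounds the answer states. The Node layout and function names are illustrative.

```cpp
#include <cstdio>
#include <vector>

struct Node {
    int val, cnt;  // value and subtree size
    const Node *left, *right;
    Node(int v, const Node* l, const Node* r)
        : val(v), cnt(1 + (l ? l->cnt : 0) + (r ? r->cnt : 0)), left(l), right(r) {}
};

int cnt(const Node* t) { return t ? t->cnt : 0; }

// Path copying: returns a new root; all untouched nodes are shared with the
// old tree, so each insert allocates only O(depth) new nodes.
const Node* insert(const Node* t, int v) {
    if (!t) return new Node(v, nullptr, nullptr);
    if (v > t->val)  // larger values go left, so in-order is largest-first
        return new Node(t->val, insert(t->left, v), t->right);
    return new Node(t->val, t->left, insert(t->right, v));
}

// select(i): the i-th largest element (1-based), via the size augmentation.
int select(const Node* t, int i) {
    int leftCnt = cnt(t->left);
    if (i <= leftCnt) return select(t->left, i);
    if (i == leftCnt + 1) return t->val;
    return select(t->right, i - leftCnt - 1);
}

int main() {
    int A[] = {3, 1, 4, 5, 2};
    std::vector<const Node*> T;  // T[j] holds the first j+1 elements of A
    const Node* cur = nullptr;
    for (int x : A) T.push_back(cur = insert(cur, x));
    // the 2nd largest of the first 3 elements {3, 1, 4}:
    std::printf("%d\n", select(T[2], 2));  // prints 3
}
```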
Unless I misunderstand, you are just finding the k-th order statistic of an array which is the prefix of another array.
This can be done using an algorithm that I think is called 'quickselect' or something along those lines. Basically, it's like quicksort:
Take a random pivot
Swap around array elements so all the smaller ones are on one side
You then know the pivot is the (p+1)-th smallest element, where p is the number of smaller array elements
If p+1 = k, that's the solution! If p+1 > k, repeat on the 'smaller' subarray; if p+1 < k, repeat on the 'larger' subarray.
There are (much) better descriptions of this elsewhere, under the usual 'Quickselect' headings, and just generally on the internet if you search for k-th order statistic solutions.
Although the worst-case time for this algorithm is O(n^2), like quicksort, its expected time is much better (also like quicksort) if you select your random pivots properly. I think the space complexity would just be O(n): you can make a single copy of your prefix whose ordering you are free to muck up.
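
A sketch of this approach, assuming int elements: quickselect on a scratch copy of the prefix. It finds the k-th smallest (1-based) element; the i-th largest of the first j elements is then the (j - i + 1)-th smallest. Expected O(j) time with random pivots, O(j^2) worst case, O(j) extra space for the copy; the function name and fixed seed are illustrative.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int quickselect(std::vector<int> v, int k) {
    static std::mt19937 rng(12345);  // fixed seed for reproducibility
    int lo = 0, hi = (int)v.size() - 1;
    while (lo < hi) {
        // pick a random pivot and partition smaller elements to its left
        std::swap(v[lo + (int)(rng() % (unsigned)(hi - lo + 1))], v[hi]);
        int pivot = v[hi], p = lo;
        for (int i = lo; i < hi; ++i)
            if (v[i] < pivot) std::swap(v[i], v[p++]);
        std::swap(v[p], v[hi]);  // the pivot is now the (p+1)-th smallest
        if (p == k - 1) return v[p];
        if (p > k - 1) hi = p - 1; else lo = p + 1;
    }
    return v[lo];
}

int main() {
    std::vector<int> A = {3, 1, 4, 5, 2};
    int j = 3, i = 2;  // want the 2nd largest of the first 3 elements
    std::vector<int> prefix(A.begin(), A.begin() + j);
    std::printf("%d\n", quickselect(prefix, j - i + 1));  // prints 3
}
```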
