The problem link is: http://www.spoj.com/problems/ORDERSET/en/
In this problem, you have to maintain a dynamic set of numbers which supports the two fundamental operations
INSERT(S,x): if x is not in S, insert x into S
DELETE(S,x): if x is in S, delete x from S
and the two type of queries
K-TH(S) : return the k-th smallest element of S
COUNT(S,x): return the number of elements of S smaller than x
Input
Line 1: Q (1 ≤ Q ≤ 200000), the number of operations
In the next Q lines, the first token of each line is a character I, D, K or C meaning that the corresponding operation is INSERT, DELETE, K-TH or COUNT, respectively, followed by a whitespace and an integer which is the parameter for that operation.
If the parameter is a value x, it is guaranteed that 0 ≤ |x| ≤ 10^9. If the parameter is an index k, it is guaranteed that 1 ≤ k ≤ 10^9.
Output
For each query, print the corresponding result in a single line. In particular, for the queries K-TH, if k is larger than the number of elements in S, print the word 'invalid'.
I thought of using a set here, since the insertion and deletion in a set can be done in logarithmic time. However, I am not sure if set is the ideal data structure for finding the k'th element and number of elements less than it. What other DS can I use to make it optimal. Thanks!
A set is an abstract data type, not a data structure. A hash table is a data structure that can be used to implement a set, as is a binary tree.
Speaking of binary tree, it seems well suited for your needs here.
If balanced: fast INSERT and DELETE
If you keep track of subtree counts: K-TH and COUNT should be very fast too
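As a concrete illustration of that idea, here is a sketch using a Fenwick tree (BIT) over coordinate-compressed values; since all ORDERSET operations can be read up front, the set of values that will ever appear is known in advance. The class and method names below are my own, not a reference solution:

```python
import bisect

class OrderSet:
    """Offline sketch: a BIT over the compressed universe of values."""

    def __init__(self, values):
        # 'values' are all x-parameters that will ever appear (known offline)
        self.sorted_vals = sorted(set(values))
        self.index = {v: i + 1 for i, v in enumerate(self.sorted_vals)}  # 1-based
        self.n = len(self.sorted_vals)
        self.bit = [0] * (self.n + 1)
        self.present = set()

    def _add(self, i, delta):
        while i <= self.n:
            self.bit[i] += delta
            i += i & -i

    def _prefix(self, i):
        s = 0
        while i > 0:
            s += self.bit[i]
            i -= i & -i
        return s

    def insert(self, x):
        if x not in self.present:
            self.present.add(x)
            self._add(self.index[x], 1)

    def delete(self, x):
        if x in self.present:
            self.present.remove(x)
            self._add(self.index[x], -1)

    def count(self, x):
        # number of stored elements strictly smaller than x
        pos = bisect.bisect_left(self.sorted_vals, x)
        return self._prefix(pos)

    def kth(self, k):
        # k-th smallest, or None if k exceeds the current size ("invalid")
        if k > len(self.present):
            return None
        # binary lifting over the BIT: find the largest prefix with sum < k
        i, pos = 0, 0
        for p in range(self.n.bit_length(), -1, -1):
            nxt = pos + (1 << p)
            if nxt <= self.n and i + self.bit[nxt] < k:
                pos = nxt
                i += self.bit[nxt]
        return self.sorted_vals[pos]   # pos is the 0-based index of the answer
```

Every operation then costs O(log n), matching the balanced-tree-with-subtree-counts approach.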
Related
Let a1,...,an be a sequence of real numbers. Let m be the minimum of the sequence, and let M be the maximum of the sequence.
I proved that there exists 2 elements in the sequence, x,y, such that |x-y|<=(M-m)/n.
Now, is there a way to find an algorithm that finds such 2 elements in time complexity of O(n)?
I thought about sorting the sequence, but since I don't know anything about M, I cannot use radix/bucket sort or any other linear-time algorithm that I'm familiar with.
I'd appreciate any idea.
Thanks in advance.
First find out n, M, m. If not already given they can be determined in O(n).
Then create a memory storage of n+1 elements; we will use the storage for n+1 buckets with width w=(M-m)/n.
The buckets cover the range of values equally: Bucket 1 goes from [m; m+w[, Bucket 2 from [m+w; m+2*w[, Bucket n from [m+(n-1)*w; m+n*w[ = [M-w; M[, and the (n+1)th bucket from [M; M+w[.
Now we go once through all the values and sort them into the buckets according to the assigned intervals. There should be at most 1 element per bucket. If a bucket is already filled, it means that two elements are closer together than the boundaries of the half-open interval, i.e. we found elements x, y with |x-y| < w = (M-m)/n.
If no two such elements are found, then afterwards n of the n+1 buckets are filled with one element each, and all those elements are sorted.
We go through the buckets once more and compare only the contents of neighbouring buckets, checking whether any two elements fulfil the condition.
Due to the width of the buckets, the condition cannot be true for buckets, which are not adjoining: For those the distance is always |x-y| > w.
(The fulfilment of the last inequality in 4. is also the reason, why the interval is half-open and cannot be closed, and why we need n+1 buckets instead of n. An alternative would be, to use n buckets and make the now last bucket a special case with [M; M+w]. But O(n+1)=O(n) and using n+1 steps is preferable to special casing the last bucket.)
The running time is O(n) for step 1, 0 for step 2 - we actually do not do anything there, O(n) for step 3 and O(n) for step 4, as there is only 1 element per bucket. Altogether O(n).
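The steps above can be sketched as follows (names are mine; the final neighbour scan checks against the always-achievable (M-m)/(n-1) threshold discussed later in this answer, so the function always returns a pair):

```python
def close_pair(a):
    n = len(a)
    m, M = min(a), max(a)
    if m == M:
        return a[0], a[1]              # all values equal: any pair works
    w = (M - m) / n                    # bucket width
    buckets = [None] * (n + 1)         # n+1 half-open buckets [m+i*w, m+(i+1)*w)
    for x in a:
        i = int((x - m) / w)
        if buckets[i] is not None:     # two values in one bucket:
            return buckets[i], x       # |x - y| < w = (M-m)/n
        buckets[i] = x
    # at most one value per bucket; bucket order equals sorted order,
    # so compare consecutive stored values against the provable threshold
    prev = None
    for x in buckets:
        if x is None:
            continue
        if prev is not None and abs(x - prev) <= (M - m) / (n - 1):
            return prev, x
        prev = x
```

Each phase is a single O(n) pass, matching the analysis above.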
This task shows that either sorting of elements which are not close together, or coarse sorting without considering fine distances, can be done in O(n) instead of O(n*log(n)). It has useful applications. Numbers on computers are discrete and have a finite precision. I have successfully used this sorting method for signal processing / fast sorting in real-time production code.
About @Damien's remark: the threshold (M-m)/(n-1) is provably achievable for every such sequence. So far this answer assumed the sequence we are looking at is of a special kind, for which the stronger condition holds, or at least that, for all sequences where the stronger condition holds, we find such elements in O(n).
If this was instead a small mistake of the OP (who said to have proven the stronger condition) and we should find two elements x, y with |x-y| <= (M-m)/(n-1), we can simplify:
1.-3. We do steps 1 to 3 as above, but with n buckets and the bucket width set to w = (M-m)/(n-1). Bucket n now goes from [M; M+w[.
For step 4 we would do the following alternative:
4./alternative: n buckets are filled with one element each. The element in bucket n has to be M and sits at the left boundary of its bucket interval. The distance of this element y = M to the element x in the (n-1)th bucket, for every possible x in that bucket, is |M-x| <= w = (M-m)/(n-1), so we found x and y which fulfil the condition, q.e.d.
First note that the real threshold should be (M-m)/(n-1).
The first step is to calculate the min m and max M elements, in O(N).
You calculate the value mid = (m + M)/2.
You concentrate the values less than mid at the beginning, and those greater than mid at the end of the array.
You select the part with the largest number of elements and you iterate until very few numbers are kept.
If both parts have the same number of elements, you can select either of them. If the remaining part has many more elements than n/2, then in order to maintain O(n) complexity you can keep only n/2 + 1 of them, as the goal is not to find the smallest difference, but merely one difference that is small enough.
As indicated in a comment by @btilly, this solution could fail in some cases, for example with the input [0, 2.1, 2.9, 5]. To handle that, one must calculate the max value of the left part and the min value of the right part, and test whether right_min - left_max is itself a small enough answer. This doesn't change the O(n) complexity, even if the solution becomes less elegant.
Complexity of the search procedure: O(n) + O(n/2) + O(n/4) + ... + O(2) = O(2n) = O(n).
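A rough sketch of this halving search, including the straddling-pair fix from @btilly's comment; this follows the answer's outline under my own naming and is not a verified general implementation:

```python
def close_pair_halving(a):
    n0 = len(a)
    threshold = (max(a) - min(a)) / (n0 - 1)   # a pair this close always exists
    best = (a[0], a[1])
    vals = a[:]
    while len(vals) > 2:
        lo, hi = min(vals), max(vals)
        if lo == hi:
            return vals[0], vals[1]            # duplicates: distance 0
        mid = (lo + hi) / 2
        left = [x for x in vals if x <= mid]
        right = [x for x in vals if x > mid]
        # candidate pair straddling the split (the @btilly fix)
        if left and right:
            cand = (max(left), min(right))
            if cand[1] - cand[0] < abs(best[1] - best[0]):
                best = cand
        # keep the denser half; cap its size to preserve O(n) overall
        keep = left if len(left) >= len(right) else right
        vals = keep[: len(vals) // 2 + 1]
        if abs(best[1] - best[0]) <= threshold:
            break                              # already small enough
    if len(vals) == 2 and abs(vals[1] - vals[0]) < abs(best[1] - best[0]):
        best = (vals[0], vals[1])
    return best
```

The shrinking loop gives the geometric series O(n) + O(n/2) + ... described above.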
Damien is correct in his comment that the correct result is that there must be x, y such that |x-y| <= (M-m)/(n-1). If you have the sequence [0, 1, 2, 3, 4] you have 5 elements, but no two elements are closer than (M-m)/n = (4-0)/5 = 4/5.
With the right threshold, the solution is easy - find M and m by scanning through the input once, and then bucket the input into (n-1) buckets of size (M-m)/(n-1), putting values that are on the boundaries of a pair of buckets into both buckets. At least one bucket must have two values in it by the pigeon-hole principle.
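That pigeonhole argument can be sketched directly; as a small variation on the boundary handling described above, this sketch clamps the maximum into the last bucket instead of double-inserting boundary values, which preserves the <= threshold:

```python
def pigeonhole_pair(a):
    n = len(a)
    m, M = min(a), max(a)
    if m == M:
        return a[0], a[1]                  # all equal: distance 0
    w = (M - m) / (n - 1)                  # n-1 buckets of this width
    buckets = {}
    for x in a:
        i = min(int((x - m) / w), n - 2)   # clamp so M lands in the last bucket
        if i in buckets:
            return buckets[i], x           # same bucket: |x - y| <= w
        buckets[i] = x
    # n values cannot fit one-per-bucket into n-1 buckets (pigeonhole),
    # so this point is unreachable
```

One pass over the input, so O(n) total.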
Definition
Set P = {e1, e2, ..., en}; P has n different elements, enumerated as the ei's.
Set I = {e1', e2', ..., en'}; I has at least one element in common with P. The number of elements in I need not be equal to the number of elements in P.
Each set I has a weight Q associated with it, which describes the cost to use it. Q > 0.
You have to help me design an algorithm that takes a set P as input, and some (say k of them) I sets, denoted I1, I2, ..., Ik, and exactly k Q values, denoted Q1, Q2, ..., Qk, where Q1 denotes the cost to use set I1, and so on.
You have to choose some I's, say I1,I2,. . . , such that when they all are unioned together, they produce set P' and P is a subset of that.
Notice that once you find a selection of I's, it has a cost associated with it.
You also have to make sure that this cost is as MINIMUM as possible.
Input
input one Set P
input a list of Set I,IList={I1,I2,...In}
input a list of Set Q,QList={Q1,Q2,...Qn}
Each Ix corresponds to exactly one Qx.
Output
P' = Ia ∪ Ib ∪ ... ∪ In
P ⊂ P'
Make Qa + Qb + ... + Qn the minimum value.
Also mention the Time and Space Complexity of your algorithm
Sample Input
P={a,b,c}
I1={a,x,y,z} Q1=0.7
I2={b,c,x} Q2=1
I3={b,x,y,z} Q3=2
I4={c,y} Q4=3
I5={a,b,c,y} Q5=9
Sample Output
P1 = I1 U I2 COST=Q1+Q2=1.7
P2 = I1 U I3 U I4 COST=Q1+Q3+Q4=5.7
P3 = I5 COST=Q5=9
And: P ⊂ P1, P ⊂ P2, P ⊂ P3
The P COST: 1.7 < 5.7 < 9
And then what we want is:
P1 = I1 U I2 COST=Q1+Q2=1.7
Here are some suggestions to simplify the problem.
We first duplicate all the I sets; let's call them I1', I2', ...
Now, the first job is to remove the unwanted elements from the duplicated I' sets. Here "unwanted" means the elements which will not contribute towards the main set P.
We discard all those I' sets which do not have even a single element of P.
Now suppose P has some n elements in it, we now know definitely that I' sets are nothing but subsets of the main set, and every subset has a cost Qi associated with it.
We just have to pick some subsets such that they together cover the main set.
Subject to the minimum cost.
We will denote the main set and subsets using bit based notation.
If the set P has n elements in it, we will have n bits in the representation.
So the main set will be denoted by <1,1,...1> (n 1's).
And its subsets will be denoted by bitsets with some 1's absent from the bitset of the main set. Because the I's are also subsets, they too have a binary representation denoting the subset they represent.
To solve the problem efficiently, let's make an assumption that there is so much of memory available, that if the bitset is treated as a number in binary, we can index the bitsets, to some memory location in constant time.
This means that if we have, say, n = 4, all the subsets can be represented by the values from 0 to 15 (see their binary representations from 0000 (empty set) to 1111 (main set); when the element at position i of the main set is present in a subset, we put a 1 at that position in the bitset). And similarly when n is larger.
Now, having the bitset-based notation for sets, the union of two sets denoted by bitsets b1 and b2 is denoted by b1 | b2, where | is the bitwise OR operation.
Of course, we will not require so many memory locations, as not all the subsets of parent set will be available as I's.
Algorithm :
The algorithmic idea used here is bitset based Dynamic Programming.
Assume we have a big array, namely COST, where COST[j] represents the cost to have the subset, represented by bitset notation of j.
To start with the algorithm, we first put the cost to choose given subsets (in terms of I's), in their respective indices in COST array, and at all the other locations we put a very large value, say INF.
What we have to do is, to fill the array appropriately, and then once it is filled properly, we will get the answer to minimum cost by looking at the value COST[k] where k has all bits set, in binary representation.
Now we will focus on how to fill the array properly.
This is a rather easy task: we iterate over the COST array K times, where K is the number of I' sets we have.
For every I' set, let's call its binary representation BI'.
We OR the bit representation BI' with the current index (idx), and what we get is the new set, which is the UNION of the set represented by the current index and BI'. Let's call this new set S' and its final binary representation BS'.
We look at COST[BS'], and if we see that this cost is larger than COST[BI'] + COST[idx], we update the value at COST[BS'].
In similar way we proceed, and at the end of the run, we get the minimum cost at COST[BP], where BP is the bitset for P.
In order to track the participating I's, who actually contributed in the formation of P, we can take a note, while updating any index.
TIME COMPLEXITY : O(2^n * K), where K is the no. of I sets, and n is the no. of elements in P.
Space Complexity : O(2^n)
NOTE : Because of the assumption, that the bit-representation are directly indexable, the solution may not be very much feasible for large values of n and k.
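A sketch of the described bitmask DP, with P's elements mapped to bit positions; the function name and the choice to iterate states in increasing order (equivalent here, since OR-ing only adds bits) are my own:

```python
def min_cover_cost(P, sets, costs):
    elems = sorted(P)
    pos = {e: i for i, e in enumerate(elems)}   # element -> bit position
    full = (1 << len(elems)) - 1                # bitset of the main set P
    INF = float('inf')
    # reduce each I set to its mask over P's elements; discard empty masks
    items = []
    for s, q in zip(sets, costs):
        mask = 0
        for e in s:
            if e in pos:
                mask |= 1 << pos[e]
        if mask:
            items.append((mask, q))
    cost = [INF] * (full + 1)
    cost[0] = 0
    for state in range(full + 1):               # states only grow under OR
        if cost[state] == INF:
            continue
        for mask, q in items:
            new = state | mask
            if cost[state] + q < cost[new]:
                cost[new] = cost[state] + q
    return cost[full] if cost[full] < INF else None
```

Running it on the sample input (P = {a,b,c} with the five I sets) yields the minimum cost 1.7.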
I have read through some tutorials about two common data structure which can achieve range update and query in O(lg N): Segment tree and Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found is about some associative and commutative operation like "Sum of integers in a range", "XOR integers in a range", etc.
I wonder if these two data structures (or any other data structures / algorithm, please propose) can achieve the below query in O(lg N)? (If no, how about O(sqrt N))
Given an array of integer A, query the number of distinct integer in a range [l,r]
PS: Assuming the number of available integer is ~ 10^5, so used[color] = true or bitmask is not possible
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range index is 0-based.
Yes, this is possible to do in O(log n), even if you should answer queries online. However, this requires some rather complex techniques.
First, let's solve the following problem: given an array, answer the queries of form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure which is sometimes called Merge Sort Tree. It is basically a segment tree where each node stores a sorted subarray. This structure requires O(n log n) memory (because there are log n layers and each of them requires storing n numbers). It is built in O(n log n) as well: you just go bottom-up and for each inner vertex merge sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1 be an original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer those queries in O(log^2 n) time: just make a regular query to the segment tree (traversing O(log n) nodes) and do a binary search to learn how many numbers <= x there are in each visited node (an additional O(log n) from here).
This can be sped up to O(log n) using the Fractional Cascading technique, which basically allows you to do the binary search not in each node but only in the root. However, it is too complex to describe in this post.
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count the distinct numbers in [l, r]. For each unique number we'll count its last occurrence in the segment. With the b_i notation you can see that an occurrence of a number is the last one if and only if b_i > r. So the problem boils down to "how many numbers > r are there in the segment [l, r]", which is trivially reduced to what I described above.
Hope it helps.
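To make the reduction concrete, here is a sketch combining the b array with a merge sort tree, at O(log^2 n) per query and without the fractional cascading speed-up; all names are my own illustration:

```python
import bisect

def build_tree(b):
    # iterative segment tree where each node stores its sorted subarray
    n = len(b)
    size = 1
    while size < n:
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for i, x in enumerate(b):
        tree[size + i] = [x]
    for v in range(size - 1, 0, -1):
        tree[v] = sorted(tree[2 * v] + tree[2 * v + 1])   # merge children
    return tree, size

def count_greater(tree, size, l, r, x):
    # count elements > x among positions [l, r] (0-based, inclusive)
    res = 0
    l += size
    r += size + 1
    while l < r:
        if l & 1:
            res += len(tree[l]) - bisect.bisect_right(tree[l], x)
            l += 1
        if r & 1:
            r -= 1
            res += len(tree[r]) - bisect.bisect_right(tree[r], x)
        l //= 2
        r //= 2
    return res

def distinct_in_range(a, queries):
    n = len(a)
    INF = float('inf')
    nxt = {}
    b = [0] * n
    for i in range(n - 1, -1, -1):     # b[i] = next occurrence of a[i], or INF
        b[i] = nxt.get(a[i], INF)
        nxt[a[i]] = i
    tree, size = build_tree(b)
    # distinct count in [l, r] = how many b_i > r for i in [l, r]
    return [count_greater(tree, size, l, r, r) for l, r in queries]
```

On the example above, A = [1,2,3,2,4,3,1], the query [2,5] indeed yields 3.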
If you're willing to answer queries offline, then plain old Segment Trees/ BIT can still help.
Sort queries based on r values.
Make a Segment Tree for range sum queries [0, n]
For each value in the input array from left to right:
Increment by 1 at the current index i in the segment tree.
If the current element has been seen before, decrement by 1 in the segment tree at its previous position.
Answer queries ending at the current index i by querying for the sum in range [l, r == i].
The idea, in short, is to sweep rightward, keeping only the latest occurrence of each individual element marked and setting previous occurrences back to 0. The sum over a range then gives the count of unique elements.
The overall time complexity is again O(n log n).
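The steps above can be sketched like this (my own illustration, using a BIT instead of a segment tree, which suffices for range sums):

```python
def distinct_offline(a, queries):
    n = len(a)
    bit = [0] * (n + 1)

    def add(i, d):                 # point update (0-based position)
        i += 1
        while i <= n:
            bit[i] += d
            i += i & -i

    def prefix(i):                 # sum of marks at positions 0..i
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & -i
        return s

    order = sorted(range(len(queries)), key=lambda q: queries[q][1])
    ans = [0] * len(queries)
    last = {}                      # value -> latest index seen
    qi = 0
    for i, x in enumerate(a):
        if x in last:
            add(last[x], -1)       # unmark the previous occurrence
        add(i, 1)                  # mark the current occurrence
        last[x] = i
        # answer every query whose right endpoint is the current index
        while qi < len(order) and queries[order[qi]][1] == i:
            l, r = queries[order[qi]]
            ans[order[qi]] = prefix(r) - (prefix(l - 1) if l > 0 else 0)
            qi += 1
    return ans
```

Sorting the queries by r and sweeping once gives the O((n + q) log n) total.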
There is a well-known offline method to solve this problem. If you have an array of size n and q queries on it, and in each query you need to know the count of distinct numbers in that range, then you can solve the whole thing in O(n log n + q log n) time, which is like solving every query in O(log n) time.
Let's solve the problem using the RSQ( Range sum query) technique. For the RSQ technique, you can use a segment tree or BIT. Let's discuss the segment tree technique.
For solving this problem you need an offline technique and a segment tree. Now, what is an offline technique? It means you first take all the queries as input, then reorder them in a way that lets you answer them correctly and easily, and finally output the answers in the given input order.
Solution Idea:
First, take the input for a test case and store the given n numbers in an array, say array[]. Then take the q queries and store them in a vector v, where every element of v holds three fields: l, r, idx. Here l is the start point of a query, r is its endpoint, and idx is the position of the query in the input order.
Now sort the vector v by the endpoint r of each query.
Suppose we have a segment tree which can store information for at least 10^5 elements, and also an array called last[100005], which stores the last position of each number in array[].
Initially, all elements of the tree are zero and all elements of last[] are -1.
Now run a loop over array[]. Inside the loop, check the following for every index i of array[]:
Is last[array[i]] equal to -1? If it is, write last[array[i]] = i and call the update() function of the segment tree, which adds +1 at position last[array[i]].
If last[array[i]] is not -1, call the update() function of the segment tree, which adds -1 at position last[array[i]]. Now you need to store the current position as the last position for the future, so write last[array[i]] = i and call the update() function again, which adds +1 at the new position last[array[i]].
Now check whether a query finishes at the current index, that is, whether v[current].r == i. If so, call the query() function of the segment tree, which returns the sum over the range v[current].l to v[current].r, and store the result at index v[current].idx of the answer[] array. Also increment the value of current by 1.
Finally, print the answer[] array, which contains the final answers in the given input order.
The complexity of the algorithm is O((n + q) log n).
The given problem can also be solved using Mo's algorithm (offline), which is based on square-root decomposition.
Overall time complexity is O(N*SQRT(N)).
Refer to the mos-algorithm article for a detailed explanation; it even has a complexity analysis and a SPOJ problem that can be solved with this approach.
kd-trees provide range queries in O(n^(1-1/d) + k) for n points in d dimensions, where k is the number of reported points.
If you want faster query than a kd-tree, and you are willing to pay the memory cost, then Range trees are your friends, offering a query of:
O(log^d n + k)
where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.
Bentley is an important name when it comes to this field. :)
Given natural number m≥2, I need to think of a data structure, for a group of natural numbers which will support the following functions, with run-time restriction:
Init() - Initialize the data structure - O(1)
Insert(k) - Insert a new number k to the data structure (only if does not exist). - O(logn)
delete(k) - delete number k from the data structure. - O(logn)
sumDivM() - Return amount of numbers which divides m without remainder. - O(1)
equi(k) - find a number x such that (x-k) divides m without remainder; return FALSE if there is no such number. - O(log(min(m,n)))
pairDivM() - Return TRUE iff the data structure contains pair of numbers that their sum divides m without remainder, FALSE otherwise. - O(1)
n is the number of elements currently at the structure.
I thought of an AVL tree, where Init, Insert and Delete meet the runtime restrictions.
For sumDivM I will keep an int field which is increased by 1 every time we insert a number that divides m without remainder (the insert function checks this). This way I can return the amount in O(1).
For equi(k) and pairDivM() I could not find a solution that avoids traversing the tree, which is prohibited by the runtime bounds. Any ideas?
First of all notice that, given a number x, you really only care about x mod m, so you might as well calculate that and use that as a key to enter x into the AVL tree, which is acting as a map with key (x mod m) and value a set of numbers which are all equal to x, mod m.
For equi(k), just look up (-k mod m).
For pairDivM(), when you enter a number x into the data structure, look up the size of the set for (-x mod m) because for all of these numbers you are creating a new pair of numbers such that (x + y) = 0 mod m. You can answer pairDivM() at O(1) cost if you maintain a count of how many such pairs there are, modifying it when you insert and delete numbers.
For the other operations, I think either you've covered them or they are fairly obvious given that you have an AVL tree for (x mod m).
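A sketch of this design, using a dict of residue groups in place of the AVL tree (which makes the bounds expected rather than worst-case) and following the answer's conventions, e.g. equi(k) looks up (-k) mod m and "divides m without remainder" is read as x mod m == 0:

```python
class ModStructure:
    def __init__(self, m):          # Init(): O(1)
        self.m = m
        self.groups = {}            # residue (x mod m) -> set of stored numbers
        self.div_count = 0          # how many stored x have x % m == 0
        self.pair_count = 0         # pairs (x, y) with (x + y) % m == 0

    def _size(self, r):
        return len(self.groups.get(r, ()))

    def insert(self, k):
        g = self.groups.setdefault(k % self.m, set())
        if k in g:
            return
        # every stored y with y % m == (-k) % m forms a new valid pair with k;
        # counting before adding k ensures k is never paired with itself
        self.pair_count += self._size((-k) % self.m)
        g.add(k)
        if k % self.m == 0:
            self.div_count += 1

    def delete(self, k):
        g = self.groups.get(k % self.m)
        if g is None or k not in g:
            return
        g.remove(k)                 # remove first, for the same self-pair reason
        self.pair_count -= self._size((-k) % self.m)
        if k % self.m == 0:
            self.div_count -= 1

    def sumDivM(self):              # O(1)
        return self.div_count

    def pairDivM(self):             # O(1)
        return self.pair_count > 0

    def equi(self, k):              # the answer's lookup of (-k) mod m
        g = self.groups.get((-k) % self.m)
        return next(iter(g)) if g else False
```

Maintaining the pair counter on every insert/delete is exactly what makes the O(1) pairDivM() possible.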
Generate all lists of size n, such that each element is between 0 and m (inclusive).
There are (m+1)^n such lists.
There are two easy ways of writing the general case. One is described in the existing answer from @didierc. The alternative is recursion.
For example, think about a method that takes a String as an argument:
if (input string is long enough)
    print or store it
else
    iterate over the digit range
        make a recursive call with the digit appended to the string
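A runnable version of that recursive sketch, collecting the lists instead of printing them (names are mine):

```python
def enumerate_lists(n, m, prefix=None, out=None):
    if prefix is None:
        prefix, out = [], []
    if len(prefix) == n:          # the list is long enough: store it
        out.append(prefix[:])
    else:
        for d in range(m + 1):    # iterate over the digit range
            prefix.append(d)
            enumerate_lists(n, m, prefix, out)   # recurse with digit appended
            prefix.pop()
    return out
```

It produces all (m+1)^n lists, e.g. 9 lists for n = 2, m = 2.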
This is just like enumerating all the numbers in base (m+1) of n digits.
start with a list of n zeros
do the following loop
yield the list as a new answer
increment the first element, counting in base (m+1), and propagate the carry recursively to the next element
if there is a carry left over, exit the loop
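The odometer loop above can be written, for example, like this:

```python
def all_lists(n, m):
    digits = [0] * n
    while True:
        yield list(digits)                # yield the current list as an answer
        i = 0
        while i < n and digits[i] == m:   # propagate the carry
            digits[i] = 0
            i += 1
        if i == n:                        # a carry is left over: we are done
            return
        digits[i] += 1
```

For n = 2, m = 1 this yields the (m+1)^n = 4 lists [0,0], [1,0], [0,1], [1,1].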
Update:
Just for fun, what would the solution be if we add the restriction that all digits must remain different (like a lottery number, as it was initially stated; of course, we suppose that m >= n)?
We proceed by enumerating all the numbers with the restriction stated above, and also the restriction that each element must be greater than its successor in the list (i.e. the digit of rank k < n is larger than the digit of rank k+1).
This is implemented by simply checking when computing the carry that the current digit will not become equal to its predecessor, and if so propagate the carry further.
Then, for each list yielded by the enumeration, compute all its possible permutations. There are known algorithms to perform that computation, see for instance the Johnson-Trotter algorithm, but one can build a simpler recursive algorithm:
function select l r:
    if the list r is empty, yield l
    else
        for each element x of the list r
            let l' be the list of x prepended to l
            and r' the remaining elements of r
            call select l' r'
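A runnable version of select, assuming Python lists and yielding each completed permutation:

```python
def select(l, r):
    if not r:                     # r is empty: l is a complete permutation
        yield l
    else:
        for i, x in enumerate(r):
            # l' = x prepended to l; r' = the remaining elements of r
            yield from select([x] + l, r[:i] + r[i + 1:])
```

Calling it as select([], [1, 2, 3]) yields all 3! = 6 permutations.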