Updating values at indices which are factors of a number - algorithm

Given an array with n integers. The following q queries of 2 types have to be handled :
1) Given k and c, add c to all indices which are factors of k
2) Output value at a given index
Whats an approach to handle each query with O(logn)/O(1) complexity ?
My attempt :
Calculate all factors of k and update indices. But complexity of this approach is O(q*sqrt(n)). I thought of creating some tree type data structure to store values in but couldnt proceed ahead.Any solution/hint on how to proceed ?

Related

Algorithm: Count the minimum number of distinct sorted subsequence for each update query

This question is asked to me in an interview:
Distinct sorted subsquence containing adjacent values is defined as either its length is one or it only contains adjacent numbers when sorted. Each element can belong to only 1 such subsequence. I have Q queries, each updating a single value in A and I have to answer for each query, how many parts would be in the partition of A into distinct sorted subsequences if the number of parts was minimized.
For example, the number of parts for A = [1,2,3,4,3,5] can be minimized by partitioning it in the following two ways, both of which contain only two parts:
1) [1,2,3] && [4,3,5] ==> answer 2 (4,3,5 sorted is 3,4,5 all adjacent values)
2) [1,2,3,4,5] && [3] ==> answer 2
Approach I tried: Hashing and forming sets but all test cases were not cleared because of Timeout.
Problem Statment PDF : PDF
Constraints:
1<= N,Q <= 3*10^5
1< A(i)< 10^9
Preprocessing
First you can preprocess A before all queries and generate a table (say times_of) such that when given a number n, one can efficiently obtain the number of times n appears in A through expression like times_of[n]. In the following example assuming A is of type int[N], we use an std::map to implement the table. Its construction costs O(NlogN) time.
auto preprocess(int *begin, int *end)
{
std::map<int, std::size_t> times_of;
while (begin != end)
{
++times_of[*begin];
++begin;
}
return times_of;
}
Let min and max be the minimum and maximum elements of A respectively. Then the following lemma applies:
The minimum number of distinct sorted subsequences is equal to max{0, times_of[min] - times_of[min-1]} + ... + max{0, times_of[max] -
times_of[max-1]}.
A rigorous proof is a bit technical, so I omit it from this answer. Roughly speaking, consider numbers from small to large. If n appears more than n-1, it has to bring extra times_of[n]-times_of[n-1] subsequences.
With this lemma, we can compute initially the minimum number of distinct sorted subsequences result in O(N) time (by iterating through times_of, not by iterating from min to max). The following is a sample code:
std::size_t result = 0;
auto prev = std::make_pair(min - 1, static_cast<std::size_t>(0));
for (auto &cur : times_of)
{
// times_of[cur.first-1] == 0
if (cur.first != prev.first + 1) result += cur.second;
// times_of[cur.first-1] == times_of[prev.first]
else if (cur.second > prev.second) result += cur.second - prev.second;
prev = cur;
}
Queries
To deal with a query A[u] = v, we first update times_of[A[u]] and times_of[v] which costs O(logN) time. Then according to the lemma, we need only to recompute constant (i.e. 4) related terms to update result. Each recomputation costs O(logN) time (to find the previous or next element in times_of), so a query takes O(logN) time in total.
Keep a list of clusters on the first pass. Each has a collection of values, with a minimum and maximum value. These clusters could very well be stored in a segment tree (making it easy to merge in case they ever touch).
Loop through your N numbers, and for each number, either add it to an existing cluster (possibly triggering a merge), or create a new cluster. This may be easier if your clusters store min-1 and max+1.
Now you are done with the initial input of N numbers, and you have several clusters, all of which are likely to be of a reasonable size for radix sort.
However, you do not want to finish the radix sort. Generate the list of counts, then use this to count adjacent clusters. Loop through this, and every time the count decreases, you have found (difference) many of your final distinct sorted subsequences. (Using max+1 pays off again, because the guaranteed zero at the end means you don't have to add a special case after the loop.)

Is it possible to query number of distinct integers in a range in O(lg N)?

I have read through some tutorials about two common data structure which can achieve range update and query in O(lg N): Segment tree and Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found is about some associative and commutative operation like "Sum of integers in a range", "XOR integers in a range", etc.
I wonder if these two data structures (or any other data structures / algorithm, please propose) can achieve the below query in O(lg N)? (If no, how about O(sqrt N))
Given an array of integer A, query the number of distinct integer in a range [l,r]
PS: Assuming the number of available integer is ~ 10^5, so used[color] = true or bitmask is not possible
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range index is 0-based.
Yes, this is possible to do in O(log n), even if you should answer queries online. However, this requires some rather complex techniques.
First, let's solve the following problem: given an array, answer the queries of form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure which is sometimes called Merge Sort Tree. It is basically a segment tree where each node stores a sorted subarray. This structure requires O(n log n) memory (because there are log n layers and each of them requires storing n numbers). It is built in O(n log n) as well: you just go bottom-up and for each inner vertex merge sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1 be an original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer for those queries in O(log^2 n time): just make a reqular query to a segment tree (traversing O(log n) nodes) and make a binary search to know how many numbers <= x are there in that node (additional O(log n) from here).
This can be speed up to O(log n) using Fractional Cascading technique, which basically allows you to do the binary search not in each node but only in the root. However it is complex enough to be described in the post.
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count numbers in [l, r]. For each unique number we'll count its last occurrence in the segment. With b_i notion you can see that the occurrence of the number is last if and only if b_i > r. So the problem boils down to "how many numbers > r are there in the segment [l, r]" which is trivially reduced to what I described above.
Hope it helps.
If you're willing to answer queries offline, then plain old Segment Trees/ BIT can still help.
Sort queries based on r values.
Make a Segment Tree for range sum queries [0, n]
For each value in input array from left to right:
Increment by 1 at current index i in the segment tree.
For current element, if it's been seen before, decrement by 1 in
segment tree at it's previous position.
Answer queries ending at current index i, by querying for sum in range [l, r == i].
The idea in short is to keep marking rightward indexes, the latest occurrence of each individual element, and setting previous occurrences back to 0. The sum of range would give the count of unique elements.
Overall time complexity again would be nLogn.
There is a well-known offline method to solve this problem. If you have n size array and q queries on it and in each query, you need to know the count of distinct number in that range then you can solve this whole thing in O(n log n + q log n) time complexity. Which is similar to solve every query in O(log n) time.
Let's solve the problem using the RSQ( Range sum query) technique. For the RSQ technique, you can use a segment tree or BIT. Let's discuss the segment tree technique.
For solving this problem you need an offline technique and a segment tree. Now, what is an offline technique?? The offline technique is doing something offline. In problem-solving an example of the offline technique is, You take input all queries first and then reorder them is a way so that you can answer them correctly and easily and finally output the answers in the given input order.
Solution Idea:
First, take input for a test case and store the given n numbers in an array. Let the array name is array[] and take input q queries and store them in a vector v. where every element of v hold three field- l, r, idx. where l is the start point of a query and r is the endpoint of a query and idx is the number of queries. like this one is n^th query.
Now sort the vector v on the basis of the endpoint of a query.
Let we have a segment tree which can store the information of at least 10^5 element. and we also have an areay called last[100005]. which stores the last position of a number in the array[].
Initially, all elements of the tree are zero and all elements of the last are -1.
now run a loop on the array[]. now inside the loop, you have to check this thing for every index of array[].
last[array[i]] is -1 or not? if it is -1 then write last[array[i]]=i and call update() function of which will add +1 in the last[array[i]] th position of segment tree.
if last[array[i]] is not -1 then call update() function of segment tree which will subtract 1 or add -1 in the last[array[i]] th position of segment tree. Now you need to store current position as last position for future. so that you need to write last[array[i]]=i and call update() function which will add +1 in the last[array[i]] th position of segment tree.
Now you have to check whether a query is finished in the current index. that is if(v[current].r==i). if this is true then call query() function of segment tree which will return and sum of the range v[current].l to v[current].r and store the result in the v[current].idx^th index of the answer[] array. you also need to increment the value of current by 1.
6. Now print the answer[] array which contains your final answer in the given input order.
the complexity of the algorithm is O(n log n).
The given problem can also be solved using Mo's (offline) algorithm also called Square Root decomposition algorithm.
Overall time complexity is O(N*SQRT(N)).
Refer mos-algorithm for detailed explanation, it even has complexity analysis and a SPOJ problem that can be solved with this approach.
kd-trees provide range queries in O(logn), where n is the number of points.
If you want faster query than a kd-tree, and you are willing to pay the memory cost, then Range trees are your friends, offering a query of:
O(logdn + k)
where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.
Bentley is an important name when it comes to this field. :)

What can be an efficient data structure for this?

The problem link is: http://www.spoj.com/problems/ORDERSET/en/
In this problem, you have to maintain a dynamic set of numbers which support the two fundamental operations
INSERT(S,x): if x is not in S, insert x into S
DELETE(S,x): if x is in S, delete x from S
and the two type of queries
K-TH(S) : return the k-th smallest element of S
COUNT(S,x): return the number of elements of S smaller than x
Input
Line 1: Q (1 ≤ Q ≤ 200000), the number of operations
In the next Q lines, the first token of each line is a character I, D, K or C meaning that the corresponding operation is INSERT, DELETE, K-TH or COUNT, respectively, following by a whitespace and an integer which is the parameter for that operation.
If the parameter is a value x, it is guaranteed that 0 ≤ |x| ≤ 109. If the parameter is an index k, it is guaranteed that 1 ≤ k ≤ 109.
Output
For each query, print the corresponding result in a single line. In particular, for the queries K-TH, if k is larger than the number of elements in S, print the word 'invalid'.
I thought of using a set here, since the insertion and deletion in a set can be done in logarithmic time. However, I am not sure if set is the ideal data structure for finding the k'th element and number of elements less than it. What other DS can I use to make it optimal. Thanks!
A set is an abstract data type, not a data structure. An hash table is a data structure that can be used to implements a set, as is a binary tree.
Speaking of binary tree, it seems well suited for your needs here.
If balanced : fast INSERT and DELETE
If you keep track of subnodes count : K-TH and COUNT should be very fast too

Fast Queries over subarrays

Problem: Given an sorted array of integers a[N], I have to process queries of kind as follow
[L R] p : find sum of aiCp for all i=L...R
Constraints:
N< 105
1<=ai<=106
Suppose there are Q such queries , Then please suggest a better method to solve this problem. Some points to note are
All queries are given in advance , i.e. offline algorithm can work .
Also note that the array is sorted.
each element in the array is bounded by small number.
There are no updates to the array.
Thanks
PS: brute force approach is processing each query element by element which would give complexity: O(Q * N * (cost of n choose r) ) .

Data structure that supports range based most frequently occuring element query

I'm looking for a data structure with which I can find the most frequently occuring number (among an array of numbers) in a given, variable range.
Let's consider the following 1 based array:
1 2 3 1 1 3 3 3 3 1 1 1 1
If I query the range (1,4), the data structure must retun 1, which occurs twice.
Several other examples:
(1,13) = 1
(4,9) = 3
(2,2) = 2
(1,3) = 1 (all of 1,2,3 occur once, so return the first/smallest one. not so important at the moment)
I have searched, but could not find anything similar. I'm looking (ideally) a data structure with minimal space requirement, fast preprocessing, and/or query complexities.
Thanks in advance!
Let N be the size of the array and M the number of different values in that array.
I'm considering two complexities : pre-processing and querying an interval of size n, each must be spacial and temporal.
Solution 1 :
Spacial : O(1) and O(M)
Temporal : O(1) and O(n + M)
No pre-processing, we look at all values of the interval and find the most frequent one.
Solution 2 :
Spacial : O(M*N) and O(1)
Temporal : O(M*N) and O(min(n,M))
For each position of the array, we have an accumulative array that gives us for each value x, how many times x is in the array before that position.
Given an interval we just need for each x to subtract 2 values to find the number of x in that interval. We iterate over each x and find the maximum value. If n < M we iterate over each value of the interval, otherwise we iterate over all possible values for x.
Solution 3 :
Spacial : O(N) and O(1)
Temporal : O(N) and O(min(n,M)*log(n))
For each value x build a binary heap of all the position in the array where x is present. The key in your heap is the position but you also store the total number of x between this position and the begin of the array.
Given an interval we just need for each x to subtract 2 values to find the number of x in that interval : in O(log(N)) we can ask the x's heap to find the two positions just before the start/end of the interval and substract the numbers. Basically it needs less space than a histogram but the query in now in O(log(N)).
You could create a binary partition tree where each node represents a histogram map of {value -> frequency} for a given range, and has two child nodes which represent the upper half and lower half of the range.
Querying is then just a case of recursively adding together a small number of these histograms to cover the range required, and scanning the resulting histogram once to find the highest occurrence count.
Useful optimizations include:
Using a histogram with mutable frequency counts as an "accumulator" while you add histograms together
Stop using precomputed histograms once you get down to a certain size (maybe a range less than the total number of possible values M) and just counting the numbers directly. It's a time/space trade-off that I think will pay off a lot of the time.
If you have a fixed small number of possible values, use an array rather than a map to store the frequency counts at each node
UPDATE: my thinking on algorithmic complexity assuming a bounded small number of possible values M and a total of N values in the complete range:
Preprocessing is O(N log N) - basically you need to traverse the complete list and build a binary tree, building one node for every M elements in order to amortise the overhead of each node
Querying is O(M log N) - basically adding up O(log N) histograms each of size M, plus counting O(M) values on either side of the range
Space requirement is O(N) - approx. 2N/M histograms each of size M. The 2 factor is the sum from having N/M histograms at the bottom level, 0.5N/M histograms at the next level, 0.25N/M at the third level etc...

Resources