Fast Queries over subarrays - algorithm

Problem: Given a sorted array of integers a[N], I have to process queries of the following kind:
[L R] p : find the sum of C(a_i, p) (i.e. "a_i choose p") for all i = L...R
Constraints:
N < 10^5
1 <= a_i <= 10^6
Suppose there are Q such queries. Please suggest a better method to solve this problem. Some points to note:
All queries are given in advance, i.e. an offline algorithm can work.
Also note that the array is sorted.
Each element in the array is bounded by a small number.
There are no updates to the array.
Thanks
PS: the brute-force approach is processing each query element by element, which gives complexity O(Q * N * (cost of n choose r)).
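For concreteness, here is a minimal brute-force sketch in Python (the function name is mine; it assumes 0-based, inclusive query indices, and Python's arbitrary-precision integers absorb the "cost of n choose r" factor inside math.comb):

from math import comb

def answer_queries_bruteforce(a, queries):
    # For each query (L, R, p), sum C(a[i], p) over i = L..R, element by element.
    return [sum(comb(a[i], p) for i in range(L, R + 1)) for (L, R, p) in queries]

# tiny example: C(2,2) + C(3,2) + C(5,2) = 1 + 3 + 10 = 14
a = [1, 2, 3, 5, 8]
print(answer_queries_bruteforce(a, [(1, 3, 2)]))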

Related

Find all consecutive sum of subset in O(n^2) using dp

Question:
Given n numbers x_1, x_2, ..., x_n, consider the problem of computing d[i, j] = x_i + x_{i+1} + ... + x_j for all i <= j. A naive algorithm that computes each d[i, j] independently will take Theta(n^3) time. Derive an efficient way to solve this problem in O(n^2) time.
I tried to draw a 2-dimensional table whose rows and columns both go from 1 to n, and find some formula to fill the whole table (upper triangle). But each cell seems irregular, so maybe this is not a good idea. Any ideas? Thanks.
Yes, your idea to use a 2-dimensional table (only the upper triangle) is correct.
Then, you only need to notice that d[i, i] = x_i,
and that d[i, j] = d[i, j - 1] + x_j for every j > i.
Each entry of your table can hence be computed in O(1), and the whole upper triangle in O(n^2).
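A minimal Python sketch of that table-filling recurrence (the function name is mine):

def all_range_sums(x):
    # d[i][j] = x[i] + ... + x[j]; fill each row left to right using d[i][j-1].
    n = len(x)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = x[i]
        for j in range(i + 1, n):
            d[i][j] = d[i][j - 1] + x[j]
    return d

print(all_range_sums([1, 2, 3, 4])[0])  # [1, 3, 6, 10]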
There is no need for O(n^2) memory though. You can use what are called prefix sums. Create an array prefsum[n], where for each i in 1..n, prefsum[i] = x_1 + x_2 + ... + x_i. If you want the sum over a range (l, r), just take prefsum[r] - prefsum[l - 1] (when l - 1 > 0; otherwise the result is prefsum[r]). This way you can compute prefsum in O(n) (a simple for loop) and get the result for any range in O(1). Since there are O(n^2) different ranges, your overall complexity is O(n^2).
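A short Python sketch of this prefix-sum variant (0-based indices here, unlike the 1-based description above):

def make_range_sum(x):
    prefsum = [0]
    for v in x:
        prefsum.append(prefsum[-1] + v)   # prefsum[i] = x[0] + ... + x[i-1]
    def query(l, r):                      # sum of x[l..r], 0-based inclusive
        return prefsum[r + 1] - prefsum[l]
    return query

query = make_range_sum([1, 2, 3, 4])
print(query(1, 3))  # 2 + 3 + 4 = 9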

Is it possible to query number of distinct integers in a range in O(lg N)?

I have read through some tutorials about two common data structures which can achieve range update and query in O(lg N): the Segment Tree and the Binary Indexed Tree (BIT / Fenwick Tree).
Most of the examples I have found are about associative and commutative operations like "sum of integers in a range", "XOR of integers in a range", etc.
I wonder if these two data structures (or any other data structure / algorithm, please propose) can achieve the query below in O(lg N). (If not, how about O(sqrt N)?)
Given an array of integers A, query the number of distinct integers in a range [l, r].
PS: Assume the number of possible values is ~ 10^5, so used[color] = true or a bitmask is not possible.
For example: A = [1,2,3,2,4,3,1], query([2,5]) = 3, where the range index is 0-based.
Yes, this is possible to do in O(log n), even if you must answer the queries online. However, this requires some rather complex techniques.
First, let's solve the following problem: given an array, answer queries of the form "how many numbers <= x are there within indices [l, r]". This is done with a segment-tree-like structure which is sometimes called a Merge Sort Tree. It is basically a segment tree where each node stores a sorted copy of its subarray. This structure requires O(n log n) memory (because there are log n layers and each of them stores n numbers). It is built in O(n log n) as well: you just go bottom-up and for each inner vertex merge the sorted lists of its children.
Here is an example. Say 1 5 2 6 8 4 7 1 is the original array.
|1 1 2 4 5 6 7 8|
|1 2 5 6|1 4 7 8|
|1 5|2 6|4 8|1 7|
|1|5|2|6|8|4|7|1|
Now you can answer those queries in O(log^2 n) time: just make a regular segment tree query (traversing O(log n) nodes) and in each visited node binary search for how many numbers <= x it contains (an additional O(log n) factor from here).
This can be sped up to O(log n) using the Fractional Cascading technique, which basically allows you to do the binary search only in the root rather than in every node. However, it is too complex to describe in this post.
Now we return to the original problem. Assume you have an array a_1, ..., a_n. Build another array b_1, ..., b_n, where b_i = index of the next occurrence of a_i in the array, or ∞ if it is the last occurrence.
Example (1-indexed):
a = 1 3 1 2 2 1 4 1
b = 3 ∞ 6 5 ∞ 8 ∞ ∞
Now let's count distinct numbers in [l, r]. For each distinct number we'll count only its last occurrence in the segment. With the b_i notation you can see that occurrence i is the last one if and only if b_i > r. So the problem boils down to "how many values b_i > r are there within indices [l, r]", which is trivially reduced to what I described above.
Hope it helps.
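To make the two steps concrete, here is a Python sketch (my own naming) that builds the b array and a merge sort tree over it, answering each query in O(log^2 n) with a binary search per visited node; the fractional-cascading speedup is omitted:

import bisect, heapq
from math import inf

def build_tree(b):
    # Merge sort tree: node v covering [lo, hi] stores sorted(b[lo..hi]).
    n = len(b)
    tree = [[] for _ in range(4 * n)]
    def build(v, lo, hi):
        if lo == hi:
            tree[v] = [b[lo]]
            return
        mid = (lo + hi) // 2
        build(2 * v, lo, mid)
        build(2 * v + 1, mid + 1, hi)
        tree[v] = list(heapq.merge(tree[2 * v], tree[2 * v + 1]))  # linear merge
    build(1, 0, n - 1)
    return tree

def count_greater(tree, n, ql, qr, x):
    # How many b[i] > x for i in [ql, qr]; O(log n) nodes, binary search in each.
    def go(v, lo, hi):
        if qr < lo or hi < ql:
            return 0
        if ql <= lo and hi <= qr:
            return len(tree[v]) - bisect.bisect_right(tree[v], x)
        mid = (lo + hi) // 2
        return go(2 * v, lo, mid) + go(2 * v + 1, mid + 1, hi)
    return go(1, 0, n - 1)

def distinct_in_range(a, queries):
    n = len(a)
    nxt, b = {}, [0] * n
    for i in range(n - 1, -1, -1):       # b[i] = next occurrence of a[i], or inf
        b[i] = nxt.get(a[i], inf)
        nxt[a[i]] = i
    tree = build_tree(b)
    # index i holds a last occurrence within [l, r] exactly when b[i] > r
    return [count_greater(tree, n, l, r, r) for (l, r) in queries]

print(distinct_in_range([1, 2, 3, 2, 4, 3, 1], [(2, 5)]))  # [3]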
If you're willing to answer queries offline, then plain old Segment Trees/ BIT can still help.
Sort queries based on r values.
Make a Segment Tree for range sum queries [0, n]
For each value in input array from left to right:
Increment by 1 at current index i in the segment tree.
For the current element, if it has been seen before, decrement by 1 in the
segment tree at its previous position.
Answer queries ending at current index i, by querying for sum in range [l, r == i].
The idea in short is to keep marking, moving rightward through the indexes, the latest occurrence of each individual element, and setting previous occurrences back to 0. The sum over the range then gives the count of unique elements.
Overall time complexity again would be O(n log n).
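A compact Python sketch of this offline approach, using a BIT for the range sums (queries are 0-based inclusive; the helper names are mine):

def distinct_offline(a, queries):
    n = len(a)
    bit = [0] * (n + 1)
    def update(i, delta):                  # point update at 0-based index i
        i += 1
        while i <= n:
            bit[i] += delta
            i += i & (-i)
    def prefix(i):                         # sum of marks over [0, i]
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & (-i)
        return s
    order = sorted(range(len(queries)), key=lambda q: queries[q][1])
    ans = [0] * len(queries)
    last = {}                              # last[x] = index currently marked +1 for value x
    qi = 0
    for i, x in enumerate(a):
        if x in last:
            update(last[x], -1)            # old occurrence is no longer the latest
        update(i, 1)
        last[x] = i
        while qi < len(order) and queries[order[qi]][1] == i:
            l, r = queries[order[qi]]
            ans[order[qi]] = prefix(r) - (prefix(l - 1) if l > 0 else 0)
            qi += 1
    return ans

print(distinct_offline([1, 2, 3, 2, 4, 3, 1], [(2, 5)]))  # [3]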
There is a well-known offline method to solve this problem. If you have an array of size n and q queries on it, and in each query you need to know the count of distinct numbers in a range, then you can solve the whole thing in O(n log n + q log n) time, which is like answering every query in O(log n) time.
Let's solve the problem using the RSQ (range sum query) technique. For the RSQ technique you can use a segment tree or a BIT. Let's discuss the segment tree technique.
For solving this problem you need an offline technique and a segment tree. Now, what is an offline technique? Here it means: you read all the queries first, reorder them in a way that lets you answer them correctly and easily, and finally output the answers in the given input order.
Solution Idea:
First, read the test case and store the given n numbers in an array, say array[]. Then read the q queries and store them in a vector v, where every element of v holds three fields: l, r, idx. Here l is the start point of a query, r is its endpoint, and idx is the query's position in the input (e.g. "this is the k-th query").
Now sort the vector v by the endpoint r of each query.
Suppose we have a segment tree which can store information for at least 10^5 elements, and also an array called last[100005], which stores the last seen position of each value of array[].
Initially, all elements of the tree are zero and all elements of last[] are -1.
Now run a loop over array[]. Inside the loop, do the following check for every index i of array[]:
Is last[array[i]] equal to -1 or not? If it is -1, then set last[array[i]] = i and call the segment tree's update() function, which will add +1 at position last[array[i]].
If last[array[i]] is not -1, then call update(), which will add -1 at position last[array[i]]. Now you need to store the current position as the last position for the future, so set last[array[i]] = i and call update() again to add +1 at position last[array[i]].
Now check whether a query ends at the current index, i.e. if (v[current].r == i). If this is true, call the segment tree's query() function, which returns the sum over the range v[current].l to v[current].r, and store the result at index v[current].idx of the answer[] array. You also need to increment current by 1 (and repeat while further queries end at this same index).
Finally, print the answer[] array, which contains the final answers in the given input order.
The complexity of the algorithm is O(n log n + q log n).
The given problem can also be solved using Mo's (offline) algorithm, also called the square root decomposition technique.
Overall time complexity is O(N*SQRT(N)).
Refer to mos-algorithm for a detailed explanation; it even has a complexity analysis and a SPOJ problem that can be solved with this approach.
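A minimal Python sketch of Mo's algorithm for this distinct-count query (0-based inclusive ranges; the names are mine):

from math import isqrt

def distinct_mo(a, queries):
    block = max(1, isqrt(len(a)))
    order = sorted(range(len(queries)),
                   key=lambda q: (queries[q][0] // block, queries[q][1]))
    freq, distinct = {}, 0
    def add(x):
        nonlocal distinct
        freq[x] = freq.get(x, 0) + 1
        if freq[x] == 1:
            distinct += 1
    def remove(x):
        nonlocal distinct
        freq[x] -= 1
        if freq[x] == 0:
            distinct -= 1
    ans = [0] * len(queries)
    cl, cr = 0, -1                         # current window [cl, cr], initially empty
    for q in order:                        # move the window edges query by query
        l, r = queries[q]
        while cr < r:
            cr += 1; add(a[cr])
        while cl > l:
            cl -= 1; add(a[cl])
        while cr > r:
            remove(a[cr]); cr -= 1
        while cl < l:
            remove(a[cl]); cl += 1
        ans[q] = distinct
    return ans

print(distinct_mo([1, 2, 3, 2, 4, 3, 1], [(2, 5)]))  # [3]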
kd-trees provide range queries in O(log n), where n is the number of points.
If you want faster query than a kd-tree, and you are willing to pay the memory cost, then Range trees are your friends, offering a query of:
O(log^d n + k)
where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.
Bentley is an important name when it comes to this field. :)

Greedy Attempt for covering all the numbers with the given intervals

Let S be a set of n intervals over the natural numbers that may overlap, and let N be a list of n numbers.
I want to find the smallest subset (let's call P) of S such that for each number
in our list N, there exists at least one interval in P that contains it. The intervals in P are allowed to overlap.
Trivial example:
S = {[1..4], [2..7], [3..5], [8..15], [9..13]}
N = [1, 4, 5]
// so P = {[1..4], [2..7]}
I think a dynamic programming algorithm might not always work, so if anybody knows of a solution to this problem (or a similar one that it can be converted into), that would be great. I am trying to come up with an O(n^2) solution.
Here is one greedy approach
P = {}
for each q in N:           // O(n)
    if q in P              // O(n)
        continue
    for each i in S        // O(n)
        if q in i:         // O(n)
            P.add(i)
            break
But that is O(n^4). Any help with creating a greedy approach that is O(n^2) would be great!
Thanks!
Update: I've been slamming at this problem and I think I have an O(n^2) solution!!
Let me know if you think I'm right!!!
N = MergeSort(N)
lower, upper = infinity, -infinity      // nothing is covered yet
P = empty set
for each q in N do
    if not (q >= lower and q <= upper)
        max_interval = [-infinity, -infinity]
        for each r in S do
            if q in r then
                if r.rightEndPoint > max_interval.rightEndPoint
                    max_interval = r
        P.append(max_interval)
        lower = max_interval.leftEndPoint
        upper = max_interval.rightEndPoint
        S.remove(max_interval)
I think this should work!! I'm still trying to find a counterexample; but yeah!!
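Here is a runnable Python sketch of the same idea (intervals as (lo, hi) pairs; it assumes every number in N is covered by at least one interval in S):

def cover_points(S, N):
    N = sorted(N)
    P = []
    upper = float('-inf')                  # right end of the last chosen interval
    for q in N:
        if q <= upper:                     # q is already covered (N is sorted,
            continue                       #  so checking the upper end suffices)
        # among intervals containing q, pick the one reaching furthest right
        best = max((iv for iv in S if iv[0] <= q <= iv[1]), key=lambda iv: iv[1])
        P.append(best)
        upper = best[1]
    return P

S = [(1, 4), (2, 7), (3, 5), (8, 15), (9, 13)]
N = [1, 4, 5]
print(cover_points(S, N))  # [(1, 4), (2, 7)]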
This problem is similar to the set cover problem, which is NP-complete (i.e., presumably has no solution faster than exponential). What makes it different is that intervals always cover adjacent elements (not an arbitrary subset of N), which opens the way to faster solutions.
http://en.wikipedia.org/wiki/Set_cover_problem
I think that the solution proposed by Mike is good enough, but I have a quite straightforward O(N^2) greedy algorithm. It starts like Mike's (moreover, I believe Mike's solution can also be improved in a similar way):
Sort your N numbers and place them, sorted, into an array ELEM; complexity O(N lg N).
Using binary search, for each interval S[i] identify the starting and ending index of the elements in ELEM that are covered by S[i]. Place this pair of indices into an array COVER; the difference between the two indices tells you how many elements the interval covers, and for simplicity let us place that count in an array COVER_COUNT; complexity O(N lg N).
Introduce an index pointer p that shows up to which element of ELEM your N is already covered. Set p = 0, meaning that all elements before the 0-th (i.e., no elements) are initially covered; complexity O(1). Also introduce a boolean array IS_INCLUDED that reflects whether interval S[i] is already included in your coverage set; complexity O(N).
Then start from the 0-th element of ELEM and find the interval that contains ELEM[0] and has the greatest coverage COVER_COUNT[i]. Say it is the i-th interval. Mark it as included by setting IS_INCLUDED[i] to true. Then set p to end[i] + 1, where end[i] is the ending index in the COVER[i] pair (indeed, all elements up to end[i] are now covered). Then, knowing p, update all entries of COVER_COUNT so that they reflect how many not-yet-covered elements each interval covers (this can easily be done in O(N) time). Then perform the same step for ELEM[p], and continue until p >= ELEM.length. It can be observed that the overall complexity is O(N^2).
You finish in O(N^2), and IS_INCLUDED holds true for exactly the intervals of S included in the optimal cover set.
Let me know if this solution seems reasonable to you and if I calculated everything well.
P.S. Just wanted to add that the optimality of the solution found by the algorithm can be proved by induction and contradiction. By contradiction, it is easy to show that at least one optimal solution includes the longest interval of those covering element ELEM[0]. If so, by induction we can show that for each next element the algorithm can keep following the strategy of selecting the interval that covers the leftmost not-yet-covered element and is longest with respect to the number of remaining elements covered.
I am not sure, but maybe something like this:
1) For each interval S[i], create a list Q[i] of the elements from N that are contained in the interval; this takes O(n^2).
2) Then sort S by the length of Q[i]; O(n lg n).
3) Go through this array, excluding the elements of Q[i] from N (O(n)) and from Q[i+1] ... Q[n] (O(n^2)).
4) Repeat from step 2 while N is not empty.
It's not O(n^2), it's O(n^3), but if you can use a hashmap, I think you can improve this.

How do you find multiple ki smallest elements in array?

I am struggling with my homework and need a little push. The question is to design an algorithm that in O(n log m) time finds the elements of several given ranks 1 <= k_1 < k_2 < ... < k_m in an array of n numbers, where you have m such ranks. I know that a simple selection algorithm takes O(n) time to find the k-th element, but how do you get the m into the recurrence? I thought of handling both k_1 and k_m in each run, but that only takes out 2 ranks, not m/2.
Would appreciate some directions.
Thanks
If I understand the question correctly, you have a vector K containing m indices, and you want to find the k'th ranked element of A for each k in K. If K contains the smallest m indices (i.e. k=1,2,...,m) then this can be done easily in linear time T=O(n) by using quickselect to find the element k_m (since all the smaller elements will be on the left at the end of quickselect). So I'm assuming that K can contain any set of m indices.
One way to accomplish this is by running quickselect on all of K at the same time. Here is the algorithm
QuickselectMulti(A,K)
If K is empty, then return an empty result set
Pick a pivot p from A at random
Partition A into sets A0<p and A1>p.
i = A0.size + 1
if K contains i, then remove i from K and add (i=>p) to the result set.
Partition K into sets K0<i and K1>i
add QuickselectMulti(A0,K0) to the result set
subtract i from each k in K1
call QuickselectMulti(A1,K1), add i to each index of the output, and add this to the result set
return the result set
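Here is a small Python sketch of this QuickselectMulti idea, assuming distinct elements and 1-based ranks (duplicate handling is omitted for brevity):

import random

def quickselect_multi(A, K):
    # Return {k: k-th smallest of A} for each rank k in K.
    if not K:
        return {}
    p = random.choice(A)                   # pick a pivot p from A at random
    A0 = [x for x in A if x < p]           # partition A around p
    A1 = [x for x in A if x > p]
    i = len(A0) + 1                        # rank of the pivot within A
    result = {i: p} if i in K else {}
    K0 = [k for k in K if k < i]
    K1 = [k - i for k in K if k > i]       # shift ranks for the right part
    result.update(quickselect_multi(A0, K0))
    result.update({k + i: v for k, v in quickselect_multi(A1, K1).items()})
    return result

# ranks 2, 5 and 7 of an 8-element array: the answers are 2, 6 and 8
print(quickselect_multi([5, 1, 8, 3, 9, 2, 7, 6], [2, 5, 7]))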
If K contains just one element, this is the same as randomized quickselect. To see why the running time is O(n log m) on average, first consider what happens when each pivot exactly splits both A and K in half. In this case, you get two recursive calls, so you have
T = n + 2T(n/2,m/2)
= n + n + 4T(n/4,m/4)
= n + n + n + 8T(n/8,m/8)
Since m drops by half each time, n will show up log m times in this summation. To actually derive the expected running time requires a little more work, because you can't assume that the pivot will split both arrays in half, but if you work through the calculations, you will see that the running time is in fact O(n log m) on average.
On edit: The analysis of this algorithm can be made simpler by choosing the pivot as p = Quickselect(A, k_i), where k_i is the middle element of K, rather than choosing p at random. This guarantees that K is split in half each time, so the recursion depth is exactly log m, and since quickselect runs in linear time, the result is still O(n log m).

What is the order of the run time for an algorithm with this desired output?

There are N sets A_1 to A_N, each with string entries. The average size of a set is K.
For each A_i we wish to return a list (or a better data structure?) of the N-1 sets excluding A_i, ordered by how many elements each set has in common with A_i.
Please don't be shy to give a detailed response with nice mathematical arguments...:)
Also is this a standard problem and can I read more about it somewhere?
Basically you generate each result list element by performing intersections of 2 sets. You have N-1 intersections per result list element, which boils down to (N-1) * IntersectTime. For the N list elements in the result this sums up to N(N-1) * IntersectTime. Afterwards you have to order N-1 sets, N times, so just for ordering them you have O(N^2 log N).
IntersectTime depends on the implementation of the set; for a typical hash set it is O(k) for you.
So finally we have O(N^2 k) + O(N^2 log N) = O(N^2 (k + log N)) = (if we assume k > log N) O(N^2 k).
EDIT: if you actually implement it, it is good to know that when you intersect two sets, you can use the result for 2 of the result list elements. That means that for the first element you have to intersect A_1 with N-1 others, A_2 with N-2 (the intersection with A_1 was already done for the first element), A_3 with N-3 other sets, and finally A_N with none. BUT this does not change the big-O time, it just halves the runtime.
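A direct Python sketch of this counting (my own naming), intersecting and then sorting exactly as described:

def rank_by_overlap(sets):
    # For each set i, list the other set indices ordered by |sets[i] & sets[j]|, largest first.
    n = len(sets)
    result = []
    for i in range(n):
        overlaps = [(len(sets[i] & sets[j]), j) for j in range(n) if j != i]  # N-1 intersections
        overlaps.sort(key=lambda t: -t[0])                                    # order them
        result.append([j for _, j in overlaps])
    return result

sets = [{"a", "b", "c"}, {"b", "c"}, {"c", "d"}, {"x"}]
print(rank_by_overlap(sets)[0])  # [1, 2, 3]: overlaps of size 2, 1 and 0 with set 0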
Here's my attempt -
I believe you can boil the process down into:
O(N * (C + S))
Where N is the number of sets, C is the amount of time it takes to compare N-1 sets to set Ai, and S is the amount of time it takes to sort the N-1 sets.
The comparison is K items against K items, N-1 times, so (N-1)K^2 time to compare.
Sorting the N-1 sets should take (N-1)log(N-1) time with an efficient comparison-based algorithm.
For simplicity, we can shorten N-1 into just N.
So, the whole thing should run in O(N(NK^2 + N log N)) = O(N^2 (K^2 + log N)).
You should take this with a grain of salt, I haven't done anything with algorithms for quite a while. There may also be a more efficient way to compare the sets.
