How do I calculate a mod sum efficiently with queries and updates?

Given an array A (say A[1..n] = {1, 2, 3}), I want to support two operations:
1) Update(idx, val): set A[idx] = val
2) Query(MOD): // say MOD = 2, so 1%2 + 2%2 + 3%2 = 2
    int ans = 0;
    for (int i = 1; i <= n; i++)
        ans += A[i] % MOD;
    return ans;
I thought of a Fenwick tree with indices for all possible values of MOD, but the problem is that updates are not constant-time, because A[i] % MOD can be different for each A[i]. How do I do this efficiently?

Related

Number of occurrences of each distinct integer in given ranges for an array

Given an array of n integers (n <= 1e6) [a0, a1, a2, ... an-1] (a[i] <= 1e9) and multiple queries. In each query 2 integers l and r (0 <= l <= r <= n-1) are given and we need to return the count of each distinct integer inside this range (l and r inclusive).
I can only come up with a brute-force solution that iterates through the complete range for each query:
d = {}
for i in range(l, r+1):
    if arr[i] not in d:
        d[arr[i]] = 0
    d[arr[i]] += 1
For example:
Array is [1, 1, 2, 3, 1, 2, 1]
Query 1: l=0, r=6, Output: 4, 2, 1 (4 for four 1's, 2 for two 2's, and 1 for one 3)
Query 2: l=3, r=5, Output: 1, 1, 1
Edit: I came up with something like this, but its complexity is still pretty high; I think it is because of the insert operations into the per-node maps.
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

const ll N = 1e6 + 5;
ll arr[N];
unordered_map<ll, ll> tree[4 * N];
int n, q;

void build(ll node = 1, ll start = 1, ll end = n) {
    if (start == end) {
        tree[node][arr[start]] = 1;
        return;
    }
    ll mid = (start + end) / 2;
    build(2 * node, start, mid);
    build(2 * node + 1, mid + 1, end);
    for (auto& p : tree[2 * node])
        tree[node][p.first] += p.second;
    for (auto& p : tree[2 * node + 1])
        tree[node][p.first] += p.second;
}
vector<ll> query(ll node, ll l, ll r, ll start = 1, ll end = n) {
    vector<ll> ans;
    if (end < l or start > r) return ans;
    if (start >= l and end <= r) {
        for (auto& p : tree[node])
            ans.push_back(p.second);
        return ans;
    }
    ll mid = (start + end) / 2;
    vector<ll> b = query(2 * node, l, r, start, mid);
    ans.insert(ans.end(), b.begin(), b.end());
    b = query(2 * node + 1, l, r, mid + 1, end);
    ans.insert(ans.end(), b.begin(), b.end());
    return ans;
}
You can use a binary indexed tree as described here. Rather than storing range sums in the nodes, store maps from values to counts for the respective ranges.
Now query the tree at index i to obtain a map representing the frequency of each element in the corresponding index prefix [1..i]. This requires merging O(log n) maps.
Then answer a range query with two prefix queries: one for l-1 and another for r. "Subtract" the former result map from the latter, entry-wise. I'll let you work out the details.
The time for each query will be O(k log n), where k is the map size; k is at most the number of distinct elements in the input array.
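Here is a rough self-contained sketch of that scheme in C++; the function names (add, prefix, rangeCounts) are mine, not from any library:

#include <bits/stdc++.h>
using namespace std;

int n;                                    // number of elements, 1-based positions
vector<unordered_map<int, int>> bit;      // bit[i]: value -> count over the range node i covers

void add(int i, int v) {                  // record value v at position i
    for (; i <= n; i += i & -i)
        bit[i][v]++;
}

unordered_map<int, int> prefix(int i) {   // value -> count over positions [1..i]
    unordered_map<int, int> m;            // merges O(log n) node maps
    for (; i > 0; i -= i & -i)
        for (auto& p : bit[i])
            m[p.first] += p.second;
    return m;
}

unordered_map<int, int> rangeCounts(int l, int r) {
    unordered_map<int, int> hi = prefix(r), lo = prefix(l - 1);
    for (auto& p : lo) {                  // entry-wise "subtraction" of the two maps
        hi[p.first] -= p.second;
        if (hi[p.first] == 0) hi.erase(p.first);
    }
    return hi;
}

int main() {
    int a[] = {0, 1, 1, 2, 3, 1, 2, 1};   // a[1..7], index 0 unused
    n = 7;
    bit.assign(n + 1, {});
    for (int i = 1; i <= n; i++) add(i, a[i]);
    for (auto& p : rangeCounts(1, 7))     // expect 1 -> 4, 2 -> 2, 3 -> 1 (in some order)
        cout << p.first << " occurs " << p.second << " times\n";
}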
This looks like a candidate for Mo's algorithm, i.e. square-root decomposition of the queries. Assuming both the number of queries and the length of the input are on the order of n, and similarly to this post, we can bucket the queries according to floor(l / sqrt(n)) and sort each bucket by r. Now we have sqrt(n) buckets.
A bucket's q queries incur at most O(q * sqrt(n)) changes due to movements of l, and at most O(n) changes due to the gradual movement of r (since we sorted each bucket by r, that side of the interval only increases steadily as we process the bucket).
Processing the changes on the right side of all the intervals in one bucket is bounded by O(n), and we have sqrt(n) buckets, so that's O(n * sqrt(n)) for the right side. And since the total number of queries is O(n) (assumed) and each one requires at most O(sqrt(n)) changes on the left side, the changes for the left side are also O(n * sqrt(n)).
The total time complexity would therefore be O(n * sqrt(n) + k), where k is the total number of counts output. (The window data structure can be a hash map that also allows iteration over its current entries; a sketch follows.)
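A rough offline sketch of this in C++ (all identifiers are mine; the example data comes from the question above):

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> a = {1, 1, 2, 3, 1, 2, 1};
    vector<pair<int, int>> qs = {{0, 6}, {3, 5}};      // inclusive [l, r] queries
    int n = a.size(), block = max(1, (int)sqrt(n));
    vector<int> order(qs.size());
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int x, int y) {   // bucket by l / block, then sort by r
        int bx = qs[x].first / block, by = qs[y].first / block;
        return bx != by ? bx < by : qs[x].second < qs[y].second;
    });
    unordered_map<int, int> freq;                      // counts inside the current window
    int curL = 0, curR = -1;                           // current window is a[curL..curR]
    for (int qi : order) {
        int l = qs[qi].first, r = qs[qi].second;
        while (curR < r) freq[a[++curR]]++;            // grow on the right
        while (curL > l) freq[a[--curL]]++;            // grow on the left
        while (curR > r) freq[a[curR--]]--;            // shrink on the right
        while (curL < l) freq[a[curL++]]--;            // shrink on the left
        cout << "query (" << l << ", " << r << "):";
        for (auto& p : freq)
            if (p.second > 0) cout << " " << p.first << " x" << p.second;
        cout << "\n";
    }
}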
You can use a hash map: iterate from l to r and store each element as a key with its number of occurrences as the value. This takes O(r - l + 1) per query to produce the count of each distinct element in the range. Every time you insert an element, check whether it already exists in the hash map; if it does, increment its count, otherwise set its count to 1.

Sum of remainders over the entire array for several queries

I am looking at this challenge:
You are provided an array A[ ] of N elements.
Also, you have to answer M queries.
Each query is of following type-
Given a value X, find A[1]%X + A[2]%X + ...... + A[N]%X
1<=N<=100000
1<=M<=100000
1<=X<=100000
1<=elements of array<=100000
I am having trouble computing this value in an optimized way.
How can we compute this value efficiently for different values of X?
Here is a way that you could at least reduce the multiplicative factor in the time complexity.
In the C standard, the modulo (or remainder) is defined to be a % b = a - (a / b) * b (where / is integer division).
A naive, iterative way (possibly useful on embedded systems with no division unit) to compute the modulo is therefore (pseudo-code):
function remainder (A, B):
    rem = A
    while rem >= B:
        rem -= B
    return rem
But how does this help us at all? Suppose we:
Sort the array A[i] in ascending order
Pre-compute the sum of all elements in A[] -> S
Find the first element (with index I) greater than or equal to X
From the pseudocode above it is clear that at least (one multiple of) X must be subtracted from all elements in the array from index I onwards. Therefore we must subtract (N - I + 1) * X from the sum S.
Even better: we can keep a variable (call it K, initialize to zero) which is equal to the total multiple of X we must subtract from S to find the sum of all remainders. Thus at this stage we could simply add N - I + 1 to K.
Repeat the above, finding the first element greater than or equal to the next limit L = 2X, 3X, ... and so on, until we have passed the end of the array.
Finally, the result is given by S - K * X.
Pseudocode:
function findSumOfRemainder (A[1..N], X):
    sort A in ascending order
    S = sum of A
    K = 0
    L = X
    I = 1
    while I <= N:
        I = lowest index >= I such that A[I] >= L (if there is none, break)
        K += N - I + 1
        L += X
    return S - K * X
What is the best way to find I at each stage, and how does it relate to the time-complexity?
Binary search: Since the entire array is sorted, to find the first index I at which A[I] >= L, we can just do a binary search on the array (or succeeding sub-array at each stage of the iteration, bounded by [I, N - 1]). This has complexity O( log[N - I + 1] ).
Linear search: Self-explanatory - increment I until A[I] >= L, taking O( N - I + 1 )
You may dismiss the linear search method as being "stupid", but let's look at the two extreme cases. For simplicity, assume the values of A are roughly uniformly distributed, and note that the number of limit values L (and hence of searches for I) is about max(A) / X.
(max(A) / X) << N: We have to compute very few values of I, each separated by many indices; binary search is the preferred method here, because the total complexity is bounded by O([max(A) / X] * log[N]), which is much better than the O(N) of a full linear scan.
(max(A) / X) ~ N: We have to compute many values of I, each separated by only a few indices. In this case the total binary search complexity is bounded by O(log N) + O(log[N-1]) + O(log[N-2]) + ... ~ O(N log N), which is significantly worse than the O(N) total of linear search.
So which one do we choose? Well, this is where I must get off, because I don't know what the optimal answer would be (if there even is one). The best I can say is to set some threshold value for the ratio max(A) / X: if smaller, choose binary search, else linear.
I welcome any comments on the above and possible improvements; the range constraint on the values may allow better methods for finding values of I (e.g. radix sort?). A worked implementation of the binary search variant follows.
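Here is a sketch of the binary search variant in C++, my own translation of the pseudocode above (0-based; for M queries you would sort the array and compute S once up front rather than per call, as done here for brevity):

#include <bits/stdc++.h>
using namespace std;

long long sumOfRemainders(vector<long long> a, long long x) {
    sort(a.begin(), a.end());                         // ascending order
    long long s = accumulate(a.begin(), a.end(), 0LL);
    long long n = a.size(), k = 0;                    // k = total multiples of x to subtract
    for (long long limit = x; limit <= a.back(); limit += x) {
        // first index whose element is >= limit
        long long i = lower_bound(a.begin(), a.end(), limit) - a.begin();
        k += n - i;                                   // each of those loses one more multiple of x
    }
    return s - k * x;
}

int main() {
    vector<long long> a = {1, 2, 3};
    cout << sumOfRemainders(a, 2) << "\n";            // 1%2 + 2%2 + 3%2 = 2
}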
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        int arr[n];
        long long int sum = 0;
        for (int i = 0; i < n; i++) {
            cin >> arr[i];
        }
        cout << accumulate(arr, arr + n, sum) - n << '\n';
    }
}
In case you don't know about accumulate, see the documentation for std::accumulate.

Smallest missing integer algorithm that runs in O(n)?

What algorithm can find the smallest missing integer in an array in O(n) time?
Say we have an array A with elements from the value range {1, 2, 3, ..., 2n}. Half of those values are missing, so the length of A is n.
E.g:
A = [1,2,5,3,10] , n=5
Output = 4
The smallest missing integer must be in the range [1, ..., n+1]. So create an array of flags, all initially false, indicating the presence of that integer. Then an algorithm is:
Scan the input array, setting flags to true as you encounter values in the range. This operation is O(n). (That is, set flag[A[i]] to true for each position i in the input array, provided A[i] <= n.)
Scan the flag array for the first false flag. This operation is also O(n). The index of the first false flag is the smallest missing integer.
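For concreteness, a minimal sketch of the flag-array method in C++ (the function name is mine):

#include <bits/stdc++.h>
using namespace std;

// Flag-array method: O(n) time, O(n) extra space.
int smallestMissing(const vector<int>& a) {
    int n = a.size();
    vector<bool> seen(n + 2, false);       // flags for the values 1..n+1
    for (int v : a)
        if (v <= n + 1) seen[v] = true;    // larger values cannot be the answer
    int v = 1;
    while (seen[v]) v++;                   // first false flag is the answer
    return v;
}

int main() {
    cout << smallestMissing({1, 2, 5, 3, 10}) << "\n";   // prints 4
}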
EDIT: O(n) time algorithm with O(1) extra space:
If A is writable and there are some extra bits available in the elements of A, then a constant-extra-space algorithm is possible. For instance, if the elements of A are signed values, and since all the numbers are positive, we can use the sign bit of the numbers in the original array as the flags, rather than creating a new flag array. So the algorithm would be:
For each position i of the original array, if abs(A[i]) < n+1, make the value at A[abs(A[i])] negative. (This assumes array indexes are based at 1. Adjust in the obvious way if you are using 0-based arrays.) Don't just negate the value, in case there are duplicate values in A.
Find the index of the first element of A that is positive. That index is the smallest missing number in A. If all positions are negative, then A must be a permutation of {1, ..., n} and hence the smallest missing number is n+1.
If the elements are unsigned, but can hold values as high as 4n + 1, then in step 1, instead of making the element negative, add 2n + 1 (provided the element is <= 2n) and use (A[i] mod (2n+1)) instead of abs(A[i]). Then in step 2, find the first element < 2n + 1 instead of the first positive element. Other such tricks are possible as well.
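A sketch of the signed-elements variant, assuming a 1-based array of signed values as above (code and names are mine):

#include <bits/stdc++.h>
using namespace std;

int smallestMissingInPlace(vector<int>& a) {   // a[1..n] holds the input, a[0] unused
    int n = (int)a.size() - 1;
    for (int i = 1; i <= n; i++) {
        int v = abs(a[i]);                     // original value at position i
        if (v <= n && a[v] > 0)
            a[v] = -a[v];                      // sign bit of a[v] flags "v is present"
    }
    for (int i = 1; i <= n; i++)
        if (a[i] > 0) return i;                // first unflagged value
    return n + 1;                              // a[1..n] was a permutation of 1..n
}

int main() {
    vector<int> a = {0, 1, 2, 5, 3, 10};       // index 0 unused
    cout << smallestMissingInPlace(a) << "\n"; // prints 4
}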
You can do this in O(1) additional space, assuming that the only valid operations on the array is to read elements, and to swap pairs of elements.
First note that the specification of the problem excludes the possibility of the array containing duplicates: it contains half of the numbers from 1 to 2N.
We perform a quick-select type algorithm. Start with m=1, M=2N+1, and pivot the array on (m + M)/2. If the size of the left part of the array (elements <= (m+M)/2) is less than (m + M)/2 - m + 1, then the first missing number must be there. Otherwise, it must be in the right part of the array. Repeat on the left or right side accordingly until you find the missing number.
The size of the slice of the array under consideration halves each time and pivoting an array of size n can be done in O(n) time and O(1) space. So overall, the time complexity is 2N + N + N/2 + ... + 1 <= 4N = O(N).
An implementation of Paul Hankin's idea in C++:

#include <iostream>
using namespace std;

const int MAX = 1000;
int a[MAX];
int n;

void swap(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Rearranges elements of a[l..r] in such a way that first come elements
// lower or equal to M, next come elements greater than M. Elements in each group
// come in no particular order.
// Returns an index of the first element among a[l..r] which is greater than M.
int rearrange(int l, int r, int M) {
    int i = l, j = r;
    while (i <= j)
        if (a[i] <= M) i++;
        else swap(a[i], a[j--]);
    return i;
}

int main() {
    cin >> n;
    for (int i = 0; i < n; i++) cin >> a[i];
    int L = 1, R = 2 * n;         // value range under consideration
    int l = 0, r = n - 1;         // corresponding slice of the array
    while (L < R) {
        int M = (L + R) / 2;      // pivot element
        int m = rearrange(l, r, M);
        if (m - l == M - L + 1)   // all of L..M present: missing number is on the right
            l = m, L = M + 1;
        else                      // something in L..M is missing: recurse left
            r = m - 1, R = M;
    }
    cout << L;
    return 0;
}

Can this problem be solved by dynamic programming?

Given n, m, d. The answer is stored in the sum variable in the code below:

int x = m / d;
int sum = 0;
for (int i = 1; i <= x; i++) {
    sum += mobius(i) * ((x / i) ^ n);   // here ^ denotes exponentiation, not XOR
}
Now the problem is to find the total sum % (10^9 + 7) when d varies over [l, r], with n and m as above. I have only been able to do it by brute force, but the constraints are 1 <= n, m, l, r <= 10^7, so the brute-force solution cannot pass the time limit.
Is there some underlying overlapping subproblem and optimal substructure property to this problem which can be used to solve the problem by dynamic programming?
Link: Mobius Function. I have pre-calculated the Möbius function in O(n log n).
Edit: Given t, n, m, where t is the number of test cases; l and r are given t times. We have to output the total sum as described above.
Sample Input:
T : 2
N : 3, M : 10
Values of l and r
9 9
10 10
Sample Output:
1
1
Note that when you divide m by d to compute x, there are only about 2*sqrt(m) distinct values of x.
This means you only need to evaluate the inner sum once for each distinct value of x, rather than once per d in [l, r].
Similarly, in the computation of x/i, there are only about 2*sqrt(x) distinct values of x/i. This means you only need to compute (x/i)^n once for each distinct value.
For each distinct value of x/i there is a contiguous range of i values that produce it.
You then need to add up mobius[i] over that whole range of i. This can be done by preparing an array with the cumulative sums of the Möbius function (this cumulative sum is called the Mertens function).
For example, if
M[k] = sum[ Mobius(i) for i = 1..k ]
then
sum[ Mobius(i) for i = low..high ] = M[high] - M[low-1]
Overall, evaluating the inner sum for one x costs O(sqrt(x)), so all O(sqrt(m)) distinct values of x together cost O( sqrt(m) * sqrt(m) ) = O(m) (in addition to the time spent computing the Möbius function). A sketch follows.
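A rough sketch of the block decomposition in C++; powmod, mob, and mertens are my names, and the linear Möbius sieve stands in for whatever precomputation you already have:

#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1000000007;

long long powmod(long long b, long long e) {           // fast exponentiation mod MOD
    long long r = 1;
    for (b %= MOD; e > 0; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

// sum over i = 1..x of mobius(i) * (x/i)^n, exploiting that x/i is piecewise constant
long long f(long long x, long long n, const vector<long long>& mertens) {
    long long sum = 0;
    for (long long i = 1; i <= x; ) {
        long long v = x / i;                           // constant on the block [i, x/v]
        long long j = x / v;                           // last index of the block
        long long coef = (mertens[j] - mertens[i - 1]) % MOD;
        sum = (sum + coef * powmod(v, n)) % MOD;
        i = j + 1;
    }
    return ((sum % MOD) + MOD) % MOD;
}

int main() {
    long long n = 3, m = 10, d = 9;                    // first sample query: l = r = 9
    long long x = m / d;
    // linear sieve for the Mobius function up to x, then prefix sums (Mertens)
    vector<long long> mob(x + 1, 1), mertens(x + 1, 0);
    vector<bool> composite(x + 1, false);
    vector<long long> primes;
    for (long long i = 2; i <= x; i++) {
        if (!composite[i]) { primes.push_back(i); mob[i] = -1; }
        for (long long p : primes) {
            if (i * p > x) break;
            composite[i * p] = true;
            if (i % p == 0) { mob[i * p] = 0; break; }
            mob[i * p] = -mob[i];
        }
    }
    for (long long i = 1; i <= x; i++)
        mertens[i] = mertens[i - 1] + mob[i];
    cout << f(x, n, mertens) << "\n";                  // prints 1, matching the sample
}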

Count number of subsets with sum equal to k

Given an array, we need to find the count of subsets whose sum is exactly equal to a given integer k.
Please suggest an optimal algorithm for this problem. The actual subsets are not needed; just the count will do.
The array consists of integers which can be negative as well as non-negative.
Example:
Array -> {1, 4, -1, 10, 5}, sum -> 9
The answer should be 3, for {4, 5}, {-1, 10} and {1, 4, -1, 5}.
This is a variation of the subset sum problem, which is NP-hard, so there is no known polynomial-time solution to it. (In fact, the subset sum problem asks only whether there is even one subset that sums to the given value, and already that is hard.)
Possible approaches are brute force (check all possible subsets) or, if the set contains relatively small integers, the pseudo-polynomial dynamic programming technique. (With negative elements, first shift the sum axis by the total of the negative values so that every reachable sum maps to a non-negative index.)
f(i, 0) = 1  (i >= 0)    // successful base clause
f(0, j) = 0  (j != 0)    // unsuccessful base clause
f(i, j) = f(i-1, j) + f(i-1, j - arr[i])    // step
Applying dynamic programming to the above recursive formula gives an O(k*n) time and space solution.
Invoke with f(n, k) (assuming 1-based array indexing).
Following is memoized dynamic programming code to count the subsets with a given sum. The repeated subproblem values are stored in the "tmp" array. To arrive at a memoized solution, first write a recursive solution to the problem and then cache the repeated values in a table.
#include <bits/stdc++.h>
using namespace std;

int tmp[1001][1001];

int subset_count(int* arr, int sum, int n) {
    if (sum == 0)
        return 1;
    if (n == 0)
        return 0;
    if (tmp[n][sum] != -1)
        return tmp[n][sum];
    if (arr[n-1] > sum)
        return tmp[n][sum] = subset_count(arr, sum, n - 1);
    return tmp[n][sum] = subset_count(arr, sum, n - 1)
                       + subset_count(arr, sum - arr[n-1], n - 1);
}

// Driver code
int main() {
    memset(tmp, -1, sizeof(tmp));
    int arr[] = { 2, 3, 5, 6, 8, 10 };
    int n = sizeof(arr) / sizeof(int);
    int sum = 10;
    cout << subset_count(arr, sum, n);
    return 0;
}
This is a recursive solution with O(2^n) time complexity.
Dynamic programming improves this to pseudo-polynomial O(n*k) time, as in the tabulated version further below.
def count_of_subset(arr, sum, n, count):
    if sum == 0:
        count += 1
        return count
    if n == 0 and sum != 0:
        count += 0
        return count
    if arr[n-1] <= sum:
        count = count_of_subset(arr, sum - arr[n-1], n - 1, count)
        count = count_of_subset(arr, sum, n - 1, count)
        return count
    else:
        count = count_of_subset(arr, sum, n - 1, count)
        return count
int numSubseq(vector<int>& nums, int target) {
    int size = nums.size();
    vector<vector<int>> T(size + 1, vector<int>(target + 1, 0));
    for (int i = 0; i <= size; i++)
        T[i][0] = 1;                  // the empty subset always makes sum 0
    for (int i = 1; i <= size; i++) {
        for (int j = 1; j <= target; j++) {
            if (nums[i-1] <= j)
                T[i][j] = T[i-1][j] + T[i-1][j - nums[i-1]];
            else
                T[i][j] = T[i-1][j];
        }
    }
    return T[size][target];
}
Although the above base case works fine under the constraint 1 <= v[i] <= 1000, consider the constraint 0 <= v[i] <= 1000.
The above base case then gives a wrong answer. Consider the test case v = [0, 0, 1] and k = 1: the output will be 1 according to that base case.
But the correct answer is 4: {1}, {0, 1} (once for each of the two zeros), and {0, 0, 1}.
To avoid this, instead of returning 0 we can recurse all the way down to index 0 and fix the base case as follows.
C++:

if (ind == 0)
{
    if (v[0] == target and target == 0) return 2;
    if (v[0] == target || target == 0) return 1;
    return 0;
}
One answer to this problem is to generate the power set of the array; for an array of size N there are 2^N subsets. For every number between 0 and 2^N - 1, check its binary representation, and include the value from the array at each position whose bit is set (i.e. one).
Check whether the included values sum to the required value.
This might not be the most efficient solution, but as this is an NP-hard problem, there exists no known polynomial-time solution. A sketch follows.
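A minimal brute-force sketch of the bitmask enumeration in C++, using the array and target from the question (feasible for N up to about 25):

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> a = {1, 4, -1, 10, 5};
    int k = 9, n = a.size(), count = 0;
    for (int mask = 0; mask < (1 << n); mask++) {  // each mask encodes one subset
        long long s = 0;
        for (int i = 0; i < n; i++)
            if (mask & (1 << i)) s += a[i];        // bit i set -> include a[i]
        if (s == k) count++;
    }
    cout << count << "\n";                         // prints 3: {4,5}, {-1,10}, {1,4,-1,5}
}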
