Finding a Summation in range - algorithm

I have given an Array A and B. Where B contains the indexes of A.
Let
A = [2,3,5,6,7,8,9]
B = [1,3,1,4,5,2,6]
I have given Q queries where i have to find the Sum in the Range L to R using B.
Example L=2 , R=4
Sum = A[B[2]]+A[B[3]]+ A[B[4]]
Sum = A[3] + A[1]+A[4] = 5+2+6=13
Now a Query for update i.e
U 2 10
A[2] =10 // previously 3
S 6 7
Sum = A[B[6]]+A[B[7]] = A[2]+A[6]=10+8=18
Is there any solution better than O(Q*N) , any data structure which support both update and summation in this case. (Segment Tree and BITS like)
Constraints:
N,Q,Value<10^6

Related

Compute mean of columns for groups of rows in Octave

I have a matrix, for example:
1 2
3 4
4 5
And I also have a rule of grouping the rows, which is defined as a vector of group IDs like this:
1
2
1
Which means that the first and the third rows belong to the same group (ID 1) and the second row belong to another group (ID 2). So, I would like to compute the mean value for each group. Here is the result for my example:
2.5 3.5
3 4
More formally, there is a matrix A of size (m, n), a number of groups k and a vector v of size (m, 1), values of which are integers in range from 1 to k. The result is a matrix R of size (k, n), where each row with index r corresponds to the mean value of the group r.
Here is my solution (which does what I need) using for-loop in Octave:
R = zeros(k, n);
for r = 1:k
R(r, :) = mean(A((v == r), :), 1);
end
I wonder whether it could be vectorized. So, what I need is to replace the for-loop with a vectorized solution, which is going to be much more efficient than the iterative one.
Here is one of my many attempts (which do not work) to solve the problem in a vectorized way:
R = mean(A((v == 1:k), :);
As long as our data is of floating point, you can just do it manually by doing the sum yourself and then divide, by making use of accumdim. Like so:
octave:1> A = [1 2; 3 4; 4 5];
octave:2> subs = [1; 2; 1];
octave:3> accumdim (subs, A) ./ accumdim (subs, ones (rows (subs), 1))
ans =
2.5000 3.5000
3.0000 4.0000
You can consider it as a matrix multiplication problem. For instance, for your example this corresponds to
A = [1 2; 3 4; 4 5];
B = [0.5,0,0.5;0,1,0];
C = B*A
The main issue, is to construct B from your list of indicies in an efficient manner. My suggestion is to use the implicit expansion of ==.
A = [1 2; 3 4; 4 5]; % Input data
idx = [1;2;1]; % Input Grouping
k = 2; % number of groups, ( = max(idx) )
m = 3; % Number of "observations"
Btmp = (idx == 1:k)'; % Mark locations
B = Btmp ./sum(Btmp,2); % Normalise
C = B*A
C =
2.5000 3.5000
3.0000 4.0000

Range Summation of Array

We are giving Array A and two type of query
Change F[x] = V
Find the Summation from [1.R] with d = x .i.e sum (1, 1+x, 1+2x, ...)
I know fro x = 1 it can be done with BIT or Segment Tree, but how do if d > 1?

Vectorized search for permutations (with repetitions) that contain given subpermutations (with repetitions)

This question is can be viewed continuation/extension/generalization of a previous question of mine from here.
Some definitions: I have a set of integers S = {1,2,...,s}, say s = 20, and two matrices N and M whose rows are finite sequences of numbers from S (i.e. permutations with possible repetitions), of order n and m respectively, where 1 <= n <= m. Let us think of N as a collection of candidate sub-sequences for the sequences from M.
Example: [2 3 4 3] is a sub-sequence of [1 2 2 3 5 4 1 3] that occurs with multiplicity 2 (=in how many different ways one can find the sub-seq. in the main seq.), whereas [3 2 2 3] is not a sub-sequence of it. In particular, a valid sub-sequence by definition must preserve the order of the indices.
Problem statement:
(P1) For each row of M, obtain the number of sub-sequences of it, with multiplicity and without multiplicity, that occur in N as rows (it can be zero if none are contained in N);
(P2) For each row of N, find out how many times, with multiplicity and without multiplicity, it is contained in M as a sub-sequence (again, this number can be zero);
Example: Let N = [1 2 2; 2 3 4] and M = [1 1 2 2 3; 1 2 2 3 4; 1 2 3 5 6]. Then (P1) returns [2; 3; 0] for 'with multiplicities' and [1; 2; 0] for 'without multiplicities'. (P2) returns [3; 2] for 'with multiplicities' and [2; 1] without multiplicities.
Order of magnitude: M could typically have up to 30-40 columns and a few thousand rows, although I currently have M with only a few hundred rows and ~10 columns. N could be approaching the size of
M or could be also much smaller.
What I have so far: Not much, to be honest. I believe I might be able to slightly modify my not-very-well-vectorized solution from my previous question to tackle permutations with repetitions, but I am still thinking on that and will update as soon as I have something working. But given my (lack of) experience so far, it would be in all likelihood very suboptimal :(
Thanks!
Introduction : Owing to the repetitions in the input data in each row, the combination finding process doesn't have the sort of "uniqueness" among elements which was exploited in your previous problem and hence the loops used here. Also, note that the without multiplicity codes don't use nchoosek and as such, I feel more optimistic about them for performance.
Notations :
p1wim -> P1 with multiplicity
p2wim -> P2 with multiplicity
p1wom -> P1 without multiplicity
p2wom -> P2 without multiplicity
Codes :
I. Code for P1, 2 with multiplicity
permN = permute(N,[3 2 1]);
p1wim(size(M,1),1)=0;
p2wim(size(N,1),1)=0;
for k1 = 1:size(M,1)
d1 = nchoosek(M(k1,:),3);
t1 = all(bsxfun(#eq,d1,permN),2);
p1wim(k1) = sum(t1(:));
p2wim = p2wim + squeeze(sum(t1,1));
end
II. Code for P1, 2 without multiplicity
eqmat = bsxfun(#eq,M,permute(N,[3 4 2 1])); %// equality matrix
[m,n,p,q] = size(eqmat); %// get sizes
inds = zeros(size(M,1),p,q); %// pre-allocate for indices array
vec1 = [1:m]'; %//' setup constants to loop
vec2 = [0:q-1]*m*n*p;
vec3 = permute([0:p-1]*m*n,[1 3 2]);
for iter = 1:p
[~,ind1] = max(eqmat(:,:,iter,:),[],2);
inds(:,iter,:) = reshape(ind1,m,1,q);
ind2 = squeeze(ind1);
ind3 = bsxfun(#plus,vec1,(ind2-1)*m); %//' setup forward moving equalities
ind4 = bsxfun(#plus,ind3,vec2);
ind5 = bsxfun(#plus,ind4,vec3);
eqmat(ind5(:)) = 0;
end
p1wom = sum(all(diff(inds,[],2)>0,2),3);
p2wom = squeeze(sum(all(diff(inds,[],2)>0,2),1));
As usual, I would encourage you to use gpuArrays too with your favorite parfor.
This approach uses only one loop over the rows of M (P1) or N (P2). The code makes use of linear indexing and the very powerful bsxfun function. Note that if the number of columns is large you may experience problems because of nchoosek.
[mr mc] = size(M);
[nr nc] = size(N);
%// P1
combs = nchoosek(1:mc, nc)-1;
P1mu = NaN(mr,1);
P1nm = NaN(mr,1);
for r = 1:mr
aux = M(r+mr*combs);
P1mu(r) = sum(ismember(aux, N, 'rows'));
P1nm(r) = sum(ismember(unique(aux, 'rows'), N, 'rows'));
end
%// P2. Multiplicity defined to span across different rows
rr = reshape(repmat(1:mr, size(combs,1), 1),[],1);
P2mu = NaN(nr,1);
P2nm = NaN(nr,1);
for r = 1:nr
aux = M(bsxfun(#plus, rr, mr*repmat(combs, mr, 1)));
P2mu(r) = sum(all(bsxfun(#eq, N(r,:), aux), 2));
P2nm(r) = sum(all(bsxfun(#eq, N(r,:), unique(aux, 'rows')), 2));
end
%// P2. Multiplicity defined restricted to within one row
rr = reshape(repmat(1:mr, size(combs,1), 1),[],1);
P2mur = NaN(nr,1);
P2nmr = NaN(nr,1);
for r = 1:nr
aux = M(bsxfun(#plus, rr, mr*repmat(combs, mr, 1)));
P2mur(r) = sum(all(bsxfun(#eq, N(r,:), aux), 2));
aux2 = unique([aux rr], 'rows'); %// concat rr to differentiate rows...
aux2 = aux2(:,1:end-1); %// ...and now remove it
P2nmr(r) = sum(all(bsxfun(#eq, N(r,:), aux2), 2));
end
Results for your example data:
P1mu =
2
3
0
P1nm =
1
2
0
P2mu =
3
2
P2nm =
1
1
P2mur =
3
2
P2nmr =
2
1
Some optimizations to the code would be possible. Not sure they are worth the effort:
Replace repmat by another bsxfun (using a 3rd dimension). That may save some memory
Transpose original matrices and work down colunmns, instead of along rows. That may be faster.

Enumerate matrix combinations with fixed row and column sums

I'm attempting to find an algorithm (not a matlab command) to enumerate all possible NxM matrices with the constraints of having only positive integers in each cell (or 0) and fixed sums for each row and column (these are the parameters of the algorithm).
Exemple :
Enumerate all 2x3 matrices with row totals 2, 1 and column totals 0, 1, 2:
| 0 0 2 | = 2
| 0 1 0 | = 1
0 1 2
| 0 1 1 | = 2
| 0 0 1 | = 1
0 1 2
This is a rather simple example, but as N and M increase, as well as the sums, there can be a lot of possibilities.
Edit 1
I might have a valid arrangement to start the algorithm:
matrix = new Matrix(N, M) // NxM matrix filled with 0s
FOR i FROM 0 TO matrix.rows().count()
FOR j FROM 0 TO matrix.columns().count()
a = target_row_sum[i] - matrix.rows[i].sum()
b = target_column_sum[j] - matrix.columns[j].sum()
matrix[i, j] = min(a, b)
END FOR
END FOR
target_row_sum[i] being the expected sum on row i.
In the example above it gives the 2nd arrangement.
Edit 2:
(based on j_random_hacker's last statement)
Let M be any matrix verifying the given conditions (row and column sums fixed, positive or null cell values).
Let (a, b, c, d) be 4 cell values in M where (a, b) and (c, d) are on the same row, and (a, c) and (b, d) are on the same column.
Let Xa be the row number of the cell containing a and Ya be its column number.
Example:
| 1 a b |
| 1 2 3 |
| 1 c d |
-> Xa = 0, Ya = 1
-> Xb = 0, Yb = 2
-> Xc = 2, Yc = 1
-> Xd = 2, Yd = 2
Here is an algorithm to get all the combinations verifying the initial conditions and making only a, b, c and d varying:
// A matrix array containing a single element, M
// It will be filled with all possible combinations
matrices = [M]
I = min(a, d)
J = min(b, c)
FOR i FROM 1 TO I
tmp_matrix = M
tmp_matrix[Xa, Ya] = a - i
tmp_matrix[Xb, Yb] = b + i
tmp_matrix[Xc, Yc] = c - i
tmp_matrix[Xd, Yd] = d + i
matrices.add(tmp_matrix)
END FOR
FOR j FROM 1 TO J
tmp_matrix = M
tmp_matrix[Xa, Ya] = a + j
tmp_matrix[Xb, Yb] = b - j
tmp_matrix[Xc, Yc] = c + j
tmp_matrix[Xd, Yd] = d - j
matrices.add(tmp_matrix)
END FOR
It should then be possible to find every possible combination of matrix values:
Apply the algorithm on the first matrix for every possible group of 4 cells ;
Recursively apply the algorithm on each sub-matrix obtained by the previous iteration, for every possible group of 4 cells except any group already used in a parent execution ;
The recursive depth should be (N*(N-1)/2)*(M*(M-1)/2), each execution resulting in ((N*(N-1)/2)*(M*(M-1)/2) - depth)*(I+J+1) sub-matrices. But this creates a LOT of duplicate matrices, so this could probably be optimized.
Are you needing this to calculate Fisher's exact test? Because that requires what you're doing, and based on that page, it seems there will in general be a vast number of solutions, so you probably can't do better than a brute force recursive enumeration if you want every solution. OTOH it seems Monte Carlo approximations are successfully used by some software instead of full-blown enumerations.
I asked a similar question, which might be helpful. Although that question deals with preserving frequencies of letters in each row and column rather than sums, some results can be translated across. E.g. if you find any submatrix (pair of not-necessarily-adjacent rows and pair of not-necessarily-adjacent columns) with numbers
xy
yx
Then you can rearrange these to
yx
xy
without changing any row or column sums. However:
mhum's answer proves that there will in general be valid matrices that cannot be reached by any sequence of such 2x2 swaps. This can be seen by taking his 3x3 matrices and mapping A -> 1, B -> 2, C -> 4 and noticing that, because no element appears more than once in a row or column, frequency preservation in the original matrix is equivalent to sum preservation in the new matrix. However...
someone's answer links to a mathematical proof that it actually will work for matrices whose entries are just 0 or 1.
More generally, if you have any submatrix
ab
cd
where the (not necessarily unique) minimum is d, then you can replace this with any of the d+1 matrices
ef
gh
where h = d-i, g = c+i, f = b+i and e = a-i, for any integer 0 <= i <= d.
For a NXM matrix you have NXM unknowns and N+M equations. Put random numbers to the top-left (N-1)X(M-1) sub-matrix, except for the (N-1, M-1) element. Now, you can find the closed form for the rest of N+M elements trivially.
More details: There are total of T = N*M elements
There are R = (N-1)+(M-1)-1 randomly filled out elements.
Remaining number of unknowns: T-S = N*M - (N-1)*(M-1) +1 = N+M

Pseudo number generation

Following is text from Data structure and algorithm analysis by Mark Allen Wessis.
Following x(i+1) should be read as x subscript of i+1, and x(i) should be
read as x subscript i.
x(i + 1) = (a*x(i))mod m.
It is also common to return a random real number in the open interval
(0, 1) (0 and 1 are not possible values); this can be done by
dividing by m. From this, a random number in any closed interval [a,
b] can be computed by normalizing.
The problem with this routine is that the multiplication could
overflow; although this is not an error, it affects the result and
thus the pseudo-randomness. Schrage gave a procedure in which all of
the calculations can be done on a 32-bit machine without overflow. We
compute the quotient and remainder of m/a and define these as q and
r, respectively.
In our case for M=2,147,483,647 A =48,271, q = 127,773, r = 2,836, and r < q.
We have
x(i + 1) = (a*x(i))mod m.---------------------------> Eq 1.
= ax(i) - m (floorof(ax(i)/m)).------------> Eq 2
Also author is mentioning about:
x(i) = q(floor of(x(i)/q)) + (x(i) mod Q).--->Eq 3
My question
what does author mean by random number is computed by normalizing?
How author came with Eq 2 from Eq 1?
How author came with Eq 3?
Normalizing means if you have X ∈ [0,1] and you need to get Y ∈ [a, b] you can compute
Y = a + X * (b - a)
EDIT:
2. Let's suppose
a = 3, x = 5, m = 9
Then we have
where [ax/m] means an integer part.
So we have 15 = [ax/m]*m + 6
We need to get 6. 15 - [ax/m]*m = 6 => ax - [ax/m]*m = 6 => x(i+1) = ax(i) - [ax(i)/m]*m
If you have a random number in the range [0,1], you can get a number in the range [2,5] (for example) by multiplying by 3 and adding 2.

Resources