Randomly choosing numbers from an array without repetition - algorithm

I have an algorithm that randomly selects an element t from an array without repetition. In more detail, the algorithm
can be explained as follows:
Initialize an index array u that stores the indices 1 to k (lines 1 to 3).
Set gamma to k initially and reduce it by one in each iteration. The purpose of gamma is to prevent repeats (lines 4, 9, 10).
Randomly choose a number t from 1 to N (at j=1, choose from 1 to k; the N numbers are non-repeated), and then move that number to the end of the array.
Repeat steps 2 to 3.
If gamma = 0, reset gamma = k.
The function returns t.
For example, I have an array A=[1,2,3,4,5,6,7,8,9], k=9=size(A), N=12 (each number from 1 to 9 is selected only once). Now I want to use this algorithm to randomly select a number t from array A. This is my code. However, it does not match line 6 of the algorithm. Is it right? Please look at my code and help me.
function nonRepeat
k=9;
u=1:k; % initial value of index
N=12
gamma=k;
for j=1:N
index=randi(gamma,1); % use other choosing
t=u(index)
%%swapping
temp=u(t);
u(t)=u(gamma);
u(gamma)=temp;
gamma=gamma-1;
if gamma==0
gamma=k;
end
end
end

I think index=randi(gamma,1); is not right, because the algorithm says to select the number t randomly, whereas you select an index randomly and then assign t=u(index).
See if this works:
k = 9;
u = 1 : k;
N = 12;
gamma = k;
for j = 1 : N
t = randi(gamma,1);
temp = u(t);
u(t) = u(gamma);
u(gamma) = temp;
gamma = gamma - 1;
if gamma == 0
gamma = k;
end
end
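For comparison, here is a minimal Python sketch of the same draw-without-repeats loop. It is an illustration rather than the algorithm from the original listing (which is not reproduced in this thread); the function name `draw_without_repeats` and the `rng` parameter are made up for this example. The point it shows is that the swap uses the drawn position, not the drawn value, so the active prefix `u[0:gamma]` always holds exactly the values not yet used in the current round.

```python
import random

def draw_without_repeats(k, n, rng=random):
    """Return n draws from 1..k; no value repeats within a round of k draws."""
    u = list(range(1, k + 1))   # pool of values; u[0:gamma] are still available
    gamma = k                   # size of the "not yet drawn" prefix
    draws = []
    for _ in range(n):
        index = rng.randrange(gamma)   # random position inside the active prefix
        t = u[index]                   # the value selected in this iteration
        # Swap by position (index), not by value (t), to retire the draw.
        u[index], u[gamma - 1] = u[gamma - 1], u[index]
        gamma -= 1
        if gamma == 0:                 # every value used once: start a new round
            gamma = k
        draws.append(t)
    return draws

print(draw_without_repeats(9, 12))
```

With k=9 and N=12, the first nine draws are a permutation of 1..9 and the last three are distinct values from a fresh round.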


Count subsets of array which qualify min(subset)+max(subset) < k

Was asked this question in an interview, didn't have a better answer than generating all possible subsets.
Example:
a = [4,2,5,7] k = 8
output = 4
[2],[4,2],[2,5],[4,2,5]
Interviewer tried implying sorting the array should help, but I still couldn't figure out a better-than-brute-force solution. Will appreciate your input.
The interviewer implied that sorting the array would help and it does help. I'll try to explain.
Taking the array and k values you stated:
a = [4,2,5,7]
k = 8
Sorting the array will yield:
a_sort = [2,4,5,7]
Now we can consider the following procedure:
1. Set ii = 0, jj = 1.
2. Choose a_sort[ii] as part of your subset.
2.1. If 2 * a_sort[ii] >= k, you are done. Otherwise, the subset [a_sort[ii]] satisfies the condition and is part of the solution.
3. Add a_sort[ii+jj] to your subset.
3.1. If a_sort[ii] + a_sort[ii+jj] < k,
3.1.1. the subset [a_sort[ii], a_sort[ii+jj]] satisfies the condition and is part of the solution, as is any subset that additionally contains any number of elements a_sort[kk] with ii < kk < ii+jj.
3.1.2. Set jj += 1 and go back to step 3.
3.2. Otherwise (or if ii+jj runs past the end of the array), set ii += 1, jj = 1, and go back to step 2.
With your input this procedure should return:
[[2], [2,4],[2,5],[2,4,5]]
# [2,7] results in 9 > 8 and therefore you move to [4]
# Note that for [4] subset you get 8 = 8 which is not smaller than 8, we are done
Explanation
If the subset [a_sort[ii]] does not satisfy 2 * a_sort[ii] < k, adding more numbers to it only yields min(subset)+max(subset) >= 2 * a_sort[ii] >= k, so no further subset with a_sort[ii] as its minimum satisfies the condition. Moreover, moving on to [a_sort[ii+1]] gives 2 * a_sort[ii+1] >= 2 * a_sort[ii] >= k since a_sort is sorted. Therefore you will not find any additional subsets.
For jj >= 1, if a_sort[ii] + a_sort[ii+jj] < k then you can add any number of members of a_sort to the subset, as long as their index kk is bigger than ii and lower than ii+jj. Since a_sort is sorted, adding these members does not change the value of min(subset)+max(subset), which remains a_sort[ii] + a_sort[ii+jj], and we already know that this value is smaller than k.
Getting the count
If you only want the number of possible subsets, this is easier than generating the subsets themselves.
Assume the condition holds for a pair of indices ii < jj, where jj now denotes the absolute index of the largest element (what the procedure above calls ii+jj), i.e. a_sort[ii] + a_sort[jj] < k. If jj = ii + 1, this adds 1 possible subset. If jj > ii + 1, there are jj - ii - 1 in-between elements, each of which can be present or not without changing the value a_sort[ii] + a_sort[jj]. Therefore there are a total of 2**(jj-ii-1) additional subsets to add to the solution group (jj-ii-1 elements, each independently present or not). This also holds for jj = ii + 1, since in this case 2**(jj-ii-1) = 2**0 = 1.
Looking at the example above:
[2] adds 1 count
[2,4] adds 1 count (1 = 0 + 1)
[2,5] adds 2 counts (2 = 0 + 2 --> 2 **(2 - 0 - 1) = 2**1 = 2)
A total count of 4
Sort the array
For an element x at index l, do a binary search on the array to get index of the maximum integer in the array which is < k-x. Let the index be r.
For all subsets where min(subset) = x, we can have any element with index in range (l,r]. Number of subsets with min(subset) = x becomes the total number of possible subsets for (r-l) elements, so count = 2^(r-l) (or 0 if r<l).
(Note: in all such subsets, we are fixing x. That's why the range (l,r] isn't inclusive of l)
You have to iterate over the array, use the above process for each element/index to get the count of subsets where our current element is the minimum and the subset satisfies the given constraint. If you find an element with count=0, break the iteration.
This should work with O(N*log(N)) complexity, good enough for an interview question imo.
For the given example, sorted array = [2,4,5,7].
For element 2, l=0 and r=2. Count = 2^(2-0) = 4 (covers [2],[4,2],[2,5],[4,2,5]).
For element 4, l=1 and r=0. Count = 0, and we break the iteration.
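The sort-plus-binary-search counting described above can be sketched in Python as follows. This is a minimal illustration under the assumption that duplicates count as distinct positions; the function name `count_subsets` is made up for this example, and `bisect_left` from the standard library finds how many elements are strictly smaller than k - x.

```python
from bisect import bisect_left

def count_subsets(a, k):
    """Count non-empty subsets with min(subset) + max(subset) < k."""
    s = sorted(a)
    total = 0
    for l, x in enumerate(s):
        # r = index of the last element strictly smaller than k - x
        r = bisect_left(s, k - x) - 1
        if r < l:
            break  # 2*x >= k here, and the same holds for every larger x
        total += 2 ** (r - l)  # each element in (l, r] is present or absent
    return total

print(count_subsets([4, 2, 5, 7], 8))
```

For the example in the question the loop adds 2^(2-0) for x=2 and stops at x=4.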

How to generate random number that satisfying poisson distribution

I want to generate 500000 random numbers from a Poisson distribution with lambda = 1 and T = 6, using the composition method, which can be described as follows:
Generate uniform r.v. z1, z2, …
Stop when z1*z2*...*zm <= exp(-lambda*T)
Assign k = m – 1
Then count how many numbers fall in each of 10 intervals ([0,1],[2,3],…, [16,17], [18,∞)).
I know that MATLAB has a built-in function poissrnd for the above task. However, I want to implement the above algorithm myself. I tried it and compared it with the result of the poissrnd function, but my code gives a wrong result. Could you look at my code and give me some comments?
num_generated = 500000;
lambda=1;T=6;
k_vec=[]; %% Store k
for i=1:num_generated
multiple=1;
for j=1:num_generated
%% Step 1: Generate uniform in the interval [0,1]: z1,z2...
z=rand();
%% Step 2: Stop when z1z2...zm<=exp(-lambda*T)
multiple=multiple*z;
if(multiple<=exp(-lambda*T))
k=j-1;
k_vec=[k_vec k]; % Record k in vec
break;
end
end
end
range_1 = sum( k_vec(:)==0 )+sum(k_vec(:)==1) % # number with in range [0,1]
range_2 = sum( k_vec(:)==2 )+sum( k_vec(:)==3) % # number with in range [2,3]
range_3 = sum( k_vec(:)==4 )+sum( k_vec(:)==5) % # number with in range [4,5]
range_4 = sum( k_vec(:)==6 )+sum( k_vec(:)==7) % # number with in range [6,7]
range_5 = sum( k_vec(:)==8 )+sum( k_vec(:)==9) % # number with in range [8,9]
range_6 = sum( k_vec(:)==10 )+sum( k_vec(:)==11) % # number with in range [10,11]
range_7 = sum( k_vec(:)==12 )+sum( k_vec(:)==13) % # number with in range [12,13]
range_8 = sum( k_vec(:)==14 )+sum( k_vec(:)==15) % # number with in range [14,15]
range_9 = sum( k_vec(:)==16 )+sum( k_vec(:)==17) % # number with in range [16,17]
range_10 = sum(k_vec(:)>=18) % # number with in range [18,+infty)
You don't know how many random values it will take for multiple to converge, so you need to change your for loop over j to a while loop that continues as long as multiple > exp(-lambda*T).
By changing this to a while loop, you now need k to be a counter and to increment it on each iteration of the loop:
(Warning: Untested Code)
for i = 1:num_generated
multiple = 1;
k = 0; %// Initialize counter for each number generated
while multiple > exp(-lambda*T) %// replace `for` loop
k = k + 1; %// Increment counter
%% Step 1: Generate uniform in the interval [0,1]: z1,z2...
z = rand();
%% Step 2: Stop when z1z2...zm<=exp(-lambda*T)
multiple = multiple*z;
end
%// If we exit the loop, we know multiple <= exp(-lambda*T)
k = k - 1;
k_vec = [k_vec k]; % Record k in vec
end
You should also avoid at all costs using sequential variable names like range_1, range_2, ... MATLAB is designed to handle arrays and matrices, so you should use them. The simplest way to do this in your case, without even looping or vectorization, is:
range(1) = sum(...
range(2) = sum(...
...
range(10) = sum(...
Now you have one variable in your workspace rather than 10 and any operations you perform on this variable will be much easier.
I don't use Matlab so I can't give you the exact syntax for a fix. At a minimum, it looks like you're forgetting to reset multiple and k for each new Poisson. Also, you're only generating a single z.
A working implementation to get num_generated Poisson outcomes should look something like the following pseudocode:
threshold = Math.exp(-lambda * T)
loop num_generated times {
%% Each time through this loop produces a single Poisson outcome
count = 0
product = 1.0
while (product = product * rand()) > threshold {
count += 1
}
%% count now has a valid Poisson value, do what you want with it
}
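As a runnable counterpart to the pseudocode, here is a small Python sketch of the product-of-uniforms method together with the ten-interval count from the question. The names `poisson_by_products` and `bin_counts` are made up for this example, and the `mean` argument stands for lambda*T (6 in the question).

```python
import math
import random

def poisson_by_products(mean, rng=random):
    """Draw one Poisson(mean) variate by multiplying uniforms."""
    threshold = math.exp(-mean)
    count = 0                      # k = m - 1, where m uniforms were needed
    product = rng.random()         # z1
    while product > threshold:     # stop once z1*...*zm <= exp(-mean)
        count += 1
        product *= rng.random()
    return count

def bin_counts(samples, n_bins=10, width=2):
    """Count samples in [0,1], [2,3], ..., with the last bin open-ended."""
    counts = [0] * n_bins
    for k in samples:
        counts[min(k // width, n_bins - 1)] += 1
    return counts

samples = [poisson_by_products(1 * 6) for _ in range(500000)]
print(bin_counts(samples))
```

Storing the counts in one list also follows the advice above about avoiding variables named range_1 to range_10.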

Reshape vector to matrix with column-wise zero padding in matlab

For an input matrix
in = [1 1;
1 2;
1 3;
1 4;
2 5;
2 6;
2 7;
3 8;
3 9;
3 10;
3 11];
I want to get the output matrix
out = [1 5 8;
2 6 9;
3 7 10;
4 0 11];
That is, I want to reshape the second input column into an output matrix in which all values corresponding to one value in the first input column are written into one column of the output matrix.
As there can be different numbers of entries for each value in the first input column (here 4 values for "1" and "3", but only 3 for "2"), the normal reshape function is not applicable. I need to pad all columns to the maximum number of rows.
Do you have an idea how to do this in a MATLAB-ish way?
The second input column can only contain positive numbers, so the padding values can be 0, -x, NaN, ...
The best I could come up with is this (loop-based):
maxNumElem = 0;
for i=in(1,1):in(end,1)
maxNumElem = max(maxNumElem,numel(find(in(:,1)==i)));
end
out = zeros(maxNumElem,in(end,1)-in(1,1));
for i=in(1,1):in(end,1)
tmp = in(in(:,1)==i,2);
out(1:length(tmp),i) = tmp;
end
Both of the following approaches assume that column 1 of in is sorted, as in the example. If that's not the case, first sort in by that column:
in = sortrows(in,1);
Approach 1 (using accumarray)
Compute the required number of rows, using mode;
Use accumarray to gather the values corresponding to each column, filled with zeros at the end. The result is a cell;
Concatenate horizontally the contents of all cells.
Code:
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}]; %//step 3
Alternatively, step 1 could be done with histc
n = max(histc(in(:,1), unique(in(:,1)))); %//step 1
or with accumarray:
n = max(accumarray(in(:,1), in(:,2), [], @(x) numel(x))); %//step 1
Approach 2 (using sparse)
Generate a row-index vector using this answer by @Dan, and then build your matrix with sparse:
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
Introduction to proposed solution and Code
Proposed here is a bsxfun based masking approach that uses the binary operators available as builtins for use with bsxfun and as such I would consider this very appropriate for problems like this. Of course, you must also be aware that bsxfun is a memory hungry tool. So, it could pose a threat if you are dealing with maybe billions of elements depending also on the memory available for MATLAB's usage.
Getting into the details of the proposed approach, we get the counts of each ID from column-1 of the input with histc. Then, the magic happens with bsxfun + @le to create a mask of positions in the output array (initialized by zeros) that are to be filled by the column-2 elements from input. That's all you need to tackle the problem with this approach.
Solution Code
counts = histc(in(:,1),1:max(in(:,1)))'; %//' counts of each ID from column1
max_counts = max(counts); %// Maximum count over all IDs
mask = bsxfun(@le,[1:max_counts]',counts); %//'# mask of locations where
%// column2 elements are to be placed
out = zeros(max_counts,numel(counts)); %// Initialize the output array
out(mask) = in(:,2); %// place the column2 elements in the output array
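The count-then-fill idea can also be sketched outside MATLAB. The Python below is only an illustration of the padding logic, not a translation of the bsxfun mask; the function name `pad_groups` and the `fill` parameter are made up for this example. One difference is an assumption of this sketch: IDs that never occur are skipped, whereas histc over 1:max(in(:,1)) would produce an all-zero column for them.

```python
def pad_groups(rows, fill=0):
    """rows: iterable of (group_id, value) pairs; returns the padded matrix as a list of rows."""
    groups = {}
    for gid, value in rows:
        groups.setdefault(gid, []).append(value)   # keep input order within each ID
    ids = sorted(groups)
    height = max((len(groups[g]) for g in ids), default=0)
    # Build the matrix row by row; a column that is too short contributes `fill`.
    return [[groups[g][r] if r < len(groups[g]) else fill for g in ids]
            for r in range(height)]

data = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (2, 7),
        (3, 8), (3, 9), (3, 10), (3, 11)]
for row in pad_groups(data):
    print(row)
```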
Benchmarking (for performance)
The benchmarking presented here compares the proposed solution in this post against the various methods presented in Luis's solution. This skips the original loopy approach presented in the problem as it appeared to be very slow for the input generated in the benchmarking code.
Benchmarking Code
num_ids = 5000;
counts_each_id = randi([10 100],num_ids,1);
num_runs = 20; %// number of iterations each approach is run for
%// Generate random input array
in = [];
for k = 1:num_ids
in = [in ; [repmat(k,counts_each_id(k),1) rand(counts_each_id(k),1)]];
end
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp('------------- With HISTC + BSXFUN Masking approach')
tic
for iter = 1:num_runs
counts = histc(in(:,1),1:max(in(:,1)))';
max_counts = max(counts);
out = zeros(max_counts,numel(counts));
out(bsxfun(@le,[1:max_counts]',counts)) = in(:,2);
end
toc
clear counts max_counts out
disp('------------- With MODE + ACCUMARRAY approach')
tic
for iter = 1:num_runs
[~, n] = mode(in(:,1)); %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With HISTC + ACCUMARRAY approach')
tic
for iter = 1:num_runs
n = max(histc(in(:,1), unique(in(:,1))));
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];
end
toc
clear n out
disp('------------- With ARRAYFUN + Sparse approach')
tic
for iter = 1:num_runs
a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
end
toc
clear a out
Results
------------- With HISTC + BSXFUN Masking approach
Elapsed time is 0.598359 seconds.
------------- With MODE + ACCUMARRAY approach
Elapsed time is 2.452778 seconds.
------------- With HISTC + ACCUMARRAY approach
Elapsed time is 2.579482 seconds.
------------- With ARRAYFUN + Sparse approach
Elapsed time is 1.455362 seconds.
slightly better, but still uses a loop :(
out=zeros(4,3);%set to zero matrix
for i = 1:max(in(:,1)); %find max in column 1, and loop for that number
ind = find(in(:,1)==i); %
out(1: size(in(ind,2),1),i)= in(ind,2);
end
don't know if you can avoid the loop...

Calculate rank of binary matrix with larger size

Hello all, I have a problem computing the rank of a binary matrix, i.e. one that contains only 1s and 0s. The rank of a binary matrix is based on row reduction using the boolean XOR operation. Here is the XOR operation:
1 xor 1 =0
1 xor 0= 1
0 xor 0= 0
0 xor 1= 1
Given a binary matrix as
A =
1 1 0 0 0 0
1 0 0 0 0 1
0 1 0 0 0 1
We can see that the third row equals the first row XORed with the second row. Hence, the rank of matrix A is only 2, instead of the 3 reported by the MATLAB rank function.
I have one way to compute the exact rank of a binary matrix, using this code:
B=gf(A)
rank(B)
It returns 2. However, when I compute it for a large matrix, for example 400 by 400, it does not return the rank (it never stops). Could you suggest a good way to find the rank of a large binary matrix? Thanks, all.
UPDATE: this is the computation time using tic/toc:
N=50: Elapsed time is 0.646823 seconds.
N=100: Elapsed time is 3.123573 seconds.
N=150: Elapsed time is 7.438541 seconds.
N=200: Elapsed time is 11.349964 seconds.
N=400: Elapsed time is 66.815286 seconds.
Note that checking the rank is only one condition in my algorithm. However, it takes a very long time, so it slows down my whole method.
Based on the suggestion of R., I will use Gaussian elimination to find the rank. This is my code. However, it still calls the rank function (which costs some computation time). Could you help me modify it so that it does not use the rank function?
function rankA=GaussEliRank(A)
mat = A;
[m n] = size(A); % read the size of the original matrix A
for i = 1 : n
j = find(mat(i:m, i), 1); % finds the FIRST 1 in i-th column starting at i
if isempty(j)
mat = mat( sum(mat,2)>0 ,:);
rankA=rank(mat); %%Here
return;
else
j = j + i - 1; % we need to add i-1 since j starts at i
temp = mat(j, :); % swap rows
mat(j, :) = mat(i, :);
mat(i, :) = temp;
% add i-th row to all rows that contain 1 in i-th column
% starting at j+1 - remember up to j are zeros
for k = find(mat( (j+1):m, i ))'
mat(j + k, :) = bitxor(mat(j + k, :), mat(i, :));
end
end
end
%remove all-zero rows if there are some
mat = mat( sum(mat,2)>0 ,:);
if any(sum( mat(:,1:n) ,2)==0) % no solution because matrix A contains
error('No solution.'); % all-zero row, but with nonzero RHS
end
rankA=rank(mat); %%Here
end
Please check it with the matrix A here. The correct answer for the rank of A is 393.
Once you get the matrix into row echelon form with Gaussian elimination, the rank is the number of nonzero rows. You should be able to replace the code after the loop with something like rankA=sum(sum(mat,2)>0);.
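To illustrate counting pivots instead of calling rank, here is a minimal Python sketch of elimination over GF(2). It is not a port of the MATLAB function above: as an assumption of this example, each row is packed into an integer so that one XOR processes a whole row, and the rank is the number of nonzero pivot rows found. The name `gf2_rank` is made up for this sketch.

```python
def gf2_rank(matrix):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    # Pack each row into an int: bit i of the int is the entry in column i.
    rows = [sum(bit << i for i, bit in enumerate(row)) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue                      # an all-zero row adds nothing to the rank
        rank += 1
        low_bit = pivot & -pivot          # lowest set bit acts as the pivot column
        # XOR the pivot row into every remaining row that has a 1 in that column.
        rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank

A = [[1, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 1]]
print(gf2_rank(A))
```

For the 3-by-6 example from the question, the third row cancels to zero and the function reports 2.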

generate random numbers within a range with different probabilities

How can I generate a random number between A = 1 and B = 10 where each number has a different probability?
Example: number / probability
1 - 20%
2 - 20%
3 - 10%
4 - 5%
5 - 5%
...and so on.
I'm aware of some hard-coded workarounds which unfortunately are of no use with larger ranges, for example A = 1000 and B = 100000.
Assume we have a
Rand()
method which returns a random number R, 0 < R < 1, can anyone post a code sample with a proper way of doing this? Preferably in C# / Java / ActionScript.
Build an array of 100 integers and populate it with 20 1's, 20 2's, 10 3's, 5 4's, 5 5's, etc. Then just randomly pick an item from the array.
int[] numbers = new int[100];
// populate the first 20 with the value '1'
for (int i = 0; i < 20; ++i)
{
numbers[i] = 1;
}
// populate the rest of the array as desired.
// To get an item:
// Since your Rand() function returns 0 < R < 1
int ix = (int)(Rand() * 100);
int num = numbers[ix];
This works well if the number of items is reasonably small and your precision isn't too strict. That is, if you wanted 4.375% 7's, then you'd need a much larger array.
There is an elegant algorithm attributed by Knuth to A. J. Walker (Electronics Letters 10, 8 (1974), 127-128; ACM Trans. Math Software 3 (1977), 253-256).
The idea is that if you have a total of k * n balls of n different colors, then it is possible to distribute the balls in n containers such that container no. i contains balls of color i and at most one other color. The proof is by induction on n. For the induction step pick the color with the least number of balls.
In your example n = 10. Multiply the probabilities with a suitable m such that they are all integers. So, maybe m = 100 and you have 20 balls of color 0, 20 balls of color 1, 10 balls of color 2, 5 balls of color 3, etc. So, k = 10.
Now generate a table of dimension n with each entry holding a probability (the ratio of balls of color i to all balls in container i) and the other color.
To generate a random ball, generate a random floating-point number r in the range [0, n). Let i be the integer part (floor of r) and x the excess (r – i).
if (x < table[i].probability) output i
else output table[i].other
The algorithm has the advantage that for each random ball you only make a single comparison.
Let me work out an example (same as Knuth).
Consider simulating throwing a pair of dice.
So P(2) = 1/36, P(3) = 2/36, P(4) = 3/36, P(5) = 4/36, P(6) = 5/36, P(7) = 6/36, P(8) = 5/36, P(9) = 4/36, P(10) = 3/36, P(11) = 2/36, P(12) = 1/36.
Multiply by 36 * 11 to get 396 balls, 11 of color 2, 22 of color 3, 33 of color 4, …, 11 of color 12.
We have k = 396 / 11 = 36.
Table[2] = (11/36, color 4)
Table[12] = (11/36, color 10)
Table[3] = (22/36, color 5)
Table[11] = (22/36, color 5)
Table[4] = (8/36, color 9)
Table[10] = (8/36, color 6)
Table[5] = (16/36, color 6)
Table[9] = (16/36, color 8)
Table[6] = (7/36, color 8)
Table[8] = (6/36, color 7)
Table[7] = (36/36, color 7)
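A table like the one above can be built programmatically. The Python sketch below uses Vose's variant of the construction, which is an assumption of this example: it generally produces a different but equally valid pairing of colors than the hand-worked table. The names `build_alias_table` and `alias_sample` are made up for the sketch, and indices 0..n-1 stand in for the colors.

```python
import random

def build_alias_table(weights):
    """Return (prob, alias) lists for Walker's alias method (Vose's construction)."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]        # average container load is 1.0
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l             # container s: own share, rest from l
        scaled[l] -= 1.0 - scaled[s]                 # l donated the remainder
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias                               # leftovers keep probability 1.0

def alias_sample(prob, alias, rng=random):
    r = rng.random() * len(prob)
    i = int(r)                                       # which container
    return i if r - i < prob[i] else alias[i]        # own color or the other one

dice = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]             # weights for sums 2..12
prob, alias = build_alias_table(dice)
print(alias_sample(prob, alias) + 2)                 # shift index 0..10 to sum 2..12
```

Each draw costs one random number and one comparison, independent of n.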
Assuming that you have a function p(n) that gives you the desired probability for a random number:
r = rand() // a random number between 0 and 1
for i in A to B do
if r < p(i)
return i
r = r - p(i)
done
A faster way is to create an array of (B - A) * 100 elements and populate it with numbers from A to B such that the ratio of the number of times each item occurs in the array to the size of the array equals its probability. You can then generate a uniform random number to get an index into the array and directly access the array to get your random number.
Map your uniform random results to the required outputs according to the probabilities.
E.g., for your example:
If `0 <= Rand() <= 0.2`: result = 1.
If `0.2 < Rand() <= 0.4`: result = 2.
If `0.4 < Rand() <= 0.5`: result = 3.
If `0.5 < Rand() <= 0.55`: result = 4.
If `0.55 < Rand() <= 0.6`: result = 5.
...
Here's an implementation of Knuth's algorithm. As discussed in some of the other answers, it works by:
1) creating a table of summed frequencies
2) generating a random integer
3) rounding it with the ceiling function
4) finding the "summed" range within which the random number falls and outputting the original array entry based on it
Inverse Transform
In probability speak, a cumulative distribution function F(x) returns the probability that any randomly drawn value, call it X, is <= some given value x. For instance, F(4) in this case is .55, because the running sum of probabilities in your example is {.2, .4, .5, .55, .6, .65, ....}. I.e. the probability of randomly getting a value less than or equal to 4 is .55. However, what I actually want to know is the inverse of the cumulative probability function, call it F_inv. I want to know what the x value is, given the cumulative probability. I want to pass in F_inv(.55) and get back 4. That is why this is called the inverse transform method.
So, in the inverse transform method, we are basically trying to find the interval in the cumulative distribution in which a random Uniform (0,1) number falls. This works out to the algorithm that perreal and icepack posted. Here is another way to state it in terms of the cumulative distribution function
Generate a random number U
for x in A .. B
if U <= F(x) then return x
Note that it might be more efficient to have the loop go from B down to A and return x as soon as U > F(x - 1) (taking F(A - 1) = 0), if the smaller probabilities come at the beginning of the distribution.
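For larger ranges, the linear scan can be replaced by a binary search over the precomputed running sums. The Python below is a minimal sketch of that idea; the name `make_sampler` is made up for this example, and the probabilities in the usage line are illustrative values, not the full table from the question (which is elided there). It returns the first x with U < F(x), which differs from the U <= F(x) test above only when U lands exactly on a boundary.

```python
import bisect
import itertools
import random

def make_sampler(values, probabilities, rng=random):
    """Return a zero-argument function that draws from `values` by inverse transform."""
    cdf = list(itertools.accumulate(probabilities))   # F(x) for each value, in order
    def sample():
        u = rng.random() * cdf[-1]                    # rescale in case the sum is not exactly 1
        index = bisect.bisect_right(cdf, u)           # first position with u < F(x)
        return values[min(index, len(values) - 1)]    # guard against rounding at the top end
    return sample

draw = make_sampler([1, 2, 3, 4], [0.2, 0.2, 0.1, 0.5])
print([draw() for _ in range(10)])
```

Building the running sums costs O(n) once; each draw afterwards is O(log n).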

Resources