Uniformly distribute N items into C bins - algorithm

I have a number of items N and I want to uniformly distribute them among a number of C bins. My first though was to generate a random double number between 0 and 1 and then multiply it with the number N but it's not working as i expected. We are currently working on a Java project but a general algorithm would be fine.
Bins have no specific capacity and numbers don't have weights

You have not specified what you mean by "uniformly distribute ".
There are M=CN variants of distribution of N items into C bins. So you can random integer in range 0..M-1 and represent it in C-ary numeral system to get random combination.

Given that all items and bins are identical we can use the following simple approach, which is definitely not the most efficient way to go but it is easy and works.
Create a vector containing the sequence 1 to N and use a function to randomly shuffle the values (e.g., Collections.shuffle(values)). Then the first N/C items are placed in the first bin, the following N/C items in the second one, etc..
Example, we have N=10 items and C=2 bins. We create the vector val = {1,2,3,4,5,6,7,8,9,10} and using a random shuffle function gives val = {4,8,2,1,9,10,5,3,6,7}. Then use this to get the following two bins
bin1: {4,8,2,1,9} and bin2: {10,5,3,6,7}

Related

How can I generate a random sparse matrix with a specific probability of symmetric entries?

I'm working on a program that sorts individuals into teams based on a sparse matrix with binary entries, each entry corresponding to whether or not i is willing to work with j and so on. I have the program running, but I need to be able to test it on random matrices to observe some relationships between the results and the parameters.
What I'd like to find is some way to generate a matrix that has a a certain number of non-zero entries per row and a certain probability of symmetrical entries. That is, I want to be able to assign a specific number for P(w_ji = 1 | w_ij = 1) and use that to generate a matrix. I don't want symmetric matrices, but modeling this with completely random matrices would be inaccurate, since a real-world willingness matrix tends to be at least somewhat symmetric.
Does anyone know of anything I could use to generate such a matrix? I generally use python (with gurobi) and am open to installing any number of other libraries to help if I have to. If anyone else here uses gurobi, I would appreciate input on whether or not I could model matrix generation like this as an optimization problem using something like this for an objective function:
min <= sum(w[i,j] * w[j,i] for i in... for j in...) <= max
Thank you!
If all you want is a coefficient matrix with random distribution of 0 and 1 values, the easiest option is to pick a probability and do Bernoulli trials as to whether the value is 1. (If it is zero, omit the element for sparseness).
Alternately, if you need a random permutation of a fixed number of 0's and 1's, then try something like:
import random
n = 50
k = 10
positions = sorted(random.sample(range(n), k))
The list positions represents the nonzero elements you need.
With a matrix representation, this would be a good candidate for the Gurobi matrix variable object, MVar.

How to pick two random numbers with no carry when adding

I'm trying to make this algorithm which inputs a lower and upper limit for two numbers (the two numbers may have different lower and upper limits) and outputs two random numbers within that range
The catch is however that when the two numbers are added, no "carry" should be there. This means the sum of the digits in each place should be no more than 9.
How can I make sure that the numbers are truly random and that no carrying occurs when adding the two numbers
Thanks a lot!
Edit: The ranges can vary, the widest range can be 0 to 999. Also, I'm using VBA (Excel)
An easy and distributionally correct way of doing this is to use Rejection Sampling, a.k.a. "Acceptance/Rejection". Generate the values independently, and if the carry constraint is violated repeat. In pseudocode
do {
generate x, y
} while (x + y > threshold)
The number of times the loop will iterate has a geometric distribution with an expected value of (proportion of sums below the threshold)-1. For example, if you're below the threshold 90% of the time then the long term number of iterations will average out to 10/9, 1.11... iterations per pair generated. For lower likelihoods of acceptance, it will take more attempts on average.

How do you obtain the set corresponding to a number in O(1)

Lets say there are N numbers grouped into K disjoint sets. The problem is to create a key for each of these disjoint sets such that given any number, a simple operation on these keys and the number should be able to give the set containing the number.
A simple approach and it’s limitations:
For ex. Let the N numbers be 34,35,36….321 Let set 1 be made of 63,66,77,89,122,222 And set 2 be made of 53,69,137,230,280,299,300,306 And so on..
sol:
first an array of prime numbers containing (321-34=287) items is created. To create a key for the first set, the prime numbers corresponding to positions (63-34),(66-34),…(222-34) in the array are multiplied. Now this key is divisible only by prime numbers corresponding to the numbers in set 1 and not otherwise. So, given 77, [if(key1%(primeArray[77-34]==0)], 77 belongs to set1
But since I’m dealing with large number of data values, the keys cannot be represented by even 64 bit integers(and I don’t want to split the keys). Is there a better way of doing this?
Seems like a classic case to use union find data structure.
Union-find data structure
Give each number a rank. Trivially, the number itself in this case. The highest rank in a set represents the set.

Randomly choosing from a list with weighted probabilities

I have an array of N elements (representing the N letters of a given alphabet), and each cell of the array holds an integer value, that integer value meaning the number of occurrences in a given text of that letter. Now I want to randomly choose a letter from all of the letters in the alphabet, based on his number of appearances with the given constraints:
If the letter has a positive (nonzero) value, then it can be always chosen by the algorithm (with a bigger or smaller probability, of course).
If a letter A has a higher value than a letter B, then it has to be more likely to be chosen by the algorithm.
Now, taking that into account, I've come up with a simple algorithm that might do the job, but I was just wondering if there was a better thing to do. This seems to be quite fundamental, and I think there might be more clever things to do in order to accomplish this more efficiently. This is the algorithm i thought:
Add up all the frequencies in the array. Store it in SUM
Choosing up a random value from 0 to SUM. Store it in RAN
[While] RAN > 0, Starting from the first, visit each cell in the array (in order), and subtract the value of that cell from RAN
The last visited cell is the chosen one
So, is there a better thing to do than this? Am I missing something?
I'm aware most modern computers can compute this so fast I won't even notice if my algorithm is inefficient, so this is more of a theoretical question rather than a practical one.
I prefer an explained algorithm rather than just code for an answer, but If you're more comfortable providing your answer in code, I have no problem with that.
The idea:
Iterate through all the elements and set the value of each element as the cumulative frequency thus far.
Generate a random number between 1 and the sum of all frequencies
Do a binary search on the values for this number (finding the first value greater than or equal to the number).
Example:
Element A B C D
Frequency 1 4 3 2
Cumulative 1 5 8 10
Generate a random number in the range 1-10 (1+4+3+2 = 10, the same as the last value in the cumulative list), do a binary search, which will return values as follows:
Number Element returned
1 A
2 B
3 B
4 B
5 B
6 C
7 C
8 C
9 D
10 D
The Alias Method has amortized O(1) time per value generated, but requires two uniforms per lookup. Basically, you create a table where each column contains one of the values to be generated, a second value called an alias, and a conditional probability of choosing between the value and its alias. Use your first uniform to pick any of the columns with equal likelihood. Then choose between the primary value and the alias based on your second uniform. It takes a O(n log n) work to initially set up a valid table for n values, but after the table's built generating values is constant time. You can download this Ruby gem to see an actual implementation.
Two other very fast methods by Marsaglia et al. are described here. They have provided C implementations.

Optimal algorithm for generating a random number R not in a set of numbers N

I am curious to know what the best way to generate a random integer R that is not in a provided set of integers (R∉N). I can think of several ways of doing this but I'm wondering what you all think.
Let N be the size of the overall set, and let K be the size of the excluded set.
I depends on the size of the set you are sampling from. If the excluded set is much smaller than the overall range, just choose a random number, and if it is in the excluded set, choose again. If we keep the excluded set in a hash table each try can be done in O(1) time.
If the excluded set is large, choose a random number R in a set of size (N - K) and output the choice as the member of the non excluded elements. If we store just the holes in a hash table keyed with the value of the random number we can generate this in one sample in time O(1).
The cutoff point will depend on the size of (N - K)/N, but I suspect that unless this is greater than .5 or so, or you sets are very small, just sampling until you get a hit will be faster in practice.
Given your limited description? Find the maximum value of the elements in N. Generate only random numbers greater than that.
Generate a random number R in the entire domain (subtract the size of N from the max value) that you want to use. Then loop through all N less than R and for each add 1 to R. This will give a random number in the domain that is not in N.

Resources