Initialization of Weighted Reservoir Sampling (A-Chao implementation)

Initialization of Weighted Reservoir Sampling (A-Chao implementation) - algorithm

I am trying to implement A-Chao version of weighted reservoir sampling as shown in https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_A-Chao
But I found that the pseudo-code described in wiki seems to be wrong, especially on the initialization part. I read the paper, it mentions we need to handle over-weighted data points, but I still cannot get the idea how to initialize correctly.
In my understanding, on initialization step, we want to make sure all initial data points chosen should have same probability*weight to be chosen. However, I don't understand how the over-weighted points is related with that.
Code I implemented according to the wiki, but the results show it is incorrect.
const reservoirSampling = <T>(dataList: T[], k: number, getWeight: (point: T) => number): T[] => {
const sampledList = dataList.slice(0, k);
let currentWeightSum: number = sampledList.reduce((sum, item) => sum + getWeight(item), 0);
for (let i = k; i < dataList.length; i++) {
const currentItem = dataList[i];
currentWeightSum += getWeight(currentItem);
const probOfChoosingCurrentItem = getWeight(currentItem) / currentWeightSum;
const rand = Math.random();
if (rand <= probOfChoosingCurrentItem) {
sampledList[getRandomInt(0, k - 1)] = currentItem;
}
}
return sampledList;
};

The best way to get the distribution that Chao's algorithm produces is to implement VarOptk sampling as in the pseudocode labeled Algorithm 1 from the paper that introduced VarOptk sampling by Cohen et al.
That's an arXiv link and hence very stable, but to summarize, the idea is to separate the items into "heavy" (weight high enough to guarantee inclusion in the sample so far) and "light" (the others). Keep the heavy items in a priority queue where it is easy to remove the lightest of them. When a new item comes in, we have to determine whether it is heavy or light, and which heavy items became light (if any). Then there's a sampling procedure for dropping an item that treats the heavy → light items specially using weighted sampling and then falls back to choosing a uniform random light item (as in the easy case of Chao's algorithm).
The one trick with the pseudocode is that, if you use floating-point arithmetic, you have to be a little careful about "impossible" cases. Post your finished code on Code Review and ping me here if you would like feedback.

You will find a python implementation of Chao's strategy below. Here is a plot of 10000 samples from 0,..,99 with weights indicated by the yellow lines. The y-coordinate denotes how many times a given item was sampled.
I first implemented the pseudocode on Wikipedia, and agree completely with the OP that it is dead wrong. It then took me more than a day to understand Chao's paper. I also found the section of Tillé's book on Chao's method (see Algorithm 6.14 on page 120) helpful. (I don't know what the OP means by with the issues with initialization.)
Disclaimer: I am new to python, and just tried to do my best. I think posting code might be more helpful than posting pseudocode. (Mainly I want to save someone a day's work getting to the bottom of Chao's paper!) If you do end up using this, I'd appreciate any feedback. Standard health warnings apply!
First, Chao's computation of inclusion probabilities:
import numpy as np
import random
def compute_Chao_probs(weights, total_weight, sample_size):
"""
Consider a weighted population, some of its members, and their weights.
This function returns a list of probabilities that these members are selected
in a weighted sample of sample_size members of the population.
Example 1: If all weights are equal, this probability is sample_size /(size of population).
Example 2: If the size of our population is sample_size then these probabilities are all 1.
Naively we expect these probabilities to be given by sample_size*weight/total_weight, however
this may lead to a probability greater than 1. For example, consider a population
of 3 with weights [3,1,1], and suppose we want to select 2 elements. The naive
probability of selecting the first element is 2*3/5 > 1.
We follow Chao's description: compute naive guess, set any probs which are bigger
than 1 to 1, rinse and repeat.
We expect to call this routine many times, so we avoid for loops, and try to make numpy do the work.
"""
assert all(w > 0 for w in weights), "weights must be strictly positive."
# heavy_items is a True / False array of length sample_size.
# True indicates items deemed "heavy" (i.e. assigned probability 1)
# At the outset, no items are heavy:
heavy_items = np.zeros(len(weights),dtype=bool)
while True:
new_probs = (sample_size - np.sum(heavy_items))/(total_weight - np.sum(heavy_items*weights))*weights
valid_probs = np.less_equal(np.logical_not(heavy_items) * new_probs, np.ones((len(weights))))
if all(valid_probs): # we are done
return np.logical_not(heavy_items)*new_probs + heavy_items
else: # we need to declare some more items heavy
heavy_items = np.logical_or(heavy_items, np.logical_not(valid_probs))
Then Chao's rejection rule:
def update_sample(current_sample, new_item, new_weight):
"""
We have a weighted population, from which we have selected n items.
We know their weights, the total_weight of the population, and the
probability of their inclusion in the sample when we selected them.
Now new_item arrives, with a new_weight. Should we take it or not?
current_sample is a dictionary, with keys 'items', 'weights', 'probs'
and 'total_weight'. This function updates current_sample according to
Chao's recipe.
"""
items = current_sample['items']
weights = current_sample['weights']
probs = current_sample['probs']
total_weight = current_sample['total_weight']
assert len(items) == len(weights) and len(weights) == len(probs)
fixed_sample_size = len(weights)
total_weight = total_weight + new_weight
new_Chao_probs = compute_Chao_probs(np.hstack((weights,[new_weight])),total_weight,fixed_sample_size)
if random.random() <= new_Chao_probs[-1]: # we should take new_item
#
# Now we need to decide which element should be replaced.
# Fix an index i in items, and let P denote probability. We have:
# P(i is selected in previous step) = probs[i]
# P(i is selected at current step) = new_Chao_probs[i]
# Hence (by law of conditional probability)
# P(i is selected at current step | i is selected at previous step) = new_Chao_probs[i] / probs[i]
# Thus:
# P(i is not selected at current step | i is selected at previous step) = 1 - new_Chao_probs[i] / probs[i]
# Now is we condition this on the assumption that the new element is taken, we get
# 1/new_Chao_probs[-1]*(1 - new_Chao_probs[i] / probs[i]).
#
# (*I think* this is what Chao is talking about in the two paragraphs just before Section 3 in his paper.)
rejection_weights = 1/new_Chao_probs[-1]*(np.ones((fixed_sample_size)) - (new_Chao_probs[0:-1]/probs))
# assert np.isclose(np.sum(rejection_weights),1)
# In examples we see that np.sum(rejection_weights) is not necessarily 1.
# I am a little confused by this, but ignore it for the moment.
rejected_index = random.choices(range(fixed_sample_size), rejection_weights)[0]
#make the changes:
current_sample['items'][rejected_index] = new_item
current_sample['weights'][rejected_index] = new_weight
current_sample['probs'] = new_Chao_probs[0:-1]
current_sample['probs'][rejected_index] = new_Chao_probs[-1]
current_sample['total_weight'] = total_weight
Finally, code to test and plot:
# Now we test Chao on some different distributions.
#
# This also illustrates how to use update_sample.
#
from collections import Counter
import matplotlib.pyplot as plt
n = 10 # number of samples
items_in = list(range(100))
weights_in = [random.random() for _ in range(10)]
# other possible tests:
weights_in = [i+1 for i in range(10)] # staircase
#weights_in = [9-i+1 for i in range(10)] # upside down staircase
#weights_in = [(i+1)**2 for i in range(10)] # parabola
#weights_in = [10**i for i in range(10)] # a very heavy tailed distribution (to check numerical stability)
random.shuffle(weights_in) # sometimes it is fun to shuffle
weights_in = np.array([w for w in weights_in for _ in range(10)])
count = Counter({})
for j in range(10000):
# we take the first n with probability 1:
current_sample = {}
current_sample['items'] = items_in[:n]
current_sample['weights'] = np.array(weights_in[:n])
current_sample['probs'] = np.ones((n))
current_sample['total_weight'] = np.sum(current_sample['weights'])
for i in range(n,len(items_in)):
update_sample(current_sample, items_in[i], weights_in[i])
count.update(current_sample['items'])
plt.figure(figsize=(20,10))
plt.plot(100000*np.array(weights_in)/np.sum(weights_in), 'yo')
plt.plot(list(count.keys()), list(count.values()), 'ro')
plt.show()

Related

Conditional sampling of binary vectors (?)

I'm trying to find a name for my problem, so I don't have to re-invent wheel when coding an algorithm which solves it...
I have say 2,000 binary (row) vectors and I need to pick 500 from them. In the picked sample I do column sums and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.
A tiny example:
Out of the vectors:
110
010
011
110
100
I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be
110
100
My ideas so far
one could maybe call this a binary multidimensional knapsack, but I did not find any algos for that
Linear Programming could help, but I'd need some step by step explanation as I got no experience with it
as exact solution is not always feasible, something like simulated annealing brute force could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that CSP should be much faster than ILP...?

My concrete, practical (if the approximation guarantee works out for you) suggestion would be to apply the maximum entropy method (in Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500 (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its distribution likelihood, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (shouldn't take too many tries due to concentration bounds).

Firstly I will make some assumptions regarding this problem:
Regardless whether the column sum of the selected solution is over or under the target, it weighs the same.
The sum of the first, second, and third column are equally weighted in the solution (i.e. If there's a solution whereas the first column sum is off by 1, and another where the third column sum is off by 1, the solution are equally good).
The closest problem I can think of this problem is the Subset sum problem, which itself can be thought of a special case of Knapsack problem.
However both of these problem are NP-Complete. This means there are no polynomial time algorithm that can solve them, even though it is easy to verify the solution.
If I were you the two most arguably efficient solution of this problem are linear programming and machine learning.
Depending on how many columns you are optimising in this problem, with linear programming you can control how much finely tuned you want the solution, in exchange of time. You should read up on this, because this is fairly simple and efficient.
With Machine learning, you need a lot of data sets (the set of vectors and the set of solutions). You don't even need to specify what you want, a lot of machine learning algorithms can generally deduce what you want them to optimise based on your data set.
Both solution has pros and cons, you should decide which one to use yourself based on the circumstances and problem set.

This definitely can be modeled as (integer!) linear program (many problems can). Once you have it, you can use a program such as lpsolve to solve it.
We model vector i is selected as x_i which can be 0 or 1.
Then for each column c, we have a constraint:
sum of all (x_i * value of i in column c) = target for column c
Taking your example, in lp_solve this could look like:
min: ;
+x1 +x4 +x5 >= 2;
+x1 +x4 +x5 <= 2;
+x1 +x2 +x3 +x4 <= 1;
+x1 +x2 +x3 +x4 >= 1;
+x3 <= 0;
+x3 >= 0;
bin x1, x2, x3, x4, x5;

If you are fine with a heuristic based search approach, here is one.
Go over the list and find the minimum squared sum of the digit wise difference between each bit string and the goal. For example, if we are looking for 2, 1, 0, and we are scoring 0, 1, 0, we would do it in the following way:
Take the digit wise difference:
2, 0, 1
Square the digit wise difference:
4, 0, 1
Sum:
5
As a side note, squaring the difference when scoring is a common method when doing heuristic search. In your case, it makes sense because bit strings that have a 1 in as the first digit are a lot more interesting to us. In your case this simple algorithm would pick first 110, then 100, which would is the best solution.
In any case, there are some optimizations that could be made to this, I will post them here if this kind of approach is what you are looking for, but this is the core of the algorithm.

You have a given target binary vector. You want to select M vectors out of N that have the closest sum to the target. Let's say you use the eucilidean distance to measure if a selection is better than another.
If you want an exact sum, have a look at the k-sum problem which is a generalization of the 3SUM problem. The problem is harder than the subset sum problem, because you want an exact number of elements to add to a target value. There is a solution in O(N^(M/2)). lg N), but that means more than 2000^250 * 7.6 > 10^826 operations in your case (in the favorable case where vectors operations have a cost of 1).
First conclusion: do not try to get an exact result unless your vectors have some characteristics that may reduce the complexity.
Here's a hill climbing approach:
sort the vectors by number of 1's: 111... first, 000... last;
use the polynomial time approximate algorithm for the subset sum;
you have an approximate solution with K elements. Because of the order of elements (the big ones come first), K should be a little as possible:
if K >= M, you take the M first vectors of the solution and that's probably near the best you can do.
if K < M, you can remove the first vector and try to replace it with 2 or more vectors from the rest of the N vectors, using the same technique, until you have M vectors. To sumarize: split the big vectors into smaller ones until you reach the correct number of vectors.
Here's a proof of concept with numbers, in Python:
import random
def distance(x, y):
return abs(x-y)
def show(ls):
if len(ls) < 10:
return str(ls)
else:
return ", ".join(map(str, ls[:5]+("...",)+ls[-5:]))
def find(is_xs, target):
# see https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution
S = [(0, ())] # we store indices along with values to get the path
for i, x in is_xs:
T = [(x + t, js + (i,)) for t, js in S]
U = sorted(S + T)
y, ks = U[0]
S = [(y, ks)]
for z, ls in U:
if z == target: # use the euclidean distance here if you want an approximation
return ls
if z != y and z < target:
y, ks = z, ls
S.append((z, ls))
ls = S[-1][1] # take the closest element to target
return ls
N = 2000
M = 500
target = 1000
xs = [random.randint(0, 10) for _ in range(N)]
print ("Take {} numbers out of {} to make a sum of {}", M, xs, target)
xs = sorted(xs, reverse = True)
is_xs = list(enumerate(xs))
print ("Sorted numbers: {}".format(show(tuple(is_xs))))
ls = find(is_xs, target)
print("FIRST TRY: {} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
splits = 0
while len(ls) < M:
first_x = xs[ls[0]]
js_ys = [(i, x) for i, x in is_xs if i not in ls and x != first_x]
replace = find(js_ys, first_x)
splits += 1
if len(replace) < 2 or len(replace) + len(ls) - 1 > M or sum(xs[i] for i in replace) != first_x:
print("Give up: can't replace {}.\nAdd the lowest elements.")
ls += tuple([i for i, x in is_xs if i not in ls][len(ls)-M:])
break
print ("Replace {} (={}) by {} (={})".format(ls[:1], first_x, replace, sum(xs[i] for i in replace)))
ls = tuple(sorted(ls[1:] + replace)) # use a heap?
print("{} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
print("AFTER {} splits, {} -> {}".format(splits, ls, sum(x for i, x in is_xs if i in ls)))
The result is obviously not guaranteed to be optimal.
Remarks:
Complexity: find has a polynomial time complexity (see the Wikipedia page) and is called at most M^2 times, hence the complexity remains polynomial. In practice, the process is reasonably fast (split calls have a small target).
Vectors: to ensure that you reach the target with the minimum of elements, you can improve the order of element. Your target is (t_1, ..., t_c): if you sort the t_js from max to min, you get the more importants columns first. You can sort the vectors: by number of 1s and then by the presence of a 1 in the most important columns. E.g. target = 4 8 6 => 1 1 1 > 0 1 1 > 1 1 0 > 1 0 1 > 0 1 0 > 0 0 1 > 1 0 0 > 0 0 0.
find (Vectors) if the current sum exceed the target in all the columns, then you're not connecting to the target (any vector you add to the current sum will bring you farther from the target): don't add the sum to S (z >= target case for numbers).

I propose a simple ad hoc algorithm, which, broadly speaking, is a kind of gradient descent algorithm. It seems to work relatively well for input vectors which have a distribution of 1s “similar” to the target sum vector, and probably also for all “nice” input vectors, as defined in a comment of yours. The solution is not exact, but the approximation seems good.
The distance between the sum vector of the output vectors and the target vector is taken to be Euclidean. To minimize it means minimizing the sum of the square differences off sum vector and target vector (the square root is not needed because it is monotonic). The algorithm does not guarantee to yield the sample that minimizes the distance from the target, but anyway makes a serious attempt at doing so, by always moving in some locally optimal direction.
The algorithm can be split into 3 parts.
First of all the first M candidate output vectors out of the N input vectors (e.g., N=2000, M=500) are put in a list, and the remaining vectors are put in another.
Then "approximately optimal" swaps between vectors in the two lists are done, until either the distance would not decrease any more, or a predefined maximum number of iterations is reached. An approximately optimal swap is one where removing the first vector from the list of output vectors causes a maximal decrease or minimal increase of the distance, and then, after the removal of the first vector, adding the second vector to the same list causes a maximal decrease of the distance. The whole swap is avoided if the net result is not a decrease of the distance.
Then, as a last phase, "optimal" swaps are done, again stopping on no decrease in distance or maximum number of iterations reached. Optimal swaps cause a maximal decrease of the distance, without requiring the removal of the first vector to be optimal in itself. To find an optimal swap all vector pairs have to be checked. This phase is much more expensive, being O(M(N-M)), while the previous "approximate" phase is O(M+(N-M))=O(N). Luckily, when entering this phase, most of the work has already been done by the previous phase.
from typing import List, Tuple
def get_sample(vects: List[Tuple[int]], target: Tuple[int], n_out: int,
max_approx_swaps: int = None, max_optimal_swaps: int = None,
verbose: bool = False) -> List[Tuple[int]]:
"""
Get a sample of the input vectors having a sum close to the target vector.
Closeness is measured in Euclidean metrics. The output is not guaranteed to be
optimal (minimum square distance from target), but a serious attempt is made.
The max_* parameters can be used to avoid too long execution times,
tune them to your needs by setting verbose to True, or leave them None (∞).
:param vects: the list of vectors (tuples) with the same number of "columns"
:param target: the target vector, with the same number of "columns"
:param n_out: the requested sample size
:param max_approx_swaps: the max number of approximately optimal vector swaps,
None means unlimited (default: None)
:param max_optimal_swaps: the max number of optimal vector swaps,
None means unlimited (default: None)
:param verbose: print some info if True (default: False)
:return: the sample of n_out vectors having a sum close to the target vector
"""
def square_distance(v1, v2):
return sum((e1 - e2) ** 2 for e1, e2 in zip(v1, v2))
n_vec = len(vects)
assert n_vec > 0
assert n_out > 0
n_rem = n_vec - n_out
assert n_rem > 0
output = vects[:n_out]
remain = vects[n_out:]
n_col = len(vects[0])
assert n_col == len(target) > 0
sumvect = (0,) * n_col
for outvect in output:
sumvect = tuple(map(int.__add__, sumvect, outvect))
sqdist = square_distance(sumvect, target)
if verbose:
print(f"sqdist = {sqdist:4} after"
f" picking the first {n_out} vectors out of {n_vec}")
if max_approx_swaps is None:
max_approx_swaps = sqdist
n_approx_swaps = 0
while sqdist and n_approx_swaps < max_approx_swaps:
# find the best vect to subtract (the square distance MAY increase)
sqdist_0 = None
index_0 = None
sumvect_0 = None
for index in range(n_out):
tmp_sumvect = tuple(map(int.__sub__, sumvect, output[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_0 is None or sqdist_0 > tmp_sqdist:
sqdist_0 = tmp_sqdist
index_0 = index
sumvect_0 = tmp_sumvect
# find the best vect to add,
# but only if there is a net decrease of the square distance
sqdist_1 = sqdist
index_1 = None
sumvect_1 = None
for index in range(n_rem):
tmp_sumvect = tuple(map(int.__add__, sumvect_0, remain[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_1 > tmp_sqdist:
sqdist_1 = tmp_sqdist
index_1 = index
sumvect_1 = tmp_sumvect
if sumvect_1:
tmp = output[index_0]
output[index_0] = remain[index_1]
remain[index_1] = tmp
sqdist = sqdist_1
sumvect = sumvect_1
n_approx_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_approx_swaps}"
f" approximately optimal swap{'s'[n_approx_swaps == 1:]}")
diffvect = tuple(map(int.__sub__, sumvect, target))
if max_optimal_swaps is None:
max_optimal_swaps = sqdist
n_optimal_swaps = 0
while sqdist and n_optimal_swaps < max_optimal_swaps:
# find the best pair to swap,
# but only if the square distance decreases
best_sqdist = sqdist
best_diffvect = diffvect
best_pair = None
for i0 in range(M):
tmp_diffvect = tuple(map(int.__sub__, diffvect, output[i0]))
for i1 in range(n_rem):
new_diffvect = tuple(map(int.__add__, tmp_diffvect, remain[i1]))
new_sqdist = sum(d * d for d in new_diffvect)
if best_sqdist > new_sqdist:
best_sqdist = new_sqdist
best_diffvect = new_diffvect
best_pair = (i0, i1)
if best_pair:
tmp = output[best_pair[0]]
output[best_pair[0]] = remain[best_pair[1]]
remain[best_pair[1]] = tmp
sqdist = best_sqdist
diffvect = best_diffvect
n_optimal_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_optimal_swaps}"
f" optimal swap{'s'[n_optimal_swaps == 1:]}")
return output
from random import randrange
C = 30 # number of columns
N = 2000 # total number of vectors
M = 500 # number of output vectors
F = 0.9 # fill factor of the target sum vector
T = int(M * F) # maximum value + 1 that can be appear in the target sum vector
A = 10000 # maximum number of approximately optimal swaps, may be None (∞)
B = 10 # maximum number of optimal swaps, may be None (unlimited)
target = tuple(randrange(T) for _ in range(C))
vects = [tuple(int(randrange(M) < t) for t in target) for _ in range(N)]
sample = get_sample(vects, target, M, A, B, True)
Typical output:
sqdist = 2639 after picking the first 500 vectors out of 2000
sqdist = 9 after 27 approximately optimal swaps
sqdist = 1 after 4 optimal swaps
P.S.: As it stands, this algorithm is not limited to binary input vectors, integer vectors would work too. Intuitively I suspect that the quality of the optimization could suffer, though. I suspect that this algorithm is more appropriate for binary vectors.
P.P.S.: Execution times with your kind of data are probably acceptable with standard CPython, but get better (like a couple of seconds, almost a factor of 10) with PyPy. To handle bigger sets of data, the algorithm would have to be translated to C or some other language, which should not be difficult at all.

Optimal filling of grid figure with squares

recently I have designed a puzzle for children to solve. However I would like to now the optimal solution.
The problem is as follows: You have this figure made up off small squares
You have to fill it in with larger squares and it is scored with the following table:
| Square Size | 1x1 | 2x2 | 3x3 | 4x4 | 5x5 | 6x6 | 7x7 | 8x8 |
|-------------|-----|-----|-----|-----|-----|-----|-----|-----|
| Points | 0 | 4 | 10 | 20 | 35 | 60 | 84 | 120 |
There are simply to many possible solutions to check them all. Some other people suggested dynamic programming, but I don't know how to divide the figure in smaller ones which put together have the same optimal solution.
I would like to find a way to find the optimal solutions to these kinds of problems in reasonable time (like a couple of days max on a regular desktop). The highest score found so far with a guessing algorithm and some manual work is 1112.
Solutions to similar problems with combining sub-problems are also appreciated.
I don't need all the code written out. An outline or idea for an algorithm would be enough.
Note: The biggest square that can fit is 8x8 so scores for bigger squares are not included.
[[1,1,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1],
[1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1],
[0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
[1,0,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1],
[1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,1],
[1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0],
[0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0],
[0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,1],
[0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0],
[0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1],
[0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1],
[0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1],
[1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1],
[1,1,1,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
[1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1]];

Here is a quite general prototype using Mixed-integer-programming which solves your instance optimally (i obtained the value of 1112 like you deduced yourself) and might solve others too.
In general, your problem is np-complete and this makes it hard (there are some instances which will be trouble).
While i suspect that SAT-solver and CP-solver based approaches might be more powerful (because of the combinatoric nature; i even was surprised that MIP works here), the MIP-approach has also some advantages:
MIP-solvers are complete (as SAT and CP; but many random-based heuristics are not):
There are many commercial-grade solvers available if needed
The formulation is quite easy (especially compared to SAT; SAT-formulations will need advanced at most k out of n-formulations (for scoring-formulations) which are growing sub-quadratic (the naive approach grows exponentially)! They do exist, but are non-trivial)
The optimization-objective is just natural (SAT and CP would need iterative-refining = solve with some lower-bound; increment bound and re-solve)
MIP-solvers can also be quite powerful to obtain approximations of the optimal solution and also provide some proven bounds (e.g. optimum lower than x)
The following code is implemented in python using common scientific tools available (all of these are open-source). It allows setting the tile-range (e.g. adding 9x9 tiles) and different cost-functions. The comments should be enough to understand the ideas. It will use some (probably the best) open-source MIP-solver, but can also use commercial ones (outcommented line shows usage).
Code
import numpy as np
import itertools
from collections import defaultdict
import matplotlib.pyplot as plt # visualization only
import seaborn as sns # ""
from pulp import * # MIP-modelling & solver
""" INSTANCE """
instance = np.asarray([[1,1,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,1,1],
[1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1],
[1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1],
[0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1],
[0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
[1,0,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1],
[1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,1],
[1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0],
[0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0],
[0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,1],
[0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0],
[1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0],
[0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
[0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1],
[0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1],
[0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1],
[0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1],
[1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1],
[1,1,1,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
[1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1],
[1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1],
[1,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1],
[0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1]], dtype=bool)
def plot_compare(instance, solution, subgrids):
f, (ax1, ax2) = plt.subplots(2, sharex=True, sharey=True)
sns.heatmap(instance, ax=ax1, cbar=False, annot=True)
sns.heatmap(solution, ax=ax2, cbar=False, annot=True)
plt.show()
""" PARAMETERS """
SUBGRIDS = 8 # 1x1 - 8x8
SUGBRID_SCORES = {1:0, 2:4, 3:10, 4:20, 5:35, 6:60, 7:84, 8:120}
N, M = instance.shape # free / to-fill = zeros!
""" HELPER FUNCTIONS """
def get_square_covered_indices(instance, pos_x, pos_y, sg):
""" Calculate all covered tiles when given a top-left position & size
-> returns the base-index too! """
N, M = instance.shape
neighbor_indices = []
valid = True
for sX in range(sg):
for sY in range(sg):
if pos_x + sX < N:
if pos_y + sY < M:
if instance[pos_x + sX, pos_y + sY] == 0:
neighbor_indices.append((pos_x + sX, pos_y + sY))
else:
valid = False
break
else:
valid = False
break
else:
valid = False
break
return valid, neighbor_indices
def preprocessing(instance, SUBGRIDS):
""" Calculate all valid placement / tile-selection combinations """
placements = {}
index2placement = {}
placement2index = {}
placement2type = {}
type2placement = defaultdict(list)
cover2index = defaultdict(list) # cell covered by placement-index
index_gen = itertools.count()
for sg in range(1, SUBGRIDS+1): # sg = subgrid size
for pos_x in range(N):
for pos_y in range(M):
if instance[pos_x, pos_y] == 0: # free
feasible, covering = get_square_covered_indices(instance, pos_x, pos_y, sg)
if feasible:
new_index = next(index_gen)
placements[(sg, pos_x, pos_y)] = covering
index2placement[new_index] = (sg, pos_x, pos_y)
placement2index[(sg, pos_x, pos_y)] = new_index
placement2type[new_index] = sg
type2placement[sg].append(new_index)
cover2index[(pos_x, pos_y)].append(new_index)
return placements, index2placement, placement2index, placement2type, type2placement, cover2index
def calculate_collisions(placements, index2placement):
""" Calculate collisions between tile-placements (position + tile-selection)
-> only upper triangle is used: a < b! """
n_p = len(placements)
coll_mat = np.zeros((n_p, n_p), dtype=bool) # only upper triangle is used
for pA in range(n_p):
for pB in range(n_p):
if pA < pB:
covered_A = placements[index2placement[pA]]
covered_B = placements[index2placement[pB]]
if len(set(covered_A).intersection(set(covered_B))) > 0:
coll_mat[pA, pB] = True
return coll_mat
""" PREPROCESSING """
placements, index2placement, placement2index, placement2type, type2placement, cover2index = preprocessing(instance, SUBGRIDS)
N_P = len(placements)
coll_mat = calculate_collisions(placements, index2placement)
""" MIP-MODEL """
prob = LpProblem("GridFill", LpMaximize)
# Variables
X = np.empty(N_P, dtype=object)
for x in range(N_P):
X[x] = LpVariable('x'+str(x), 0, 1, cat='Binary')
# Objective
placement_scores = [SUGBRID_SCORES[index2placement[p][0]] for p in range(N_P)]
prob += lpDot(placement_scores, X), "Score"
# Constraints
# C1: Forbid collisions of placements
for a in range(N_P):
for b in range(N_P):
if a < b: # symmetry-reduction
if coll_mat[a, b]:
prob += X[a] + X[b] <= 1 # not both!
""" SOLVE """
print('solve')
#prob.solve(GUROBI()) # much faster commercial solver; if available
prob.solve(PULP_CBC_CMD(msg=1, presolve=True, cuts=True))
print("Status:", LpStatus[prob.status])
""" INTERPRET AND COMPLETE SOLUTION """
solution = np.zeros((N, M), dtype=int)
for x in range(N_P):
if X[x].value() > 0.99:
sg, pos_x, pos_y = index2placement[x]
_, positions = get_square_covered_indices(instance, pos_x, pos_y, sg)
for pos in positions:
solution[pos[0], pos[1]] = sg
fill_with_ones = np.logical_and((solution == 0), (instance == 0))
solution[fill_with_ones] = 1
""" VISUALIZE """
plot_compare(instance, solution, SUBGRIDS)
Assumptions / Nature of algorithm
There are no constraints describing the need for every free cell to be covered
This works when there are not negative scores
A positive score will be placed if it improves the objective
A zero-score (like your example) might keep some cells free, but these are proven to be 1's then (added after optimizing)
Performance
This is a good example of the discrepancy between open-source and commercial solvers. The two solvers tried were cbc and Gurobi.
cbc example output (just some final parts)
Result - Optimal solution found
Objective value: 1112.00000000
Enumerated nodes: 0
Total iterations: 307854
Time (CPU seconds): 2621.19
Time (Wallclock seconds): 2627.82
Option for printingOptions changed from normal to all
Total time (CPU seconds): 2621.57 (Wallclock seconds): 2628.24
Needed: ~45 mins
Gurobi example output
Explored 0 nodes (7004 simplex iterations) in 5.30 seconds
Thread count was 4 (of 4 available processors)
Optimal solution found (tolerance 1.00e-04)
Best objective 1.112000000000e+03, best bound 1.112000000000e+03, gap 0.0%
Needed: 6 seconds
General remarks about solver-performance
Gurobi should have much more functionality recognizing the nature of the problem and using appropriate hyper-parameters internally
I also think there are some SAT-based approaches used internally (as one of the core-developers wrote his dissertation mostly about combining these very different algorithmic techniques)
There are much better heuristics used, which could provide non-optimal solutions fast (which will help the steps after)
Example output: optimal solution with score 1112 (click to enlarge)

It is possible to reformulate problem into another NP-hard problem :-)
Create weighted graph where vertices are all possible squares that can be put on the board with weights regarding size, and edges are between intersecting squares. There is no need to represent squares 1x1 since there weight is zero.
E.g. for simple empty board 3x3, there are:
- 5 vertices: one 3x3 and four 2x2,
- 7 edges: four between 3x3 square and each 2x2 square, and six between each pair of 2x2 squares.
Now problem is to find maximum weight independent set.
I am not experienced with the topic, but from Wikipedia description it seems that there could exist fast enough algorithm. This graph is not in one of classes with known polynomial time algorithm, but it is quite close to P5-free graph. It seems to me that only possibility to have P5 in this graph is between 2x2 squares, which means to have stripe of width 2 of length 5. There is one in lower left corner. These regions can be covered (removed) before finding independent set with loosing none or very little to optimal solution.

(This is not meant to be a full answer; I'm just sharing what I'm working on so that we can collaborate.)
I think a good first step is to transform the binary grid by giving every cell the value of the maximum size of square that the cell can be the top-left corner of, like this:
0,0,3,2,1,0,3,2,2,2,2,1,0,0,0,0,0,0,2,1,0,0,0,0,0,2,1,0,0,0
0,0,2,2,2,3,3,2,1,1,1,1,0,0,0,3,3,3,3,3,3,2,1,0,0,1,2,1,0,0
0,2,1,1,1,2,3,2,1,0,0,0,0,3,2,2,2,2,2,2,3,3,2,1,0,0,3,2,1,0
3,2,1,0,0,1,3,2,1,0,0,0,3,2,2,1,1,1,1,1,2,3,3,2,1,0,2,2,2,1
3,3,2,1,0,0,2,2,2,1,0,3,2,2,1,1,0,0,0,0,1,2,4,3,2,2,1,1,1,1
2,3,3,2,1,0,2,1,1,1,2,3,2,1,1,0,0,0,0,0,0,1,3,3,2,1,1,0,0,0
1,2,3,4,3,2,1,1,0,0,1,3,2,1,0,0,0,0,0,0,0,0,2,2,2,1,0,0,0,0
0,1,2,3,3,2,1,0,0,0,0,2,2,1,0,0,0,0,0,0,0,0,2,1,1,2,2,2,1,0
0,0,1,2,3,2,1,0,0,0,0,1,2,1,0,0,0,0,0,0,0,0,2,1,0,1,1,2,1,0
0,0,0,1,2,2,1,0,0,0,0,0,2,1,0,0,0,0,0,0,0,2,1,1,0,0,0,3,2,1
1,0,0,0,1,2,1,0,0,0,0,0,4,3,2,1,0,0,0,4,3,2,1,0,0,0,0,2,2,1
2,1,0,0,0,1,2,1,0,0,5,5,4,4,4,4,4,4,4,5,5,4,3,2,1,0,0,1,2,1
3,2,1,0,0,0,1,6,6,5,4,4,4,3,3,3,3,3,3,4,4,5,4,3,2,1,0,0,1,1
3,2,1,0,0,0,0,6,5,5,4,3,3,3,2,2,2,2,2,3,3,4,5,4,3,2,1,0,0,0
3,2,2,2,2,7,6,6,5,4,4,3,2,2,2,1,1,1,1,2,2,3,5,5,4,3,2,1,0,0
2,2,1,1,1,7,6,5,5,4,3,3,2,1,1,1,0,0,0,1,1,2,4,6,5,4,3,2,1,0
2,1,1,0,0,7,6,5,4,4,3,2,2,1,0,0,0,0,0,0,0,1,3,6,5,4,3,2,1,0
1,1,0,0,8,7,6,5,4,3,3,2,1,1,0,0,0,0,0,0,0,0,2,7,6,5,4,3,2,1
1,0,0,0,8,7,6,5,4,3,2,2,1,0,0,0,0,0,0,0,0,0,1,7,6,5,4,3,2,1
0,0,0,7,8,7,6,5,4,3,2,1,1,0,0,0,0,0,0,0,0,0,0,6,6,5,4,3,2,1
0,0,0,6,8,7,6,5,4,3,2,1,0,0,0,0,0,0,0,0,0,0,0,6,5,5,4,3,2,1
0,0,0,5,7,7,6,5,4,3,2,1,0,0,0,0,0,0,0,0,0,0,0,6,5,4,4,3,2,1
0,0,0,4,6,7,7,6,5,4,3,2,1,0,0,0,0,0,0,0,0,0,6,5,5,4,3,3,2,1
0,0,0,3,5,6,7,7,6,5,4,3,2,1,0,0,0,0,0,0,0,6,6,5,4,4,3,2,2,1
1,0,0,2,4,5,6,7,8,7,6,5,4,3,2,1,0,0,0,7,6,6,5,5,4,3,3,2,1,1
1,0,0,1,3,4,5,6,7,7,8,8,8,8,8,8,7,7,6,6,6,5,5,4,4,3,2,2,1,0
2,1,0,0,2,3,4,5,6,6,7,7,8,7,7,7,7,6,6,5,5,5,4,4,3,3,2,1,1,0
2,1,0,0,1,2,3,4,5,5,6,6,8,7,6,6,6,6,5,5,4,4,4,3,3,2,2,1,0,0
3,2,1,0,0,1,2,3,4,4,5,5,8,7,6,5,5,5,5,4,4,3,3,3,2,2,1,1,0,0
3,2,1,0,0,0,1,2,3,3,4,4,8,7,6,5,4,4,4,4,3,3,2,2,2,1,1,0,0,0
4,3,2,1,0,0,0,1,2,2,3,3,8,7,6,5,4,3,3,3,3,2,2,1,1,1,0,0,0,0
3,3,2,1,0,0,0,0,1,1,2,2,8,7,6,5,4,3,2,2,2,2,1,1,0,0,0,0,0,0
2,2,2,2,1,0,0,0,0,0,1,1,8,7,6,5,4,3,2,1,1,1,1,0,0,0,0,0,0,0
1,1,1,2,1,0,0,0,0,0,0,0,8,7,6,5,4,3,2,1,0,0,0,0,0,0,0,0,0,0
0,0,0,2,1,0,0,0,0,0,0,0,8,8,7,6,5,4,3,2,1,0,0,0,0,0,0,0,0,0
0,0,0,2,1,0,0,0,0,0,0,6,8,7,7,6,6,5,4,3,2,1,0,0,0,0,0,0,0,0
0,0,0,2,2,2,3,3,3,3,3,5,7,7,6,6,5,5,4,3,3,3,3,2,1,0,0,0,0,0
0,0,3,2,1,1,3,2,2,2,2,4,6,6,6,5,5,4,4,3,2,2,2,2,1,0,0,0,0,0
0,0,3,2,1,0,3,2,1,1,1,3,5,5,5,5,4,4,3,3,2,1,1,2,1,0,0,0,0,0
0,0,3,2,1,0,3,2,1,0,0,2,4,4,4,4,4,3,3,2,2,1,0,2,1,0,0,0,0,0
0,4,3,2,1,0,3,2,1,0,0,1,3,3,3,4,3,3,2,2,1,1,0,2,1,0,0,0,0,0
0,4,3,2,1,0,3,2,1,0,0,0,2,2,2,3,3,2,2,1,1,0,0,2,1,0,0,0,0,0
0,4,3,2,1,0,3,2,1,0,0,0,1,1,1,2,2,2,1,1,0,0,0,2,1,0,0,0,0,0
3,3,3,2,1,0,3,2,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,3,2,1,0,0,0,0
2,2,2,2,1,0,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,1,0,0,0,0
1,1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0
If you wanted to go through every option using brute force, you'd try every size of square that a cell could be the corner of (including 1x1), mark the square with zeros, adjust the values of the cells up to 7 places left/above the square, and recurse with the new grid.
If you iterated over the cells top-to-bottom and left-to-right, you'd only have to copy the grid starting from the current row to the bottom row, and you'd only have to adjust the values of cells up to 7 places to the left of the square.
The JS code I tested this with is fast for the top 2 or 3 rows of the grid (result: 24 and 44), takes 8 seconds to finish the top 4 rows (result: 70), and 30 minutes for 5 rows (result: 86). I'm not trying 6 rows.
But, as you can see from this grid, the number of possibilities is so huge that brute force will never be an option. On the other hand, trying something like adding large squares first, and then filling up the leftover space with smaller squares, is never going to guarantee the optimal result, I fear. It's too easy to come up with examples that would thwart such a strategy.
7,6,5,4,3,2,1,0,0,0,0,0,0,7,6,5,4,3,2,1
6,6,5,4,3,2,1,0,0,0,0,0,0,6,6,5,4,3,2,1
5,5,5,4,3,2,1,0,0,0,0,0,0,5,5,5,4,3,2,1
4,4,4,4,3,2,1,0,0,0,0,0,0,4,4,4,4,3,2,1
3,3,3,3,3,2,1,0,0,0,0,0,0,3,3,3,3,3,2,1
2,2,2,2,2,2,1,0,0,0,0,0,0,2,2,2,2,2,2,1
1,1,1,1,1,1,8,7,6,5,4,3,2,1,1,1,1,1,1,1
0,0,0,0,0,0,7,7,6,5,4,3,2,1,0,0,0,0,0,0
0,0,0,0,0,0,6,6,6,5,4,3,2,1,0,0,0,0,0,0
0,0,0,0,0,0,5,5,5,5,4,3,2,1,0,0,0,0,0,0
0,0,0,0,0,0,4,4,4,4,4,3,2,1,0,0,0,0,0,0
0,0,0,0,0,0,3,3,3,3,3,3,2,1,0,0,0,0,0,0
0,0,0,0,0,0,2,2,2,2,2,2,2,1,0,0,0,0,0,0
7,6,5,4,3,2,1,1,1,1,1,1,1,7,6,5,4,3,2,1
6,6,5,4,3,2,1,0,0,0,0,0,0,6,6,5,4,3,2,1
5,5,5,4,3,2,1,0,0,0,0,0,0,5,5,5,4,3,2,1
4,4,4,4,3,2,1,0,0,0,0,0,0,4,4,4,4,3,2,1
3,3,3,3,3,2,1,0,0,0,0,0,0,3,3,3,3,3,2,1
2,2,2,2,2,2,1,0,0,0,0,0,0,2,2,2,2,2,2,1
1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1
In the above example, putting an 8x8 square in the center and four 6x6 squares in the corners gives a lower score than putting a 6x6 square in the center and four 7x7 squares in the corners; so a greedy approach based on using the largest square possible will not give the optimal result.
This is how far I got by isolating zones connected by corridors of maximum width 3, and running the brute-force algorithm on the smaller grids. Where the border has no orange zone, adding another 2 cells doesn't increase the score of the isolated zone, so those cells can be used by the main zone unconditionally.

Parallelising gradient calculation in Julia

I was persuaded some time ago to drop my comfortable matlab programming and start programming in Julia. I have been working for a long with neural networks and I thought that, now with Julia, I could get things done faster by parallelising the calculation of the gradient.
The gradient need not be calculated on the entire dataset in one go; instead one can split the calculation. For instance, by splitting the dataset in parts, we can calculate a partial gradient on each part. The total gradient is then calculated by adding up the partial gradients.
Though, the principle is simple, when I parallelise with Julia I get a performance degradation, i.e. one process is faster then two processes! I am obviously doing something wrong... I have consulted other questions asked in the forum but I could still not piece together an answer. I think my problem lies in that there is a lot of unnecessary data moving going on, but I can't fix it properly.
In order to avoid posting messy neural network code, I am posting below a simpler example that replicates my problem in the setting of linear regression.
The code-block below creates some data for a linear regression problem. The code explains the constants, but X is the matrix containing the data inputs. We randomly create a weight vector w which when multiplied with X creates some targets Y.
######################################
## CREATE LINEAR REGRESSION PROBLEM ##
######################################
# This code implements a simple linear regression problem
MAXITER = 100 # number of iterations for simple gradient descent
N = 10000 # number of data items
D = 50 # dimension of data items
X = randn(N, D) # create random matrix of data, data items appear row-wise
Wtrue = randn(D,1) # create arbitrary weight matrix to generate targets
Y = X*Wtrue # generate targets
The next code-block below defines functions for measuring the fitness of our regression (i.e. the negative log-likelihood) and the gradient of the weight vector w:
####################################
## DEFINE FUNCTIONS ##
####################################
#everywhere begin
#-------------------------------------------------------------------
function negative_loglikelihood(Y,X,W)
#-------------------------------------------------------------------
# number of data items
N = size(X,1)
# accumulate here log-likelihood
ll = 0
for nn=1:N
ll = ll - 0.5*sum((Y[nn,:] - X[nn,:]*W).^2)
end
return ll
end
#-------------------------------------------------------------------
function negative_loglikelihood_grad(Y,X,W, first_index,last_index)
#-------------------------------------------------------------------
# number of data items
N = size(X,1)
# accumulate here gradient contributions by each data item
grad = zeros(similar(W))
for nn=first_index:last_index
grad = grad + X[nn,:]' * (Y[nn,:] - X[nn,:]*W)
end
return grad
end
end
Note that the above functions are on purpose not vectorised! I choose not to vectorise, as the final code (the neural network case) will also not admit any vectorisation (let us not get into more details regarding this).
Finally, the code-block below shows a very simple gradient descent that tries to recover the parameter weight vector w from the given data Y and X:
####################################
## SOLVE LINEAR REGRESSION ##
####################################
# start from random initial solution
W = randn(D,1)
# learning rate, set here to some arbitrary small constant
eta = 0.000001
# the following for-loop implements simple gradient descent
for iter=1:MAXITER
# get gradient
ref_array = Array(RemoteRef, nworkers())
# let each worker process part of matrix X
for index=1:length(workers())
# first index of subset of X that worker should work on
first_index = (index-1)*int(ceil(N/nworkers())) + 1
# last index of subset of X that worker should work on
last_index = min((index)*(int(ceil(N/nworkers()))), N)
ref_array[index] = #spawn negative_loglikelihood_grad(Y,X,W, first_index,last_index)
end
# gather the gradients calculated on parts of matrix X
grad = zeros(similar(W))
for index=1:length(workers())
grad = grad + fetch(ref_array[index])
end
# now that we have the gradient we can update parameters W
W = W + eta*grad;
# report progress, monitor optimisation
#printf("Iter %d neg_loglikel=%.4f\n",iter, negative_loglikelihood(Y,X,W))
end
As is hopefully visible, I tried to parallelise the calculation of the gradient in the easiest possible way here. My strategy is to break the calculation of the gradient in as many parts as available workers. Each worker is required to work only on part of matrix X, which part is specified by first_index and last_index. Hence, each worker should work with X[first_index:last_index,:]. For instance, for 4 workers and N = 10000, the work should be divided as follows:
worker 1 => first_index = 1, last_index = 2500
worker 2 => first_index = 2501, last_index = 5000
worker 3 => first_index = 5001, last_index = 7500
worker 4 => first_index = 7501, last_index = 10000
Unfortunately, this entire code works faster if I have only one worker. If add more workers via addprocs(), the code runs slower. One can aggravate this issue by create more data items, for instance use instead N=20000.
With more data items, the degradation is even more pronounced.
In my particular computing environment with N=20000 and one core, the code runs in ~9 secs. With N=20000 and 4 cores it takes ~18 secs!
I tried many many different things inspired by the questions and answers in this forum but unfortunately to no avail. I realise that the parallelisation is naive and that data movement must be the problem, but I have no idea how to do it properly. It seems that the documentation is also a bit scarce on this issue (as is the nice book by Ivo Balbaert).
I would appreciate your help as I have been stuck for quite some while with this and I really need it for my work. For anyone wanting to run the code, to save you the trouble of copying-pasting you can get the code here.
Thanks for taking the time to read this very lengthy question! Help me turn this into a model answer that anyone new in Julia can then consult!

I would say that GD is not a good candidate for parallelizing it using any of the proposed methods: either SharedArray or DistributedArray, or own implementation of distribution of chunks of data.
The problem does not lay in Julia, but in the GD algorithm.
Consider the code:
Main process:
for iter = 1:iterations #iterations: "the more the better"
δ = _gradient_descent_shared(X, y, θ)
θ = θ - α * (δ/N)
end
The problem is in the above for-loop which is a must. No matter how good _gradient_descent_shared is, the total number of iterations kills the noble concept of the parallelization.
After reading the question and the above suggestion I've started implementing GD using SharedArray. Please note, I'm not an expert in the field of SharedArrays.
The main process parts (simple implementation without regularization):
run_gradient_descent(X::SharedArray, y::SharedArray, θ::SharedArray, α, iterations) = begin
N = length(y)
for iter = 1:iterations
δ = _gradient_descent_shared(X, y, θ)
θ = θ - α * (δ/N)
end
θ
end
_gradient_descent_shared(X::SharedArray, y::SharedArray, θ::SharedArray, op=(+)) = begin
if size(X,1) <= length(procs(X))
return _gradient_descent_serial(X, y, θ)
else
rrefs = map(p -> (#spawnat p _gradient_descent_serial(X, y, θ)), procs(X))
return mapreduce(r -> fetch(r), op, rrefs)
end
end
The code common to all workers:
#= Returns the range of indices of a chunk for every worker on which it can work.
The function splits data examples (N rows into chunks),
not the parts of the particular example (features dimensionality remains intact).=#
#everywhere function _worker_range(S::SharedArray)
idx = indexpids(S)
if idx == 0
return 1:size(S,1), 1:size(S,2)
end
nchunks = length(procs(S))
splits = [round(Int, s) for s in linspace(0,size(S,1),nchunks+1)]
splits[idx]+1:splits[idx+1], 1:size(S,2)
end
#Computations on the chunk of the all data.
#everywhere _gradient_descent_serial(X::SharedArray, y::SharedArray, θ::SharedArray) = begin
prange = _worker_range(X)
pX = sdata(X[prange[1], prange[2]])
py = sdata(y[prange[1],:])
tempδ = pX' * (pX * sdata(θ) .- py)
end
The data loading and training. Let me assume that we have:
features in X::Array of the size (N,D), where N - number of examples, D-dimensionality of the features
labels in y::Array of the size (N,1)
The main code might look like this:
X=[ones(size(X,1)) X] #adding the artificial coordinate
N, D = size(X)
MAXITER = 500
α = 0.01
initialθ = SharedArray(Float64, (D,1))
sX = convert(SharedArray, X)
sy = convert(SharedArray, y)
X = nothing
y = nothing
gc()
finalθ = run_gradient_descent(sX, sy, initialθ, α, MAXITER);
After implementing this and run (on 8-cores of my Intell Clore i7) I got a very slight acceleration over serial GD (1-core) on my training multiclass (19 classes) training data (715 sec for serial GD / 665 sec for shared GD).
If my implementation is correct (please check this out - I'm counting on that) then parallelization of the GD algorithm is not worth of that. Definitely you might get better acceleration using stochastic GD on 1-core.

If you want to reduce the amount of data movement, you should strongly consider using SharedArrays. You could preallocate just one output vector, and pass it as an argument to each worker. Each worker sets a chunk of it, just as you suggested.

What data structure is conducive to discrete sampling? [duplicate]

Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the resevoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the resevoir method, as I was selecting a significant fraction of the list, which is small enough to hold in memory.
Does anyone have any suggestions on the best approach in this situation? I have my own solutions, but I'm hoping to find something more efficient, simpler, or both.

One of the fastest ways to make many with replacement samples from an unchanging list is the alias method. The core intuition is that we can create a set of equal-sized bins for the weighted list that can be indexed very efficiently through bit operations, to avoid a binary search. It will turn out that, done correctly, we will need to only store two items from the original list per bin, and thus can represent the split with a single percentage.
Let's us take the example of five equally weighted choices, (a:1, b:1, c:1, d:1, e:1)
To create the alias lookup:
Normalize the weights such that they sum to 1.0. (a:0.2 b:0.2 c:0.2 d:0.2 e:0.2) This is the probability of choosing each weight.
Find the smallest power of 2 greater than or equal to the number of variables, and create this number of partitions, |p|. Each partition represents a probability mass of 1/|p|. In this case, we create 8 partitions, each able to contain 0.125.
Take the variable with the least remaining weight, and place as much of it's mass as possible in an empty partition. In this example, we see that a fills the first partition. (p1{a|null,1.0},p2,p3,p4,p5,p6,p7,p8) with (a:0.075, b:0.2 c:0.2 d:0.2 e:0.2)
If the partition is not filled, take the variable with the most weight, and fill the partition with that variable.
Repeat steps 3 and 4, until none of the weight from the original partition need be assigned to the list.
For example, if we run another iteration of 3 and 4, we see
(p1{a|null,1.0},p2{a|b,0.6},p3,p4,p5,p6,p7,p8) with (a:0, b:0.15 c:0.2 d:0.2 e:0.2) left to be assigned
At runtime:
Get a U(0,1) random number, say binary 0.001100000
bitshift it lg2(p), finding the index partition. Thus, we shift it by 3, yielding 001.1, or position 1, and thus partition 2.
If the partition is split, use the decimal portion of the shifted random number to decide the split. In this case, the value is 0.5, and 0.5 < 0.6, so return a.
Here is some code and another explanation, but unfortunately it doesn't use the bitshifting technique, nor have I actually verified it.

A simple approach that hasn't been mentioned here is one proposed in Efraimidis and Spirakis. In python you could select m items from n >= m weighted items with strictly positive weights stored in weights, returning the selected indices, with:
import heapq
import math
import random
def WeightedSelectionWithoutReplacement(weights, m):
elt = [(math.log(random.random()) / weights[i], i) for i in range(len(weights))]
return [x[1] for x in heapq.nlargest(m, elt)]
This is very similar in structure to the first approach proposed by Nick Johnson. Unfortunately, that approach is biased in selecting the elements (see the comments on the method). Efraimidis and Spirakis proved that their approach is equivalent to random sampling without replacement in the linked paper.

Here's what I came up with for weighted selection without replacement:
def WeightedSelectionWithoutReplacement(l, n):
"""Selects without replacement n random elements from a list of (weight, item) tuples."""
l = sorted((random.random() * x[0], x[1]) for x in l)
return l[-n:]
This is O(m log m) on the number of items in the list to be selected from. I'm fairly certain this will weight items correctly, though I haven't verified it in any formal sense.
Here's what I came up with for weighted selection with replacement:
def WeightedSelectionWithReplacement(l, n):
"""Selects with replacement n random elements from a list of (weight, item) tuples."""
cuml = []
total_weight = 0.0
for weight, item in l:
total_weight += weight
cuml.append((total_weight, item))
return [cuml[bisect.bisect(cuml, random.random()*total_weight)] for x in range(n)]
This is O(m + n log m), where m is the number of items in the input list, and n is the number of items to be selected.

I'd recommend you start by looking at section 3.4.2 of Donald Knuth's Seminumerical Algorithms.
If your arrays are large, there are more efficient algorithms in chapter 3 of Principles of Random Variate Generation by John Dagpunar. If your arrays are not terribly large or you're not concerned with squeezing out as much efficiency as possible, the simpler algorithms in Knuth are probably fine.

It is possible to do Weighted Random Selection with replacement in O(1) time, after first creating an additional O(N)-sized data structure in O(N) time. The algorithm is based on the Alias Method developed by Walker and Vose, which is well described here.
The essential idea is that each bin in a histogram would be chosen with probability 1/N by a uniform RNG. So we will walk through it, and for any underpopulated bin which would would receive excess hits, assign the excess to an overpopulated bin. For each bin, we store the percentage of hits which belong to it, and the partner bin for the excess. This version tracks small and large bins in place, removing the need for an additional stack. It uses the index of the partner (stored in bucket[1]) as an indicator that they have already been processed.
Here is a minimal python implementation, based on the C implementation here
def prep(weights):
data_sz = len(weights)
factor = data_sz/float(sum(weights))
data = [[w*factor, i] for i,w in enumerate(weights)]
big=0
while big<data_sz and data[big][0]<=1.0: big+=1
for small,bucket in enumerate(data):
if bucket[1] is not small: continue
excess = 1.0 - bucket[0]
while excess > 0:
if big==data_sz: break
bucket[1] = big
bucket = data[big]
bucket[0] -= excess
excess = 1.0 - bucket[0]
if (excess >= 0):
big+=1
while big<data_sz and data[big][0]<=1: big+=1
return data
def sample(data):
r=random.random()*len(data)
idx = int(r)
return data[idx][1] if r-idx > data[idx][0] else idx
Example usage:
TRIALS=1000
weights = [20,1.5,9.8,10,15,10,15.5,10,8,.2];
samples = [0]*len(weights)
data = prep(weights)
for _ in range(int(sum(weights)*TRIALS)):
samples[sample(data)]+=1
result = [float(s)/TRIALS for s in samples]
err = [a-b for a,b in zip(result,weights)]
print(result)
print([round(e,5) for e in err])
print(sum([e*e for e in err]))

The following is a description of random weighted selection of an element of a
set (or multiset, if repeats are allowed), both with and without replacement in O(n) space
and O(log n) time.
It consists of implementing a binary search tree, sorted by the elements to be
selected, where each node of the tree contains:
the element itself (element)
the un-normalized weight of the element (elementweight), and
the sum of all the un-normalized weights of the left-child node and all of
its children (leftbranchweight).
the sum of all the un-normalized weights of the right-child node and all of
its chilren (rightbranchweight).
Then we randomly select an element from the BST by descending down the tree. A
rough description of the algorithm follows. The algorithm is given a node of
the tree. Then the values of leftbranchweight, rightbranchweight,
and elementweight of node is summed, and the weights are divided by this
sum, resulting in the values leftbranchprobability,
rightbranchprobability, and elementprobability, respectively. Then a
random number between 0 and 1 (randomnumber) is obtained.
if the number is less than elementprobability,
remove the element from the BST as normal, updating leftbranchweight
and rightbranchweight of all the necessary nodes, and return the
element.
else if the number is less than (elementprobability + leftbranchweight)
recurse on leftchild (run the algorithm using leftchild as node)
else
recurse on rightchild
When we finally find, using these weights, which element is to be returned, we either simply return it (with replacement) or we remove it and update relevant weights in the tree (without replacement).
DISCLAIMER: The algorithm is rough, and a treatise on the proper implementation
of a BST is not attempted here; rather, it is hoped that this answer will help
those who really need fast weighted selection without replacement (like I do).

This is an old question for which numpy now offers an easy solution so I thought I would mention it. Current version of numpy is version 1.2 and numpy.random.choice allows the sampling to be done with or without replacement and with given weights.

Suppose you want to sample 3 elements without replacement from the list ['white','blue','black','yellow','green'] with a prob. distribution [0.1, 0.2, 0.4, 0.1, 0.2]. Using numpy.random module it is as easy as this:
import numpy.random as rnd
sampling_size = 3
domain = ['white','blue','black','yellow','green']
probs = [.1, .2, .4, .1, .2]
sample = rnd.choice(domain, size=sampling_size, replace=False, p=probs)
# in short: rnd.choice(domain, sampling_size, False, probs)
print(sample)
# Possible output: ['white' 'black' 'blue']
Setting the replace flag to True, you have a sampling with replacement.
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html#numpy.random.choice

We faced a problem to randomly select K validators of N candidates once per epoch proportionally to their stakes. But this gives us the following problem:
Imagine probabilities of each candidate:
0.1
0.1
0.8
Probabilities of each candidate after 1'000'000 selections 2 of 3 without replacement became:
0.254315
0.256755
0.488930
You should know, those original probabilities are not achievable for 2 of 3 selection without replacement.
But we wish initial probabilities to be a profit distribution probabilities. Else it makes small candidate pools more profitable. So we realized that random selection with replacement would help us – to randomly select >K of N and store also weight of each validator for reward distribution:
std::vector<int> validators;
std::vector<int> weights(n);
int totalWeights = 0;
for (int j = 0; validators.size() < m; j++) {
int value = rand() % likehoodsSum;
for (int i = 0; i < n; i++) {
if (value < likehoods[i]) {
if (weights[i] == 0) {
validators.push_back(i);
}
weights[i]++;
totalWeights++;
break;
}
value -= likehoods[i];
}
}
It gives an almost original distribution of rewards on millions of samples:
0.101230
0.099113
0.799657

Distributing points over a surface within boundries

I'm interested in a way (algorithm) of distributing a predefined number of points over a 4 sided surface like a square.
The main issue is that each point has got to have a minimum and maximum proximity to each other (random between two predefined values). Basically the distance of any two points should not be closer than let's say 2, and a further than 3.
My code will be implemented in ruby (the points are locations, the surface is a map), but any ideas or snippets are definitely welcomed as all my ideas include a fair amount of brute force.

Try this paper. It has a nice, intuitive algorithm that does what you need.
In our modelization, we adopted another model: we consider each center to be related to all its neighbours by a repulsive string.
At the beginning of the simulation, the centers are randomly distributed, as well as the strengths of the
strings. We choose randomly to move one center; then we calculate the resulting force caused by all
neighbours of the given center, and we calculate the displacement which is proportional and oriented
in the sense of the resulting force.
After a certain number of iterations (which depends on the number of
centers and the degree of initial randomness) the system becomes stable.
In case it is not clear from the figures, this approach generates uniformly distributed points. You may use instead a force that is zero inside your bounds (between 2 and 3, for example) and non-zero otherwise (repulsive if the points are too close, attractive if too far).
This is my Python implementation (sorry, I don´t know ruby). Just import this and call uniform() to get a list of points.
import numpy as np
from numpy.linalg import norm
import pylab as pl
# find the nearest neighbors (brute force)
def neighbors(x, X, n=10):
dX = X - x
d = dX[:,0]**2 + dX[:,1]**2
idx = np.argsort(d)
return X[idx[1:11]]
# repulsion force, normalized to 1 when d == rmin
def repulsion(neib, x, d, rmin):
if d == 0:
return np.array([1,-1])
return 2*(x - neib)*rmin/(d*(d + rmin))
def attraction(neib, x, d, rmax):
return rmax*(neib - x)/(d**2)
def uniform(n=25, rmin=0.1, rmax=0.15):
# Generate randomly distributed points
X = np.random.random_sample( (n, 2) )
# Constants
# step is how much each point is allowed to move
# set to a lower value when you have more points
step = 1./50.
# maxk is the maximum number of iterations
# if step is too low, then maxk will need to increase
maxk = 100
k = 0
# Force applied to the points
F = np.zeros(X.shape)
# Repeat for maxk iterations or until all forces are zero
maxf = 1.
while maxf > 0 and k < maxk:
maxf = 0
for i in xrange(n):
# Force calculation for the i-th point
x = X[i]
f = np.zeros(x.shape)
# Interact with at most 10 neighbors
Neib = neighbors(x, X, 10)
# dmin is the distance to the nearest neighbor
dmin = norm(Neib[0] - x)
for neib in Neib:
d = norm(neib - x)
if d < rmin:
# feel repulsion from points that are too near
f += repulsion(neib, x, d, rmin)
elif dmin > rmax:
# feel attraction if there are no neighbors closer than rmax
f += attraction(neib, x, d, rmax)
# save all forces and the maximum force to normalize later
F[i] = f
if norm(f) <> 0:
maxf = max(maxf, norm(f))
# update all positions using the forces
if maxf > 0:
X += (F/maxf)*step
k += 1
if k == maxk:
print "warning: iteration limit reached"
return X

I presume that one of your brute force ideas includes just repeatedly generating points at random and checking to see if the constraints happen to be satisified.
Another way is to take a configuration that satisfies the constraints and repeatedly perturb a small part of it, chosen at random - for instance move a single point - to move to a randomly chosen nearby configuration. If you do this often enough you should move to a random configuration that is almost independent of the starting point. This could be justified under http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm or http://en.wikipedia.org/wiki/Gibbs_sampling.

I might try just doing it at random, then going through and dropping points that are to close to other points. You can compare the square of the distance to save some math time.
Or create cells with borders and place a point in each one. Less random, it depends on if this is a "just for looks thing" or not. But it could be very fast.

I made a compromise and ended up using the Poisson Disk Sampling method.
The result was fairly close to what I needed, especially with a lower number of tries (which also drastically reduces cost).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio