Minimum cost subset of sensors covering targets - algorithm

I have a question about dynamic programming. If I have a set of sensors covering targets (a target might be covered by multiple sensors), how can I find the minimum cost subset of sensors, given that each sensor has its own cost?
I have thought a lot about this, but I can't come up with the recursive formula to write my program. A greedy algorithm sometimes gives me the wrong minimum cost subset, and my problem is that sensors overlap in covering targets. Any help?
For Example:
I have a set of sensors with cost/weight = {s1:1, s2:2.5, s3:2} and three targets = {t1, t2, t3}. The sensor coverage is as follows: {s1: t1 t2, s2: t1 t2 t3, s3: t2 t3}. I need to get the minimum cost subset by dynamic programming. For the above example, a greedy algorithm would give me s1, s3, but the right answer is s2 only.

Check section 3 of this paper; it describes a
dynamic programming algorithm for the MWCDC:
https://docs.google.com/viewer?a=v&q=cache:5vPrmVg7jDMJ:www.cs.iit.edu/~wan/Journal/tcs10.pdf+&hl=en&gl=us&pid=bl&srcid=ADGEESglfvp6XtFIkqDZZ-E-Tun4AWPTZV_V7z32pTvJ05K6tdkCoefpsAxPxdK44jYDvPNLDEwYI8uK-PMlLGthsaV8-ow63utalgWPnyLrUUBKhoTTVuYwUiKSHlCXU-HXKHVeHvh4&sig=AHIEtbQGka8F39MaT8yAy4G9Kvv8TPsvJA

I thought of something but I'm not 100% confident about it, here it goes:
S = {s1 : 1, s2 : 2.5, s3 : 2}
M = {s1 : t1t2, s2 : t1t2t3, s3 : t2t3}
Now, we build a matrix representing the target x sensor map:
[1, 1, 0]
[1, 1, 1]
[0, 1, 1]
So the rows are targets (t1 -> R0, t2 -> R1 etc.), and columns represent which sensors cover them.
Next we process row-by-row while collecting the list of sensors that will cover the current target set. For example:
Row - 0:
{t1} -> [s1 : 1], [s2 : 2.5]
Notice that we're building a list of answers. Then we proceed to the next row, where we need to add t2 to our set of targets while calculating the minimum sensor weight required to do so.
Row - 1:
{t1, t2} -> [s1 : 1], [s2 : 2.5]
Note that nothing changed on the RHS because both s1 and s2 cover t2 as well. Next, the final row:
Row - 2:
{t1, t2, t3} -> [s1, s3 : 3], [s2 : 2.5]
Notice that I had to add s3 to the first answer because it had the minimum weight covering t3.
A final walk through the list of answers would reveal that [s2 : 2.5] is the best candidate.
Now, I'm not that confident with dynamic programming, so I'm not sure whether what I'm doing here is correct. It would be great if someone could confirm or dispute what I have done here.
EDIT: Maybe it makes sense to have the columns sorted according to the weight of the sensors, so that it becomes easy to select the sensor with the lowest weight covering a given target.
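For what it's worth, here is a rough Python sketch of the row-by-row idea above; to stay exact rather than greedy, this version branches on every sensor that can cover the new target instead of extending an answer with only the cheapest one:
costs = {'s1': 1.0, 's2': 2.5, 's3': 2.0}
cover = {'s1': {'t1', 't2'}, 's2': {'t1', 't2', 't3'}, 's3': {'t2', 't3'}}
targets = ['t1', 't2', 't3']

answers = [frozenset()]                        # candidate sensor sets
for t in targets:                              # process one target (row) at a time
    nxt = set()
    for ans in answers:
        if any(t in cover[s] for s in ans):
            nxt.add(ans)                       # t already covered, answer unchanged
        else:
            for s, covered in cover.items():   # branch: add each sensor covering t
                if t in covered:
                    nxt.add(ans | {s})
    answers = nxt

best = min(answers, key=lambda a: sum(costs[s] for s in a))
print(sorted(best), sum(costs[s] for s in best))   # ['s2'] 2.5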

Here's my proposal. It's not dynamic programming, but it's the best I can come up with; the problem is interesting and worth a discussion.
Define a "partial solution" to be a tuple (T, S, P, C), where T is the set of covered targets, S is the set of included sensors, P is the set of pending targets, and C is the cost.
W is the current working set of partial solutions, which initially contains only ({}, {}, X, 0), i.e. cost zero and only the set of pending targets nonempty (X is the set of all targets). W can be maintained as a heap.
W = { ({}, {}, X, 0) }
repeat
    p = select and remove from W the partial solution with the minimum cost
    if p.P is empty
        return p
    t = select from p.P the target covered by the minimum number of sensors
    for each sensor s covering t, with s not in p.S
        p' = new partial solution, copy of p
        p'.S += {s}
        p'.C += cost(s)
        for each target t' covered by s
            p'.T += {t'}
            p'.P -= {t'}
        end for
        W += {p'}
    end for
end repeat
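For illustration, here is a rough Python 3 version of this search using a heap; the costs/coverage dictionaries are just the example data from the question, and the pruning on already-expanded pending sets is an optional extra that is not in the pseudocode above:
import heapq
from itertools import count

def min_cost_cover(costs, coverage, targets):
    # Best-first search over partial solutions, following the pseudocode above.
    # costs: sensor -> cost, coverage: sensor -> set of targets it covers.
    tie = count()   # tie-breaker so the heap never has to compare sets
    heap = [(0.0, next(tie), frozenset(), frozenset(targets))]   # (C, _, S, P)
    expanded = set()   # pending-sets already expanded (optional pruning)
    while heap:
        c, _, S, P = heapq.heappop(heap)
        if not P:
            return c, set(S)          # minimum-cost cover found
        if P in expanded:
            continue
        expanded.add(P)
        # pick the pending target covered by the fewest sensors
        t = min(P, key=lambda x: sum(1 for s in coverage if x in coverage[s]))
        for s, covered in coverage.items():
            if t in covered and s not in S:
                heapq.heappush(heap, (c + costs[s], next(tie), S | {s}, P - covered))
    return float('inf'), set()        # some target cannot be covered at all

costs = {'s1': 1.0, 's2': 2.5, 's3': 2.0}
coverage = {'s1': {'t1', 't2'}, 's2': {'t1', 't2', 't3'}, 's3': {'t2', 't3'}}
print(min_cost_cover(costs, coverage, {'t1', 't2', 't3'}))   # (2.5, {'s2'})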

Here is my algorithm for this problem. It's a recursive approach.
Pseudocode :
MinimizeCost(int cost, List targetsReached, List sensorsUsed, int current_sensor) {
    if (targetsReached.Count == no_of_targets) {
        if (cost < mincost) {
            mincost = cost;
            minList = sensorsUsed;
        }
        return;
    }
    if (current_sensor > maxsensors)
        return;
    else {
        // Current sensor is ignored
        MinimizeCost(cost, targetsReached, sensorsUsed, current_sensor + 1);
        // Current sensor is considered; work on copies so the "ignore" branch
        // above is not affected by the additions below
        int newcost = cost + sensor_cost[current_sensor];
        List newSensorsUsed = new List(sensorsUsed);
        newSensorsUsed.Add(current_sensor);
        List newTargetsReached = new List(targetsReached);
        AddIfNotExists(newTargetsReached, targets[current_sensor]);
        MinimizeCost(newcost, newTargetsReached, newSensorsUsed, current_sensor + 1);
    }
}
The sensorsUsed list can be avoided if those details are not needed.
Further memoization can be introduced if the targetsReached list can be mapped to an int (for example, a bitmask of targets). Then the value for [current_sensor, targetsReached] can be cached and reused to avoid repetition. Hope this helps. There might be better approaches, though.
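To make the memoization idea concrete, here is a rough Python 3 sketch that encodes the reached targets as a bitmask; the costs and coverage are just the example data from the question:
from functools import lru_cache

sensor_cost = [1.0, 2.5, 2.0]                 # s1, s2, s3
coverage    = [{0, 1}, {0, 1, 2}, {1, 2}]     # t1=0, t2=1, t3=2
n_targets   = 3
ALL = (1 << n_targets) - 1                    # bitmask with every target covered

def to_mask(targets):
    m = 0
    for t in targets:
        m |= 1 << t
    return m

masks = [to_mask(c) for c in coverage]

@lru_cache(maxsize=None)
def min_cost(i, covered):
    # minimum cost to cover the remaining targets using sensors i, i+1, ...
    if covered == ALL:
        return 0.0
    if i == len(masks):
        return float('inf')             # no sensors left, targets still uncovered
    skip = min_cost(i + 1, covered)                               # ignore sensor i
    take = sensor_cost[i] + min_cost(i + 1, covered | masks[i])   # use sensor i
    return min(skip, take)

print(min_cost(0, 0))   # 2.5  (s2 alone is optimal for the example)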

Algorithm for downsampling array of intervals

I have a sorted array of N intervals of different lengths. I am plotting these intervals with alternating colors, blue/green.
I am trying to find a method or algorithm to "downsample" the array of intervals to produce a visually similar plot, but with fewer elements.
Ideally I could write a function where I can pass the target number of output intervals as an argument. The output length only has to come close to the target.
input = [
[0, 5, "blue"],
[5, 6, "green"],
[6, 10, "blue"],
// ...etc
]
output = downsample(input, 25)
// [[0, 10, "blue"], ... ]
Below is a picture of what I am trying to accomplish. In this example the input has about 250 intervals, and the output about 25 intervals. The input length can vary a lot.
Update 1:
Below is my original post, which I initially deleted because there were issues with displaying the equations and I also wasn't very confident that it really made sense. But later I figured out that the optimisation problem I described can actually be solved efficiently with DP (dynamic programming).
So I did a sample C++ implementation. Here are some results:
Here is a live demo that you can play with in your browser (make sure your browser supports WebGL2, like Chrome or Firefox). It takes a bit to load the page.
Here is the C++ implementation: link
Update 2:
Turns out the proposed solution has the following nice property - we can easily control the importance of the two parts F1 and F2 of the cost function. Simply change the cost function to F(α)=F1 + αF2, where α >= 1.0 is a free parameter. The DP algorithm remains the same.
Here are some results for different α values using the same number of intervals N:
Live demo (WebGL2 required)
As can be seen, higher α means it is more important to cover the original input intervals even if this means covering more of the background in-between.
Original post
Even though some good algorithms have already been proposed, I would like to propose a slightly unusual approach: interpreting the task as an optimisation problem. Although I don't know how to solve the optimisation problem efficiently (or even whether it can be solved in reasonable time at all), it might be useful to someone purely as a concept.
First, without loss of generality, let's declare the blue color to be the background. We will be painting N green intervals on top of it (N is the number provided to the downsample() function in the OP's description). The ith interval is defined by its starting coordinate 0 <= xi < xmax and width wi >= 0 (xmax is the maximum coordinate from the input).
Let's also define the array G(x) to be the number of green cells in the interval [0, x) in the input data. This array can easily be pre-calculated. We will use it to quickly calculate the number of green cells in an arbitrary interval [x, y) - namely: G(y) - G(x).
We can now introduce the first part of the cost function for our optimisation problem:
F1 = sum over i of ( wi - (G(xi + wi) - G(xi)) )
i.e. the total amount of background covered by the generated intervals. The smaller F1 is, the better our generated intervals cover the input intervals, so we will be searching for xi, wi that minimise it. Ideally we want F1=0, which would mean that the intervals do not cover any of the background (which of course is not possible because N is less than the number of input intervals).
However, this function is not enough to describe the problem, because obviously we can minimise it by taking empty intervals: F1(x, 0) = 0. Instead, we want to cover as much as possible of the input intervals. Let's introduce the second part of the cost function, which corresponds to this requirement:
F2 = G(xmax) - sum over i of ( G(xi + wi) - G(xi) )
i.e. the amount of the input (green) intervals left uncovered. The smaller F2 is, the more of the input intervals is covered. Ideally we want F2=0, which would mean that we covered all of the input rectangles. However, minimising F2 competes with minimising F1.
Finally, we can state our optimisation problem: find xi, wi that minimize F=F1 + F2
How to solve this problem? Not sure. Maybe use some metaheuristic approach for global optimisation such as Simulated annealing or Differential evolution. These are typically easy to implement, especially for this simple cost function.
The best case would be for some kind of DP algorithm to exist that solves it efficiently, but that seems unlikely.
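Under the same definitions (G as the prefix count of green cells, F1 the background covered, F2 the green left uncovered), a rough Python 3 sketch of evaluating the cost might look like this; build_G and cost are illustrative names, coordinates are assumed to be integer cell indices, and the proposed intervals are a list of (x, w) pairs assumed not to overlap:
def build_G(input_intervals, xmax):
    # G[x] = number of green cells in [0, x) of the input
    G = [0] * (xmax + 1)
    green = {x for l, r, c in input_intervals if c == "green" for x in range(l, r)}
    for x in range(xmax):
        G[x + 1] = G[x] + (1 if x in green else 0)
    return G

def cost(G, proposal, alpha=1.0):
    xmax = len(G) - 1
    covered_green = sum(G[x + w] - G[x] for x, w in proposal)
    covered_total = sum(w for _, w in proposal)
    F1 = covered_total - covered_green      # background cells we painted over
    F2 = G[xmax] - covered_green            # green cells we failed to cover
    return F1 + alpha * F2

inp = [[0, 5, "blue"], [5, 6, "green"], [6, 10, "blue"]]
G = build_G(inp, 10)
print(cost(G, [(5, 1)]))     # covers exactly the green interval -> 0.0
print(cost(G, [(4, 3)]))     # covers the green plus two background cells -> 2.0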
I would advise you to use the Haar wavelet. That is a very simple algorithm which was often used to provide progressive loading of big images on websites.
Here you can see how it works with a 2D function. That is what you can use. Alas, the document is in Ukrainian, but the code is in C++, so it is readable :)
This document provides an example with a 3D object:
Pseudocode on how to compress with the Haar wavelet can be found in Wavelets for Computer Graphics: A Primer Part 1.
You could do the following:
1. Write out the points that divide the whole strip into intervals as the array [a[0], a[1], a[2], ..., a[n-1]]. In your example, the array would be [0, 5, 6, 10, ...].
2. Calculate the double-interval lengths a[2]-a[0], a[3]-a[1], a[4]-a[2], ..., a[n-1]-a[n-3] and find the least of them. Let it be a[k+2]-a[k]. If two or more lengths share the lowest value, choose one of them at random. In your example, you would get the array [6, 5, ...] and search for the minimum value in it.
3. Swap the intervals (a[k], a[k+1]) and (a[k+1], a[k+2]). Basically, you need to assign a[k+1] = a[k] + a[k+2] - a[k+1] to keep the lengths, and then remove the points a[k] and a[k+2] from the array, because two pairs of intervals of the same color are now merged into two larger intervals. Thus the numbers of blue and green intervals each decrease by one after this step.
4. If you're satisfied with the current number of intervals, end the process; otherwise go back to step 2.
You perform step 2 in order to decrease the "color shift", because at step 3 the left interval is moved a[k+2]-a[k+1] to the right and the right interval is moved a[k+1]-a[k] to the left. The sum of these distances, a[k+2]-a[k], can be considered a measure of the change you're introducing into the whole picture.
Main advantages of this approach:
It is simple.
It doesn't give a preference to any of the two colors. You don't need to assign one of the colors to be the background and the other to be the painting color. The picture can be considered both as "green-on-blue" and "blue-on-green". This reflects quite common use case when two colors just describe two opposite states (like the bit 0/1, "yes/no" answer) of some process extended in time or in space.
It always keeps the balance between colors, i.e. the sum of intervals of each color remains the same during the reduction process. Thus the total brightness of the picture doesn't change. It is important as this total brightness can be considered an "indicator of completeness" at some cases.
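For illustration, a rough Python sketch of this reduction, operating on the boundary array from step 1; to keep it simple it never removes the first or last boundary, which is an extra assumption on top of the description above:
def downsample_boundaries(a, target_boundaries):
    a = list(a)
    while len(a) > target_boundaries and len(a) >= 5:
        # step 2: the k with the smallest double-interval length a[k+2] - a[k]
        k = min(range(1, len(a) - 3), key=lambda i: a[i + 2] - a[i])
        # step 3: swap the intervals (a[k], a[k+1]) and (a[k+1], a[k+2]) ...
        a[k + 1] = a[k] + a[k + 2] - a[k + 1]
        # ... then drop a[k] and a[k+2]: two same-colored pairs merge, so one
        # blue and one green interval disappear while color totals are preserved
        del a[k + 2]
        del a[k]
    return a

# boundaries 0,5,6,10,11,20 describe five intervals with alternating colors
print(downsample_boundaries([0, 5, 6, 10, 11, 20], 4))   # e.g. [0, 9, 11, 20]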
Here's another attempt at dynamic programming that's slightly different than Georgi Gerganov's, although the idea to try and formulate a dynamic program may have been inspired by his answer. Neither the implementation nor the concept is guaranteed to be sound but I did include a code sketch with a visual example :)
The search space in this case is not reliant on the total unit width but rather on the number of intervals. It's O(N * n^2) time and O(N * n) space, where N and n are the target and given number of (green) intervals, respectively, because we assume that any newly chosen green interval must be bound by two green intervals (rather than extend arbitrarily into the background).
The idea also utilises the prefix sum idea used to calculate runs with a majority element. We add 1 when we see the target element (in this case green) and subtract 1 for others (that algorithm is also amenable to multiple elements with parallel prefix sum tracking). (I'm not sure that restricting candidate intervals to sections with a majority of the target colour is always warranted but it may be a useful heuristic depending on the desired outcome. It's also adjustable -- we can easily adjust it to check for a different part than 1/2.)
Where Georgi Gerganov's program seeks to minimise, this dynamic program seeks to maximise two ratios. Let h(i, k) represent the best sequence of green intervals up to the ith given interval, utilising k intervals, where each is allowed to stretch back to the left edge of some previous green interval. We speculate that
h(i, k) = max(r + C*r1 + h(i-l, k-1))
where, in the current candidate interval, r is the ratio of green to the length of the stretch, and r1 is the ratio of green to the total given green. r1 is multiplied by an adjustable constant to give more weight to the volume of green covered. l is the length of the stretch.
JavaScript code (for debugging, it includes some extra variables and log lines):
function rnd(n, d=2){
  let m = Math.pow(10,d)
  return Math.round(m*n) / m;
}

function f(A, N, C){
  let ps = [[0,0]];
  let psBG = [0];
  let totalG = 0;
  A.unshift([0,0]);
  for (let i=1; i<A.length; i++){
    let [l,r,c] = A[i];
    if (c == 'g'){
      totalG += r - l;
      let prevI = ps[ps.length-1][1];
      let d = l - A[prevI][1];
      let prevS = ps[ps.length-1][0];
      ps.push(
        [prevS - d, i, 'l'],
        [prevS - d + r - l, i, 'r']
      );
      psBG[i] = psBG[i-1];
    } else {
      psBG[i] = psBG[i-1] + r - l;
    }
  }
  //console.log(JSON.stringify(A));
  //console.log('');
  //console.log(JSON.stringify(ps));
  //console.log('');
  //console.log(JSON.stringify(psBG));
  let m = new Array(N + 1);
  m[0] = new Array((ps.length >> 1) + 1);
  for (let i=0; i<m[0].length; i++)
    m[0][i] = [0,0];
  // for each in N
  for (let i=1; i<=N; i++){
    m[i] = new Array((ps.length >> 1) + 1);
    for (let ii=0; ii<m[0].length; ii++)
      m[i][ii] = [0,0];
    // for each interval
    for (let j=i; j<m[0].length; j++){
      m[i][j] = m[i][j-1];
      for (let k=j; k>i-1; k--){
        // our anchors are the right
        // side of each interval, k's are the left
        let jj = 2*j;
        let kk = 2*k - 1;
        // positive means green
        // is a majority
        if (ps[jj][0] - ps[kk][0] > 0){
          let bg = psBG[ps[jj][1]] - psBG[ps[kk][1]];
          let s = A[ps[jj][1]][1] - A[ps[kk][1]][0] - bg;
          let r = s / (bg + s);
          let r1 = C * s / totalG;
          let candidate = r + r1 + m[i-1][j-1][0];
          if (candidate > m[i][j][0]){
            m[i][j] = [
              candidate,
              ps[kk][1] + ',' + ps[jj][1],
              bg, s, r, r1,k,m[i-1][j-1][0]
            ];
          }
        }
      }
    }
  }
  /*
  for (row of m)
    console.log(JSON.stringify(
      row.map(l => l.map(x => typeof x != 'number' ? x : rnd(x)))));
  */
  let result = new Array(N);
  let j = m[0].length - 1;
  for (let i=N; i>0; i--){
    let [_,idxs,w,x,y,z,k] = m[i][j];
    let [l,r] = idxs.split(',');
    result[i-1] = [A[l][0], A[r][1], 'g'];
    j = k - 1;
  }
  return result;
}

function show(A, last){
  if (last[1] != A[A.length-1])
    A.push(last);
  let s = '';
  let j;
  for (let i=A.length-1; i>=0; i--){
    let [l, r, c] = A[i];
    let cc = c == 'g' ? 'X' : '.';
    for (let j=r-1; j>=l; j--)
      s = cc + s;
    if (i > 0)
      for (let j=l-1; j>=A[i-1][1]; j--)
        s = '.' + s
  }
  for (let j=A[0][0]-1; j>=0; j--)
    s = '.' + s
  console.log(s);
  return s;
}

function g(A, N, C){
  const ts = f(A, N, C);
  //console.log(JSON.stringify(ts));
  show(A, A[A.length-1]);
  show(ts, A[A.length-1]);
}

var a = [
  [0,5,'b'],
  [5,9,'g'],
  [9,10,'b'],
  [10,15,'g'],
  [15,40,'b'],
  [40,41,'g'],
  [41,43,'b'],
  [43,44,'g'],
  [44,45,'b'],
  [45,46,'g'],
  [46,55,'b'],
  [55,65,'g'],
  [65,100,'b']
];

// (input, N, C)
g(a, 2, 2);
console.log('');
g(a, 3, 2);
console.log('');
g(a, 4, 2);
console.log('');
g(a, 4, 5);
I would suggest using K-means; it is an algorithm used to group data (a more detailed explanation here: https://en.wikipedia.org/wiki/K-means_clustering and here https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).
Below is a brief sketch of how the function could look; hope it is helpful.
from sklearn.cluster import KMeans
import numpy as np

def downsample(input, cluster = 25):
    # you will need to group your intervals in a numpy array as shown below;
    # for the sake of example I will take just a random array
    X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
    # n_clusters will be the same as the desired output length
    kmeans = KMeans(n_clusters=cluster, random_state=0).fit(X)
    # then you can iterate through the labels that were assigned to every entry
    # of your input, in our case the intervals
    kmeans_list = [[] for _ in range(cluster)]   # one list per label
    for i in range(0, X.shape[0]):
        kmeans_list[kmeans.labels_[i]].append(X[i])
    # after that you will basically have a list of lists, and every inner list
    # will contain all points that correspond to a specific label
    ret = []  # return list
    for label_list in kmeans_list:
        left = 10001000   # a big enough number to exceed anything in the input
        right = -left     # same here
        for entry in label_list:
            left = min(left, entry[0])
            right = max(right, entry[1])
        ret.append([left, right])
    return ret

Proving that there are no overlapping sub-problems?

I just got the following interview question:
Given a list of float numbers, insert “+”, “-”, “*” or “/” between each consecutive pair of numbers to find the maximum value you can get. For simplicity, assume that all operators are of equal precedence order and evaluation happens from left to right.
Example:
(1, 12, 3) -> 1 + 12 * 3 = 39
If we build a recursive solution, we find that we get an O(4^N) algorithm. I tried to find overlapping sub-problems (to increase the efficiency of this algorithm) and wasn't able to find any. The interviewer then told me that there aren't any overlapping sub-solutions.
How can we detect when there are overlapping sub-problems and when there aren't? I spent a lot of time trying to "force" sub-solutions to appear, and eventually the interviewer told me that there weren't any.
My current solution looks as follows:
def maximumNumber(array, current_value=None):
    if current_value is None:
        current_value = array[0]
        array = array[1:]
    if len(array) == 0:
        return current_value
    return max(
        maximumNumber(array[1:], current_value * array[0]),
        maximumNumber(array[1:], current_value - array[0]),
        maximumNumber(array[1:], current_value / array[0]),
        maximumNumber(array[1:], current_value + array[0])
    )
Looking for "overlapping subproblems" sounds like you're trying to do bottom up dynamic programming. Don't bother with that in an interview. Write the obvious recursive solution. Then memoize. That's the top down approach. It is a lot easier to get working.
You may get challenged on that. Here was my response the last time that I was asked about that.
There are two approaches to dynamic programming, top down and bottom up. The bottom up approach usually uses less memory but is harder to write. Therefore I do the top down recursive/memoize and only go for the bottom up approach if I need the last ounce of performance.
It is a perfectly true answer, and I got hired.
Now you may notice that tutorials about dynamic programming spend more time on bottom up. They often even skip the top down approach. They do that because bottom up is harder. You have to think differently. It does provide more efficient algorithms because you can throw away parts of that data structure that you know you won't use again.
Coming up with a working solution in an interview is hard enough already. Don't make it harder on yourself than you need to.
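For illustration, here is a rough Python 3 sketch of that advice applied to this problem, memoizing on (position, running value). As the discussion suggests, with arbitrary floats the running value rarely repeats, so the cache rarely hits; that is exactly the "no overlapping sub-problems" observation.
from functools import lru_cache

def maximum_number(nums):
    nums = tuple(nums)

    @lru_cache(maxsize=None)
    def best(i, current):
        # best value reachable having consumed nums[:i], with running value `current`
        if i == len(nums):
            return current
        x = nums[i]
        candidates = [best(i + 1, current + x),
                      best(i + 1, current - x),
                      best(i + 1, current * x)]
        if x != 0:
            candidates.append(best(i + 1, current / x))
        return max(candidates)

    return best(1, nums[0])

print(maximum_number((1, 12, 3)))   # 39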
EDIT Here is the DP solution that the interviewer thought didn't exist.
def find_best(floats):
    current_answers = {floats[0]: ()}
    floats = floats[1:]
    for f in floats:
        next_answers = {}
        for v, path in current_answers.iteritems():
            next_answers[v + f] = (path, '+')
            next_answers[v * f] = (path, '*')
            next_answers[v - f] = (path, '-')
            if 0 != f:
                next_answers[v / f] = (path, '/')
        current_answers = next_answers
    best_val = max(current_answers.keys())
    return (best_val, current_answers[best_val])
Generally, the overlapping sub-problems approach is one where the problem is broken down into smaller sub-problems whose solutions, when combined, solve the big problem. When these sub-problems also exhibit optimal substructure, DP is a good way to solve it.
The decision about what you do with a new number that you encounter has little to do with the numbers you have already processed, other than accounting for signs, of course.
So I would say this is an overlapping sub-problem solution but not a dynamic programming problem. You could use divide and conquer or even more straightforward recursive methods.
Initially, let's forget about negative floats.
Process each new float according to the following rules:
If the new float is less than 1, insert a / before it.
If the new float is more than 1, insert a * before it.
If it is 1, then insert a +.
If you see a zero, just don't divide or multiply.
This would solve it for all positive floats.
Now let's handle the case of negative numbers thrown into the mix.
Scan the input once to figure out how many negative numbers you have.
Isolate all the negative numbers in a list, convert all the numbers whose absolute value is less than 1 to their multiplicative inverse, then sort them by magnitude. If you have an even number of elements, we are all good. If you have an odd number of elements, store the head of this list in a special variable, say k, associate a processed flag with it, and set the flag to False.
Proceed as before with some updated rules:
If you see a negative number less than 0 but more than -1, insert a / before it.
If you see a negative number less than -1, insert a * before it.
If you see the special variable and the processed flag is False, insert a - before it, and set processed to True.
There is one more optimization you can perform, which is removing pairs of negative ones as candidates for blanket subtraction from our initial negative-numbers list, but this is just an edge case and I'm pretty sure your interviewer won't care.
Now the sum is only a function of the number you are adding and not the sum you are adding to :)
Computing max/min results for each operation from previous step. Not sure about overall correctness.
Time complexity O(n), space complexity O(n)
const max_value = (nums) => {
  const ops = [(a, b) => a+b, (a, b) => a-b, (a, b) => a*b, (a, b) => a/b]
  const dp = Array.from({length: nums.length}, _ => [])
  dp[0] = Array.from({length: ops.length}, _ => [nums[0],nums[0]])
  for (let i = 1; i < nums.length; i++) {
    for (let j = 0; j < ops.length; j++) {
      let mx = -Infinity
      let mn = Infinity
      for (let k = 0; k < ops.length; k++) {
        if (nums[i] === 0 && k === 3) {
          // If current number is zero, removing division
          ops.splice(3, 1)
          dp.splice(3, 1)
          continue
        }
        const opMax = ops[j](dp[i-1][k][0], nums[i])
        const opMin = ops[j](dp[i-1][k][1], nums[i])
        mx = Math.max(opMax, opMin, mx)
        mn = Math.min(opMax, opMin, mn)
      }
      dp[i].push([mx,mn])
    }
  }
  return Math.max(...dp[nums.length-1].map(v => Math.max(...v)))
}
// Tests
console.log(max_value([1, 12, 3]))
console.log(max_value([1, 0, 3]))
console.log(max_value([17,-34,2,-1,3,-4,5,6,7,1,2,3,-5,-7]))
console.log(max_value([59, 60, -0.000001]))
console.log(max_value([0, 1, -0.0001, -1.00000001]))

Number of partitions with a given constraint

Consider a set of 13 Danish, 11 Japanese and 8 Polish people. It is well known that the number of different ways of dividing this set of people into groups is the 13+11+8=32nd Bell number (the number of set partitions). However, we are asked to find the number of possible set partitions under a given constraint. The question is as follows:
A set partition is said to be good if it has no group consisting of at least two people that only includes a single nationality. How many good partitions there are for this set? (A group may include only one person.)
The brute force approach requires going through about 10^26 partitions and checking which ones are good. This seems pretty infeasible, especially if the groups are larger or one introduces other nationalities. Is there a smart way instead?
EDIT: As a side note. There probably is no hope for a really nice solution. A highly esteemed expert in combinatorics answered a related question, which, I think, basically says that the related problem, and thus this problem also, is very difficult to solve exactly.
Here's a solution using dynamic programming.
It starts from an empty set, then adds one element at a time and calculates all the valid partitions.
The state space is huge, but notice that to be able to calculate the next step we only need to know about a partition the following things:
For each nationality, how many sets it contains that consist of only a single member of that nationality. (e.g.: {a})
How many sets it contains with mixed elements. (e.g.: {a, b, c})
For each of these configurations I only store the total count. Example:
[0, 1, 2, 2] -> 3
{a}{b}{c}{mixed}
e.g.: 3 partitions that look like: {b}, {c}, {c}, {a,c}, {b,c}
Here's the code in python:
import collections
from operator import mul
from fractions import Fraction

def nCk(n,k):
    return int( reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1) )

def good_partitions(l):
    n = len(l)
    i = 0
    prev = collections.defaultdict(int)
    while l:
        #any more from this kind?
        if l[0] == 0:
            l.pop(0)
            i += 1
            continue
        l[0] -= 1
        curr = collections.defaultdict(int)
        for solution,total in prev.iteritems():
            for idx,item in enumerate(solution):
                my_solution = list(solution)
                if idx == i:
                    # add element as a new set
                    my_solution[i] += 1
                    curr[tuple(my_solution)] += total
                elif my_solution[idx]:
                    if idx != n:
                        # add to a set consisting of one element
                        # or merge into multiple sets that consist of one element
                        cnt = my_solution[idx]
                        c = cnt
                        while c > 0:
                            my_solution = list(solution)
                            my_solution[n] += 1
                            my_solution[idx] -= c
                            curr[tuple(my_solution)] += total * nCk(cnt, c)
                            c -= 1
                    else:
                        # add to a mixed set
                        cnt = my_solution[idx]
                        curr[tuple(my_solution)] += total * cnt
        if not prev:
            # one set with one element
            lone = [0] * (n+1)
            lone[i] = 1
            curr[tuple(lone)] = 1
        prev = curr
    return sum(prev.values())

print good_partitions([1, 1, 1, 1])        # 15
print good_partitions([1, 1, 1, 1, 1])     # 52
print good_partitions([2, 1])              # 4
print good_partitions([13, 11, 8])         # 29811734589499214658370837
It produces correct values for the test cases. I also tested it against a brute-force solution (for small values), and it produces the same results.
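For reference, a brute-force cross-check of that kind could look like the following Python 3 sketch (a hypothetical helper, not the test code actually used above); it enumerates all set partitions of a few labelled people and counts the good ones:
def brute_good(counts):
    # labelled people: (nationality, index), so that people are distinguishable
    people = [(nat, i) for nat, c in enumerate(counts) for i in range(c)]

    def partitions(items):
        if not items:
            yield []
            return
        head, rest = items[0], items[1:]
        for part in partitions(rest):
            yield [[head]] + part                             # head in a new group
            for i in range(len(part)):                        # head joins group i
                yield part[:i] + [part[i] + [head]] + part[i + 1:]

    def is_good(part):
        # no group of two or more people from a single nationality
        return all(len(g) == 1 or len({nat for nat, _ in g}) > 1 for g in part)

    return sum(is_good(p) for p in partitions(people))

print(brute_good([1, 1, 1, 1]))   # 15, matches good_partitions([1, 1, 1, 1])
print(brute_good([2, 1]))         # 4,  matches good_partitions([2, 1])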
An exact analytic solution is hard, but a polynomial time+space dynamic programming solution is straightforward.
First of all, we need an absolute order on the size of groups. We do that by comparing how many Danes, Japanese, and Poles we have.
Next, the function to write is this one.
m is the maximum group size we can emit
p is the number of people of each nationality that we have left to split
max_good_partitions_of_maximum_size(m, p) is the number of "good partitions"
we can form from p people, with no group being larger than m
Clearly you can write this as a somewhat complicated recursive function that always selects the next group to emit, then calls itself with that as the new maximum size, and subtracts the group from p. If you had this function, then your answer is simply max_good_partitions_of_maximum_size(p, p) with p = [13, 11, 8]. But that is going to be a brute force search that won't run in reasonable time.
Finally, apply memoization (https://en.wikipedia.org/wiki/Memoization) by caching every call to this function, and it will run in polynomial time. However, you will also have to cache a polynomial number of calls to it.

Calculating slot machine payout

A slot machine has 5 reels and displays 3 symbols per reel (no spaces or "empty" symbols).
Payout can occur in a number of ways. Some examples...
A special diamond symbol appears
3 Lucky 7's appear
All five symbols in the payline are identical
All five symbols are the same number, but different color
Etc.
There are also multiple paylines that need to be checked for a payout.
What is the most efficient way to calculate winnings for each spin? Or, is there a more efficient way than brute force to apply each payout scenario to each payline?
Every payout besides the paylines seems trivial. For the three lucky 7s, just iterate over the visible squares and count the 7s. Same for checking for a diamond. If we let h be the number of rows and w be the number of columns, this operation is O(h*w), which for practically sized slot machines is pretty low.
The paylines, though, are more interesting. Theoretically the number of paylines (m from here on out) is much, much larger than h*w; before throwing out illegal paylines that jump, m = h^w, which is much larger than h*w. More importantly, they appear to share a lot of similarity. For example, line 2 and line 6 in your example both require matching the top-left and top middle-left squares. If these two don't match, then you can't win on either line 2 or line 6.
To represent paylines, I'm going to use length w arrays of integers in the range [1, h], such that payline[i] = the index in the column (1 indexed) of row i in the solution. For example, payline 1 is [1, 1, 1, 1, 1], and payline 17 is [3, 3, 2, 1, 2].
To this end, a suffix tree seems like an applicable data structure that can vastly improve your running time of checking all of the paylines against a given board state. Consider the following algorithm to construct a suffix tree that encapsulates all paylines.
Initialize:
    Create a root node at column 0 (off-screen, non-column part of all solutions)
    root node.length = 0
    root node.terminal = false
    Add all paylines (in the form of length-w arrays of integers ranging from 1 to h) to the root node's "toDistribute" set
    Create a toWork queue, add the root node to it
Iterate: while toWork not empty:
    let node n = toWork.pop()
    if n.length < w
        create children of n with length n.length + 1 and terminal = (n.length + 1 == w)
        for payline p in n.toDistribute
            remove p from n.toDistribute
            if (p.length > 1)
                add p.subArray(1, end) to the child of n as applicable
        add children of n to toWork
Running this construction algorithm on your example for lines 1-11 gives a tree that looks like this:
The computation of this tree is fairly intensive; it involves creating on the order of h + h^2 + ... + h^w nodes. The size of the tree depends only on the size of the board (height and width), not the number of paylines, which is the main advantage of this approach. Another advantage is that it's all pre-processing; you can have this tree built long before a player ever sits down to pull the lever.
Once the tree is built, you can give each node a field for each match criterion (same symbol, same color, etc.). Then, when evaluating a board state, you can dfs the tree, and at every new node, ask (for each criterion) if it matches its parent node. If so, mark that criterion as true and continue. Else, mark it as false and do not search the children for that criterion. For example, if you're looking specifically for identical tokens on the sub-array [1, 1, ...] and find that column 1's row 1 and column 2's row 1 don't match, then any payline that includes [1, 1, ...] (2, 6, 16, 20) can't be won, and you don't have to dfs that part of the tree.
It's hard to have a thorough algorithmic analysis of how much more efficient this dfs approach is than individually checking each payline, because such an analysis would require knowing how much left-side overlap (on average) there is between paylines. It's certainly no worse, and at least for your example is a good deal better. Moreover, the more paylines you add to a board, the greater the overlap, and the greater the time savings for checking all paylines by using this method.
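For illustration, here is a rough Python sketch of the prefix-tree idea for a single criterion (all symbols on the payline identical). The names, board layout and payline set are made up for the example and are not part of the pseudocode above:
def build_trie(paylines):
    root = {}
    for p in paylines:
        node = root
        for row in p:
            node = node.setdefault(row, {})
        node.setdefault('$', []).append(p)      # mark complete paylines
    return root

def winning_paylines(board, trie):
    # board[col][row] is the visible symbol; prune shared prefixes on mismatch
    wins = []

    def dfs(node, col, symbol):
        for row, child in node.items():
            if row == '$':
                wins.extend(child)              # a full payline matched
                continue
            s = board[col][row]
            if symbol is None or s == symbol:   # criterion: identical symbols
                dfs(child, col + 1, s)
            # else: prune every payline sharing this prefix

    dfs(trie, 0, None)
    return wins

paylines = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1), (0, 1, 2, 1, 0)]
board = [['7', 'A', 'K'], ['7', 'A', 'Q'], ['7', 'J', 'K'], ['7', 'A', 'Q'], ['7', 'A', 'K']]
print(winning_paylines(board, build_trie(paylines)))    # [(0, 0, 0, 0, 0)]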
In order to calculate RTP you should have the full slot machine information. The most important part is the reel strips. A Monte Carlo simulation is usually done in order to get the statistics needed. For example: https://raw.githubusercontent.com/VelbazhdSoftwareLLC/BugABoomSimulator/master/Main.cs
Paytable info:
private static int[][] paytable = {
new int[]{0,0,0,0,0,0,0,0,0,0,0,0,0},
new int[]{0,0,0,0,0,0,0,0,0,0,0,0,0},
new int[]{0,0,0,0,0,0,0,0,2,2,2,10,2},
new int[]{5,5,5,10,10,10,15,15,25,25,50,250,5},
new int[]{25,25,25,50,50,50,75,75,125,125,250,2500,0},
new int[]{125,125,125,250,250,250,500,500,750,750,1250,10000,0},
};
Betting lines:
private static int[][] lines = {
new int[]{1,1,1,1,1},
new int[]{0,0,0,0,0},
new int[]{2,2,2,2,2},
new int[]{0,1,2,1,0},
new int[]{2,1,0,1,2},
new int[]{0,0,1,2,2},
new int[]{2,2,1,0,0},
new int[]{1,0,1,2,1},
new int[]{1,2,1,0,1},
new int[]{1,0,0,1,0},
new int[]{1,2,2,1,2},
new int[]{0,1,0,0,1},
new int[]{2,1,2,2,1},
new int[]{0,2,0,2,0},
new int[]{2,0,2,0,2},
new int[]{1,0,2,0,1},
new int[]{1,2,0,2,1},
new int[]{0,1,1,1,0},
new int[]{2,1,1,1,2},
new int[]{0,2,2,2,0},
};
Reels strips:
private static int[][] baseReels = {
new int[]{0,4,11,1,3,2,5,9,0,4,2,7,8,0,5,2,6,10,0,5,1,3,9,4,2,7,8,0,5,2,6,9,0,5,2,4,10,0,5,1,7,9,2,5},
new int[]{4,1,11,2,7,0,9,5,1,3,8,4,2,6,12,4,0,3,1,8,4,2,6,0,10,4,1,3,2,12,4,0,7,1,8,2,4,0,9,1,6,2,8,0},
new int[]{1,7,11,5,1,7,8,6,0,3,12,4,1,6,9,5,2,7,10,1,3,2,8,1,3,0,9,5,1,3,10,6,0,3,8,7,1,6,12,3,2,5,9,3},
new int[]{5,2,11,3,0,6,1,5,12,2,4,0,10,3,1,7,3,2,11,5,4,6,0,5,12,1,3,7,2,4,8,0,3,6,1,4,12,2,5,7,0,4,9,1},
new int[]{7,0,11,4,6,1,9,5,10,2,7,3,8,0,4,9,1,6,5,10,2,8,3},
};
private static int[][] freeReels = {
new int[]{2,4,11,0,3,7,1,4,8,2,5,6,0,5,9,1,3,7,2,4,10,0,3,1,8,4,2,5,6,0,4,1,10,5,2,3,7,0,5,9,1,3,6},
new int[]{4,2,11,0,5,2,12,1,7,0,9,2,3,0,12,2,4,0,5,8,2,6,0,12,2,7,1,3,10,6,0},
new int[]{1,4,11,2,7,8,1,5,12,0,3,9,1,7,8,1,5,12,2,6,10,1,4,9,3,1,8,0,12,6,9},
new int[]{6,4,11,2,7,3,9,1,6,5,12,0,4,10,2,3,8,1,7,5,12,0},
new int[]{3,4,11,0,6,5,3,8,1,7,4,9,2,5,10,0,3,8,1,4,10,2,5,9},
};
The spin function, which should be called many times in order to calculate RTP:
private static void spin(int[][] reels) {
    for (int i = 0, r, u, d; i < view.Length && i < reels.Length; i++) {
        if (bruteForce == true) {
            u = reelsStops [i];
            r = u + 1;
            d = u + 2;
        } else {
            u = prng.Next (reels [i].Length);
            r = u + 1;
            d = u + 2;
        }
        r = r % reels[i].Length;
        d = d % reels[i].Length;
        view[i][0] = reels[i][u];
        view[i][1] = reels[i][r];
        view[i][2] = reels[i][d];
    }
}
After each spin all wins should be calculated.

For every vertex in a graph, find all vertices within a distance d

In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A breadth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
    if d == 0:
        st.add(start)
    else:
        st.add(start)
        for edge in graph[start]:
            find_d(graph, edge, st, d-1)
    return st

graph = { 1 : [2, 3],
          2 : [1, 4, 5, 6],
          3 : [1, 4],
          4 : [2, 3, 5],
          5 : [2, 4, 6],
          6 : [2, 5]
        }
Let's say that we have a function verticesWithin(d,x) that finds all vertices within distance d of vertex x.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that verticesWithin(d,x), if d >= 1, is the union of verticesWithin(d-1,y[i]) for all i within range, where y=verticesWithin(1,x). If d == 0 then it's simply {x}. (I'm assuming that a vertex is deemed to be of distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x itself as a member of y.
Also, if the return type of verticesWithin(d,x) is changed from a simple list or set, to a list of d sets representing increasing distance from x, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
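For illustration, a rough Python 3 sketch of the cached relation, using the adjacency-list graph from the question (vertices_within and the use of lru_cache are illustrative choices, not something prescribed above):
from functools import lru_cache

graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4],
         4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}

@lru_cache(maxsize=None)
def vertices_within(d, x):
    if d == 0:
        return frozenset([x])
    if d == 1:
        return frozenset(graph[x]) | {x}     # read the adjacency list directly
    out = set()
    for y in vertices_within(1, x):          # union of the d-1 neighbourhoods
        out |= vertices_within(d - 1, y)
    return frozenset(out)

print(sorted(vertices_within(2, 1)))   # [1, 2, 3, 4, 5, 6]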
You already mention the option of calculating A^d, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v of zeros and ones, representing a set of vertices. The vector w := A v now has a one at every node that can be reached from the starting node in exactly one step. Iterating, u := A w has a one for every node you can reach from the starting node in exactly two steps, etc.
For d=3, you could do the following (MATLAB pseudo-code):
v = j'th unit vector
w = v
for i = (1:d)
    v = A*v
    w = w + v
end
The vector w now has a positive entry for each node that can be reached from the j'th node in at most d steps.
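For illustration, the same iteration in a rough Python sketch with a sparse adjacency matrix (this assumes SciPy is available; the small example graph is made up):
import numpy as np
from scipy.sparse import csr_matrix

# hypothetical example graph as an undirected edge list
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
n = 5
rows = [u for u, v in edges] + [v for u, v in edges]
cols = [v for u, v in edges] + [u for u, v in edges]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

def within(A, j, d):
    v = np.zeros(A.shape[0])
    v[j] = 1.0                      # j'th unit vector
    w = v.copy()
    for _ in range(d):              # w accumulates A*v, A^2*v, ..., A^d*v
        v = A.dot(v)
        w = w + v
    return np.flatnonzero(w > 0)    # nodes reachable in at most d steps

print(within(A, 0, 2))              # [0 1 2 3]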
Breadth-first search starting from the given vertex is an optimal solution in this case. You will find all the vertices that are within distance d, and you will never even visit any vertices at distance >= d + 2.
Here is recursive code, although the recursion can easily be done away with, if so desired, by using a queue.
// Returns the set of nodes within distance d of x (x itself included)
Set<Node> getNodesWithinDist(Node x, int d)
{
    Set<Node> s = new HashSet<Node>(); // our return value
    s.add(x);
    if (d > 0) {
        for (Node y : adjList(x)) {
            s.addAll(getNodesWithinDist(y, d - 1));
        }
    }
    return s;
}

Resources