Testing for Adjacent Cells In a Multi-level Grid - algorithm

I'm designing an algorithm to test whether cells on a grid are adjacent or not.
The catch is that the cells are not on a flat grid. They are on a multi-level grid such as the one drawn below.
Level 1 (Top Level)
| - - - - - |
| A | B | C |
| - - - - - |
| D | E | F |
| - - - - - |
| G | H | I |
| - - - - - |
Level 2
| -Block A- | -Block B- |
| 1 | 2 | 3 | 1 | 2 | 3 |
| - - - - - | - - - - - |
| 4 | 5 | 6 | 4 | 5 | 6 | ...
| - - - - - | - - - - - |
| 7 | 8 | 9 | 7 | 8 | 9 |
| - - - - - | - - - - - |
| -Block D- | -Block E- |
| 1 | 2 | 3 | 1 | 2 | 3 |
| - - - - - | - - - - - |
| 4 | 5 | 6 | 4 | 5 | 6 | ...
| - - - - - | - - - - - |
| 7 | 8 | 9 | 7 | 8 | 9 |
| - - - - - | - - - - - |
. .
. .
. .
This diagram is simplified from my actual need but the concept is the same. There is a top level block with many cells within it (level 1). Each block is further subdivided into many more cells (level 2). Those cells are further subdivided into level 3, 4 and 5 for my project but let's just stick to two levels for this question.
I'm receiving inputs for my function in the form of "A8, A9, B7, D3". That's a list of cell Ids where each cell Id has the format (level 1 id)(level 2 id).
Let's start by comparing just 2 cells, A8 and A9. That's easy because they are in the same block.
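// v1, v2: the level-2 ids as ints (e.g. 8 for A8), 1-based and numbered row by row inside a block.
// Subtracting Strings does not compile, and comparing rows/columns stops 3 and 4 counting as neighbours.
private static RelativePosition getRelativePositionInTheSameBlock(int v1, int v2) {
    int row1 = (v1 - 1) / BLOCK_WIDTH, col1 = (v1 - 1) % BLOCK_WIDTH;
    int row2 = (v2 - 1) / BLOCK_WIDTH, col2 = (v2 - 1) % BLOCK_WIDTH;
    RelativePosition relativePosition;
    if (row1 == row2 && col1 - col2 == -1) {
        relativePosition = RelativePosition.LEFT_OF;
    }
    else if (row1 == row2 && col1 - col2 == 1) {
        relativePosition = RelativePosition.RIGHT_OF;
    }
    else if (col1 == col2 && row1 - row2 == -1) {
        relativePosition = RelativePosition.TOP_OF;
    }
    else if (col1 == col2 && row1 - row2 == 1) {
        relativePosition = RelativePosition.BOTTOM_OF;
    }
    else {
        relativePosition = RelativePosition.NOT_ADJACENT;
    }
    return relativePosition;
}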
An A9 - B7 comparison could be done by checking whether the first level-2 id (9) is a multiple of BLOCK_WIDTH and whether the second (7) equals 9 - BLOCK_WIDTH + 1.
Alternatively, for better readability, just check naively whether the pair is 3-1, 6-4 or 9-7.
B7 and D3 are not adjacent, but D3 is adjacent to A9, so I can do a similar adjacency test as above.
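The two cross-block tests just described (A9 next to B7, A9 above D3) can be sketched in Java as below. This is a minimal sketch that assumes 1-based level-2 ids numbered row by row inside a block. `BlockEdges`, `BLOCK_HEIGHT` and the method names are hypothetical. The caller is assumed to already know that the two level-1 blocks sit side by side (A and B) or one above the other (A and D).

```java
class BlockEdges {
    static final int BLOCK_WIDTH = 3;   // cells per row inside a block (example value)
    static final int BLOCK_HEIGHT = 3;  // rows per block (example value, hypothetical name)

    static int row(int id) { return (id - 1) / BLOCK_WIDTH; } // 0-based row of a 1-based id
    static int col(int id) { return (id - 1) % BLOCK_WIDTH; } // 0-based column of a 1-based id

    // `left` is in a block, `right` is in the block immediately to its right (e.g. A9 and B7):
    // they touch when `left` is in the last column, `right` in the first column, same row.
    static boolean touchesAcrossVerticalEdge(int left, int right) {
        return col(left) == BLOCK_WIDTH - 1 && col(right) == 0 && row(left) == row(right);
    }

    // `top` is in a block, `bottom` is in the block directly below it (e.g. A9 and D3):
    // they touch when `top` is in the bottom row, `bottom` in the top row, same column.
    static boolean touchesAcrossHorizontalEdge(int top, int bottom) {
        return row(top) == BLOCK_HEIGHT - 1 && row(bottom) == 0 && col(top) == col(bottom);
    }
}
```

This is the "3-1, 6-4, 9-7" rule written in terms of rows and columns, so it keeps working if BLOCK_WIDTH changes.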
So, stepping back from the details and looking at the big picture: is this really the best way to do it? Keep in mind the following points:
I actually have 5 levels, not 2, so I could potentially get a list like "A8A1A, A8A1B, B1A2A, B1A2B".
Adding a new cell to compare still requires me to compare it against all the cells before it (the best I can do for this step seems to be O(n)).
The cells aren't all 3x3 blocks; they're just that way in my example. They could be MxN blocks, with different M and N at different levels.
In my current implementation above, I have separate functions to check adjacency depending on whether the cells are in the same block, in separate horizontally adjacent blocks, or in separate vertically adjacent blocks. That means I have to know the position of the two blocks at the current level before I call one of those functions for the level below.
Judging by the complexity of dealing with multiple functions for different edge cases at different levels, plus 5 levels of nested if statements, I'm wondering whether another design is more suitable: perhaps a more recursive solution, other data structures, or mapping the entire multi-level grid to a single-level grid (my quick calculations give me about 700,000+ atomic cell ids). Even if I go that route, mapping from multi-level to single-level is a non-trivial task in itself.

I actually have 5 levels not 2, so I could potentially get a list like "A8A1A, A8A1B, B1A2A, B1A2B".
Adding a new cell to compare still requires me to compare all the other cells before it (seems like the best I could do for this step is O(n))
The cells aren't all 3x3 blocks, they're just that way for my example. They could be MxN blocks with different M and N for different levels.
I don't see a problem with these points. If two cells are not adjacent at the highest level, we can stop the computation right there without computing adjacency at the lower levels. With only five levels, you'll do at most five adjacency computations, which should be fine.
In my current implementation above, I have separate functions to check adjacency if the cells are in the same blocks, if they are in separate horizontally adjacent blocks or if they are in separate vertically adjacent blocks. That means I have to know the position of the two blocks at the current level before I call one of those functions for the layer below.
You should try to rewrite this. There should only be two methods: one that computes whether two cells are adjacent and one that computes whether two cells are adjacent at a given level:
RelativePosition isAdjacent(String cell1, String cell2);
RelativePosition isAdjacentAtLevel(String cell1, String cell2, int level);
The method isAdjacent calls the method isAdjacentAtLevel for each of the levels. I'm not sure whether cell1 and cell2 always contain information for all the levels, but isAdjacent could analyze the given cell strings and call the appropriate level adjacency checks accordingly. When two cells are not adjacent at a particular level, the deeper levels don't need to be checked.
The method isAdjacentAtLevel should do: lookup M and N for the given level, extract the information from cell1 and cell2 of the given level and perform the adjacency computation. The computation should be the same for each level as each level, on its own, has the same block structure.
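Here is one way to fill in that outline, as a minimal sketch. It assumes each cell has already been parsed into one 0-based, row-major index per level, so parsing strings like "A8A1A" is left out. `MultiLevelGrid`, `onSharedEdge` and the array layout are hypothetical. The sibling test in isAdjacentAtLevel only applies at the first level where the two cells differ. Below that level, this sketch instead checks that both cells lie on the shared edge, facing each other.

```java
class MultiLevelGrid {
    enum RelativePosition { LEFT_OF, RIGHT_OF, TOP_OF, BOTTOM_OF, NOT_ADJACENT }

    private final int[] widths;   // M: block width (columns) at each level, top level first
    private final int[] heights;  // N: block height (rows) at each level

    MultiLevelGrid(int[] widths, int[] heights) {
        this.widths = widths;
        this.heights = heights;
    }

    // Sibling test at one level: a and b are 0-based, row-major indices inside the same parent block.
    RelativePosition isAdjacentAtLevel(int a, int b, int level) {
        int w = widths[level];
        int ra = a / w, ca = a % w, rb = b / w, cb = b % w;
        if (ra == rb && ca + 1 == cb) return RelativePosition.LEFT_OF;
        if (ra == rb && ca == cb + 1) return RelativePosition.RIGHT_OF;
        if (ca == cb && ra + 1 == rb) return RelativePosition.TOP_OF;
        if (ca == cb && ra == rb + 1) return RelativePosition.BOTTOM_OF;
        return RelativePosition.NOT_ADJACENT;
    }

    // Deeper-level test: the two cells must face each other across the edge their ancestors share.
    private boolean onSharedEdge(int a, int b, int level, RelativePosition pos) {
        int w = widths[level], h = heights[level];
        int ra = a / w, ca = a % w, rb = b / w, cb = b % w;
        switch (pos) {
            case LEFT_OF:   return ca == w - 1 && cb == 0 && ra == rb;
            case RIGHT_OF:  return ca == 0 && cb == w - 1 && ra == rb;
            case TOP_OF:    return ra == h - 1 && rb == 0 && ca == cb;
            case BOTTOM_OF: return ra == 0 && rb == h - 1 && ca == cb;
            default:        return false;
        }
    }

    // cell1, cell2: one index per level, top level first, same length for both cells.
    RelativePosition isAdjacent(int[] cell1, int[] cell2) {
        int k = 0;
        while (k < cell1.length && cell1[k] == cell2[k]) k++;        // skip shared ancestor blocks
        if (k == cell1.length) return RelativePosition.NOT_ADJACENT; // same cell
        RelativePosition pos = isAdjacentAtLevel(cell1[k], cell2[k], k);
        if (pos == RelativePosition.NOT_ADJACENT) return pos;        // no need to look deeper
        for (int j = k + 1; j < cell1.length; j++) {
            if (!onSharedEdge(cell1[j], cell2[j], j, pos)) return RelativePosition.NOT_ADJACENT;
        }
        return pos;
    }
}
```

For the 3x3/3x3 example, A9 becomes {0, 8} and B7 becomes {1, 6}, and isAdjacent reports LEFT_OF. The same code handles five levels and different M and N per level without extra cases.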

Calculate and compare the absolute x and y coordinates at the lowest level.
For the example (assuming int index0 = 0 for A, 1 for B, ... and index1 = 0...8):
int x = (index0 % 3) * 3 + index1 % 3;
int y = (index0 / 3) * 3 + index1 / 3;
In general, given
int[] WIDTHS;  // cell width at level i
int[] HEIGHTS; // cell height at level i
// indices: cell index at each level, normalized to 0..WIDTHS[i]*HEIGHTS[i]-1
int getX(int[] indices) {
    int x = 0;
    for (int i = 0; i < indices.length; i++) {
        x = x * WIDTHS[i] + indices[i] % WIDTHS[i];
    }
    return x;
}
int getY(int[] indices) {
    int y = 0;
    for (int i = 0; i < indices.length; i++) {
        y = y * HEIGHTS[i] + indices[i] / WIDTHS[i];
    }
    return y;
}

You can use a space-filling curve, for example a Peano curve or a Z-order (Morton) curve.
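The following is a minimal sketch of the Z-order (Morton) variant. It interleaves the bits of x and y into one key and decodes it again, assuming each coordinate fits in 16 bits. Adjacency is then checked on the decoded coordinates. The class name is hypothetical. Morton keys follow power-of-two (2x2) subdivisions, so for 3x3 blocks like those in the question's example, a Peano curve (which subdivides 3x3) matches the hierarchy more naturally.

```java
class Morton {
    // Spread the low 16 bits of v so that a zero bit sits between consecutive original bits.
    static int spread(int v) {
        v &= 0xFFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Inverse of spread: gather every other bit back into the low 16 bits.
    static int compact(int v) {
        v &= 0x55555555;
        v = (v | (v >>> 1)) & 0x33333333;
        v = (v | (v >>> 2)) & 0x0F0F0F0F;
        v = (v | (v >>> 4)) & 0x00FF00FF;
        v = (v | (v >>> 8)) & 0x0000FFFF;
        return v;
    }

    static int encode(int x, int y) { return spread(x) | (spread(y) << 1); } // x in even bits, y in odd bits
    static int decodeX(int code) { return compact(code); }
    static int decodeY(int code) { return compact(code >>> 1); }

    // Two keys are edge-adjacent when their decoded coordinates differ by one step.
    static boolean adjacent(int a, int b) {
        int dx = Math.abs(decodeX(a) - decodeX(b)), dy = Math.abs(decodeY(a) - decodeY(b));
        return dx + dy == 1;
    }
}
```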


Logic behind including / excluding current element in recursive approach

I'm studying DP these days, and I've run into examples such as subset sum or, as shown in this question, the coin change problem, whose solutions make recursive calls both including and excluding the current element. However, I genuinely have difficulty understanding the real reason behind this approach. I can't grasp the underlying logic. I don't want to just memorize it or say "hmm, okay, keep it in mind, there is such an approach".
class Util
{
    // Function to find the total number of distinct ways to get
    // change of N from an unlimited supply of coins in set S
    public static int count(int[] S, int n, int N)
    {
        // if the total is 0, return 1 (solution found)
        if (N == 0) {
            return 1;
        }
        // return 0 (no solution exists) if the total becomes negative or
        // no coin types are left
        if (N < 0 || n < 0) {
            return 0;
        }
        // Case 1. include current coin S[n] in the solution and recurse
        // with remaining change (N - S[n]) with the same set of coins
        int incl = count(S, n, N - S[n]);
        // Case 2. exclude current coin S[n] from the solution and recurse
        // for the remaining coins (n - 1)
        int excl = count(S, n - 1, N);
        // return total ways by including or excluding the current coin
        return incl + excl;
    }
    // Coin Change Problem
    public static void main(String[] args)
    {
        // n coins of given denominations
        int[] S = { 1, 2, 3 };
        // Total change required
        int N = 4;
        System.out.print("Total number of ways to get desired change is "
                + count(S, S.length - 1, N));
    }
}
I don't want to skim over these parts superficially, since recurrence formulas play a leading role in dynamic programming.
At each recursion you want to explore both cases:
one more coin of type n is used
you are done with coin type n and proceed to the next coin type
The remaining task is handled in both cases by a recursive call.
By the way, this solution has nothing to do with dynamic programming.
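To connect this to dynamic programming: the same include/exclude recurrence becomes top-down DP once its results are cached by (coin index, remaining total). The following is a minimal Java sketch. The memo array and the long return type are additions of this sketch and are not part of the original code.

```java
import java.util.Arrays;

class CoinChangeMemo {
    // memo[n][total] caches count(S, n, total); -1 means "not computed yet"
    static long count(int[] S, int n, int total, long[][] memo) {
        if (total == 0) return 1;                    // exact change reached: one way
        if (total < 0 || n < 0) return 0;            // overshot, or no coin types left
        if (memo[n][total] != -1) return memo[n][total];
        long incl = count(S, n, total - S[n], memo); // use one more coin of type n
        long excl = count(S, n - 1, total, memo);    // never use coin type n again
        return memo[n][total] = incl + excl;
    }

    static long count(int[] S, int N) {
        long[][] memo = new long[S.length][N + 1];
        for (long[] row : memo) Arrays.fill(row, -1);
        return count(S, S.length - 1, N, memo);
    }
}
```

The two recursive calls are the two cases listed above. Because each (n, total) pair is computed only once, the running time drops to O(S.length · N).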
In the common powerset problem, given (1 2 3) we are asked to generate ((1 2 3) (1 2) (1 3) (1) (2 3) (2) (3) ()). We can use this with-and-without technique to generate the result.
+---+ +---------------------------+ +--------------------------------------------+
| +-with----> ((1 2 3) (1 2) (1 3) (1)) | | |
| 1 | | +-----> ((1 2 3) (1 2) (1 3) (1) (2 3) (2) (3) ()) |
| +-without-> ((2 3) (2) (3) ()) | | |
+-^-+ +---------------------------+ +--------------------------------------------+
|
+-------------------------------------------+
|
+---+ +-------------+ +-----------+--------+
| +-with------> ((2 3) (2)) | | |
| 2 | | +---> ((2 3) (2) (3) ()) |
| +-without---> ((3) ()) | | |
+-^-+ +-------------+ +--------------------+
|
+--------------------------------+
|
+---+ +-----+ +------+--------+
| +-with------> (3) | | |
| 3 | | +-----> ((3) ()) |
| +-without---> () | | |
+-^-+ +-----+ +---------------+
|
|
+-+-+
|() |
| | <- base case
+---+
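Here is the same with/without split in Java, as a sketch. The class name and the list representation are illustrative only. It produces the subsets in the same order as the diagram.

```java
import java.util.ArrayList;
import java.util.List;

class PowerSet {
    // All subsets of items[i..]: each subset either contains items[i] ("with") or not ("without").
    static List<List<Integer>> subsets(int[] items, int i) {
        List<List<Integer>> result = new ArrayList<>();
        if (i == items.length) {          // base case: only the empty set
            result.add(new ArrayList<>());
            return result;
        }
        List<List<Integer>> rest = subsets(items, i + 1);
        for (List<Integer> s : rest) {    // "with": prepend items[i] to every subset of the rest
            List<Integer> with = new ArrayList<>();
            with.add(items[i]);
            with.addAll(s);
            result.add(with);
        }
        result.addAll(rest);              // "without": the subsets of the rest, unchanged
        return result;
    }
}
```

`subsets(new int[] {1, 2, 3}, 0)` yields [[1, 2, 3], [1, 2], [1, 3], [1], [2, 3], [2], [3], []].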

Solving a constrained system of linear equations

I have a system of equations of the form y = Ax + b, where y, x and b are n×1 vectors and A is an n×n (symmetric) matrix.
So here is the wrinkle. Not all of x is unknown. Certain rows of x are specified and the corresponding rows of y are unknown. Below is an example
| 10 | | 5 -2 1 | | * | | -1 |
| * | = | -2 2 0 | | 1 | + | 1 |
| 1 | | 1 0 1 | | * | | 2 |
where * designates unknown quantities.
I have built a solver for problems like the one above in Fortran, but I'd like to know whether there is a decent, robust solver out there for this type of problem as part of LAPACK or MKL.
My solver is based on a permutation vector called pivot = [1,3,2], which rearranges the x and y vectors into known and unknown parts:
| 10 |   |  5  1 -2 | | * |   | -1 |
|  1 | = |  1  1  0 | | * | + |  2 |
|  * |   | -2  0  2 | | 1 |   |  1 |
It then solves the system using a block-matrix formulation and LU decomposition:
! solves an n×n system of equations where k values are known from the 'x' vector
function solve_linear_system(A,b,x_known,y_known,pivot,n,k) result(x)
    use, intrinsic :: iso_c_binding, only: c_int, c_double
    use lu
    integer(c_int),intent(in) :: n, k, pivot(n)
    real(c_double),intent(in) :: A(n,n), b(n), x_known(k), y_known(n-k)
    real(c_double) :: x(n), y(n), r(n-k), A1(n-k,n-k), A3(n-k,k), b1(n-k)
    integer(c_int) :: i, j, u, code, d, indx(n-k)
    u = n-k
    ! store known `x` and `y` values
    x(pivot(u+1:n)) = x_known
    y(pivot(1:u)) = y_known
    ! define block matrices
    ! | y_known | = | A1   A3 | |    *    | + | b1 |
    ! |    *    |   | A3'  A2 | | x_known |   | b2 |
    A1 = A(pivot(1:u), pivot(1:u))
    A3 = A(pivot(1:u), pivot(u+1:n))
    b1 = b(pivot(1:u))
    ! define new rhs vector
    r = y_known - matmul(A3, x_known) - b1
    ! solve `A1*x=r` for 'x' with the LU decomposition from the NR book
    call ludcmp(A1,u,indx,d,code)
    call lubksb(A1,u,indx,r)
    ! store unknown 'x' values (stored into 'r' by 'lubksb')
    x(pivot(1:u)) = r
end function
For the example above the solution is
| 10.0 | | 3.5 |
y = | -4.0 | x = | 1.0 |
| 1.0 | | -4.5 |
PS: The linear systems typically have n <= 20 equations.
With only unknowns (nothing fixed), the problem is a linear least-squares problem.
Your a-priori knowledge can be introduced through equality constraints (fixing some variables), which transforms it into a linear equality-constrained least-squares problem.
There is indeed an algorithm in LAPACK that solves the latter, called xGGLSE.
Here is an overview.
(It also seems you need to multiply b by -1 in your case to be compatible with that definition.)
Edit: On further inspection, I missed the unknowns in y. Ouch. This is bad.
First, I would rewrite your system in the form AX = b, where A and b are known. In your example, provided I didn't make any mistakes, it gives the following (the second unknown is now y2, since x2 = 1 is known):
        | 5 0 1 |       | x1 |       | 13 |
    A = | 2 1 0 |,  X = | y2 |,  b = |  3 |
        | 1 0 1 |       | x3 |       | -1 |
Then you can use plenty of methods from various libraries, such as LAPACK or BLAS, depending on the properties of your matrix A (positive-definite, ...). As a starting point, I would suggest a simple method with a direct inversion of A, especially if your matrix is small. There are also many iterative approaches (Jacobi, gradient methods, Gauss-Seidel, ...) that you can use for bigger cases.
Edit: An idea to solve it in 2 steps.
First step: You can rewrite your system as 2 subsystems, one with X as the unknowns and one with Y as the unknowns, whose dimensions equal the number of unknowns in each vector.
The first subsystem, in X, has the form AX = b and can be solved by direct or iterative methods.
Second step: The second subsystem, in Y, can be solved directly once you know X, because Y is expressed in the form Y = A'X + b'.
I think this approach is more general.
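The following is a minimal Java sketch of this two-step idea, which is also the block split that the question's Fortran code performs. It assumes that, in every row, exactly one of x[i] or y[i] is known and that the reduced matrix is nonsingular. `PartiallyKnownSystem` and its parameters are hypothetical. Instead of a library routine, it uses plain Gaussian elimination with partial pivoting, which should be adequate for n <= 20.

```java
import java.util.stream.IntStream;

class PartiallyKnownSystem {
    // Solves y = A x + b when, in every row i, exactly one of x[i] and y[i] is known.
    // xKnown[i] == true: x[i] is given and y[i] is unknown; otherwise y[i] is given.
    // On return, the unknown entries of x and y are filled in.
    static void solve(double[][] A, double[] b, double[] x, double[] y, boolean[] xKnown) {
        int n = b.length;
        int[] unk = IntStream.range(0, n).filter(i -> !xKnown[i]).toArray(); // indices of unknown x
        int u = unk.length;
        // Step 1: rows with known y give A1 * x_unknown = y_known - b1 - A3 * x_known.
        double[][] M = new double[u][u + 1]; // augmented matrix [A1 | rhs]
        for (int r = 0; r < u; r++) {
            int i = unk[r];
            double rhs = y[i] - b[i];
            for (int j = 0; j < n; j++) if (xKnown[j]) rhs -= A[i][j] * x[j];
            for (int c = 0; c < u; c++) M[r][c] = A[i][unk[c]];
            M[r][u] = rhs;
        }
        double[] sol = gauss(M);
        for (int c = 0; c < u; c++) x[unk[c]] = sol[c];
        // Step 2: x is now complete, so each unknown y follows directly from y = A x + b.
        for (int i = 0; i < n; i++) {
            if (!xKnown[i]) continue; // y[i] was given
            double s = b[i];
            for (int j = 0; j < n; j++) s += A[i][j] * x[j];
            y[i] = s;
        }
    }

    // Gaussian elimination with partial pivoting on a u x (u+1) augmented matrix (no singularity check).
    static double[] gauss(double[][] M) {
        int u = M.length;
        for (int p = 0; p < u; p++) {
            int best = p;
            for (int r = p + 1; r < u; r++) if (Math.abs(M[r][p]) > Math.abs(M[best][p])) best = r;
            double[] tmp = M[p]; M[p] = M[best]; M[best] = tmp;
            for (int r = p + 1; r < u; r++) {
                double f = M[r][p] / M[p][p];
                for (int c = p; c <= u; c++) M[r][c] -= f * M[p][c];
            }
        }
        double[] sol = new double[u];
        for (int r = u - 1; r >= 0; r--) {
            double s = M[r][u];
            for (int c = r + 1; c < u; c++) s -= M[r][c] * sol[c];
            sol[r] = s / M[r][r];
        }
        return sol;
    }
}
```

With the question's example (x2 = 1, y1 = 10, y3 = 1), it yields x = (3.5, 1, -4.5) and y2 = -4, which matches the solution given in the question.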

Intersection ranges (algorithm)

For example, I have the following arrays:
[100,192]
[235,280]
[129,267]
Their intersections are:
[129,192]
[235,267]
It's a simple exercise for a person, but I'm having trouble creating an algorithm that finds the second (multidimensional) array.
Any language, any ideas are welcome.
I'll assume you wish to output any range that has 2 or more overlapping intervals.
So the output for [1,5], [2,4], [3,3] will be (only) [2,4].
The basic idea here is to use a sweep-line algorithm.
Split the ranges into start- and end-points.
Sort the points.
Now iterate through the points with a counter variable initialized to 0.
If you get a start-point:
Increase the counter.
If the counter's value is now 2, record that point as the start-point for a range in the output.
If you get an end-point
Decrease the counter.
If the counter's value is 1, record that point as the end-point for a range in the output.
Note:
If a start-point and an end-point have the same value, you'll need to process the end-point first if the counter is 1 and the start-point first if the counter is 2 or greater, otherwise you'll end up with a 0-size range or a 0-size gap between two ranges in the output.
This should be fairly simple to do by having a set of the following structure:
Element
int startCount
int endCount
int value
Then you combine all points with the same value into one such element, setting the counts appropriately.
Running time:
O(n log n)
Example:
Input:
[100, 192]
[235, 280]
[129, 267]
(S for start, E for end)
Points | | 100 | 129 | 192 | 235 | 267 | 280 |
Type | | Start | Start | End | Start | End | End |
Count | 0 | 1 | 2 | 1 | 2 | 1 | 0 |
Output | | | [129, | 192] | [235, | 267] | |
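Here is the sweep described above, sketched in Java under the answer's tie-breaking rule. Points with the same value are grouped into start/end counts, with a `TreeMap` playing the role of the sorted set of `Element`s, and ranges are treated as closed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OverlapSweep {
    // Returns the ranges covered by at least two input ranges (closed intervals [start, end]).
    static List<int[]> overlaps(int[][] ranges) {
        // value -> {startCount, endCount}, iterated in ascending order of value
        TreeMap<Integer, int[]> points = new TreeMap<>();
        for (int[] r : ranges) {
            points.computeIfAbsent(r[0], k -> new int[2])[0]++;
            points.computeIfAbsent(r[1], k -> new int[2])[1]++;
        }
        List<int[]> out = new ArrayList<>();
        int counter = 0, openStart = 0;
        for (Map.Entry<Integer, int[]> e : points.entrySet()) {
            int value = e.getKey(), starts = e.getValue()[0], ends = e.getValue()[1];
            if (counter == 1) {
                // Counter is 1: process end-points first, so no 0-size output range appears.
                counter -= ends;
                counter += starts;
                if (counter >= 2) openStart = value;
            } else {
                // Counter is 0 or >= 2: process start-points first, so no 0-size gap appears.
                if (counter < 2 && counter + starts >= 2) openStart = value;
                counter += starts;
                if (counter >= 2 && counter - ends < 2) out.add(new int[] {openStart, value});
                counter -= ends;
            }
        }
        return out;
    }
}
```

For the question's input, this returns [129, 192] and [235, 267]. For [1,5], [2,4], [3,3], it returns only [2, 4]. Sorting dominates, so the running time stays O(n log n).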
This is a Python implementation of the intersection algorithm. Its computational complexity is O(n^2).
a = [[100, 192], [235, 280], [129, 267]]

def get_intersections(diapasons):
    intersections = []
    for i, d in enumerate(diapasons):
        for j, check in enumerate(diapasons):
            if i == j:  # don't intersect a range with itself
                continue
            # d starts inside check: the overlap is [d start, min(d end, check end)]
            if check[0] <= d[0] <= check[1]:
                right = min(d[1], check[1])
                intersections.append([d[0], right])
    return intersections

print(get_intersections(a))

Counting the ways to build a wall with two tile sizes [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
You are given a set of blocks to build a panel using 3”×1” and 4.5”×1" blocks.
For structural integrity, the spaces between the blocks must not line up in adjacent rows.
There are 2 ways in which to build a 7.5”×1” panel, 2 ways to build a 7.5”×2” panel, 4 ways to build a 12”×3” panel, and 7958 ways to build a 27”×5” panel. How many different ways are there to build a 48”×10” panel?
This is what I understand so far:
with the blocks 3 x 1 and 4.5 x 1
I've used combination formula to find all possible combinations that the 2 blocks can be arranged in a panel of this size
C = choose --> C(n, r) = n! / (r! (n - r)!), the number of combinations of a group of n taken r at a time
Panel: 7.5 x 1 = 2 ways -->
1 (3 x 1 block) and 1 (4.5 x 1 block) --> Only 2 blocks are used--> 2 C 1 = 2 ways
Panel: 7.5 x 2 = 2 ways
I used combination here as well
1(3 x 1 block) and 1 (4.5 x 1 block) --> 2 C 1 = 2 ways
Panel: 12 x 3 panel = 2 ways -->
2(4.5 x 1 block) and 1(3 x 1 block) --> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block) --> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
(This is where I get confused)
Panel 27 x 5 panel = 7958 ways
6(4.5 x 1 block) and 0(3 x 1) --> 6 C 0 = 1 way
4(4.5 x 1 block) and 3(3 x 1 block) --> 7 C 3 = 35 ways
2(4.5 x 1 block) and 6(3 x 1 block) --> 8 C 2 = 28 ways
0(4.5 x 1 block) and 9(3 x 1 block) --> 9 C 0 = 1 way
1 way + 35 ways + 28 ways + 1 way = 65 ways
As you can see here the number of ways is nowhere near 7958. What am I doing wrong here?
Also, how would I find how many ways there are to construct a 48 x 10 panel?
It's a little difficult to do by hand, especially when trying to find all 7958 ways for the 27 x 5 panel.
How would I write a program to calculate the number of ways, e.g. the 7958 ways for the 27 x 5 panel?
Would it be easier to construct a program to calculate the result? Any help would be greatly appreciated.
I don't think the "choose" function is directly applicable, given your "the spaces between the blocks must not line up in adjacent rows" requirement. I also think this is where your analysis starts breaking down:
Panel: 12 x 3 panel = 2 ways -->
2(4.5 x 1 block) and 1(3 x 1 block)
--> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block)
--> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
...let's build some panels (each line of text is one 1" row; roughly 2 characters = 1" of width):
+-----------------------+
|     |     |     |     |
|     |     |     |     |
|     |     |     |     |
+-----------------------+
+-----------------------+
|        |        |     |
|        |        |     |
|        |        |     |
+-----------------------+
+-----------------------+
|        |     |        |
|        |     |        |
|        |     |        |
+-----------------------+
+-----------------------+
|     |        |        |
|     |        |        |
|     |        |        |
+-----------------------+
Here we see that there are 4 different basic row types, but none of these are valid panels (they all violate the "blocks must not line up" rule). But we can use these row types to create several panels:
+-----------------------+
|     |     |     |     |
|     |     |     |     |
|        |        |     |
+-----------------------+
+-----------------------+
|     |     |     |     |
|     |     |     |     |
|        |     |        |
+-----------------------+
+-----------------------+
|     |     |     |     |
|     |     |     |     |
|     |        |        |
+-----------------------+
+-----------------------+
|        |        |     |
|        |        |     |
|     |     |     |     |
+-----------------------+
+-----------------------+
|        |        |     |
|        |     |        |
|     |        |        |
+-----------------------+
+-----------------------+
|        |     |        |
|     |        |        |
|        |        |     |
+-----------------------+
...
But again, none of these are valid. The valid 12x3 panels are:
+-----------------------+
|     |     |     |     |
|        |     |        |
|     |     |     |     |
+-----------------------+
+-----------------------+
|        |     |        |
|     |     |     |     |
|        |     |        |
+-----------------------+
+-----------------------+
|        |        |     |
|     |        |        |
|        |        |     |
+-----------------------+
+-----------------------+
|     |        |        |
|        |        |     |
|     |        |        |
+-----------------------+
So there are in fact 4 of them, but in this case it's just a coincidence that this matches what you got using the "choose" function. In terms of total panel configurations, there are far more than 4.
1. Find all ways to form a single row of the given width. I call this a "row type". Example 12x3: there are 4 row types of width 12: (3 3 3 3), (4.5 4.5 3), (4.5 3 4.5), (3 4.5 4.5). I would represent these as lists of the gaps: (3 6 9), (4.5 9), (4.5 7.5), (3 7.5).
2. For each of these row types, find which other row types could fit on top of it.
Example:
a. On (3 6 9) fits (4.5 7.5).
b. On (4.5 9) fits (3 7.5).
c. On (4.5 7.5) fits (3 6 9).
d. On (3 7.5) fits (4.5 9).
3. Enumerate the ways to build stacks of the given height from these rules. Dynamic programming is applicable here, since at each level you only need the last row type and the number of ways to get there.
Edit: I just tried this out on my coffee break, and it works. The solution for 48x10 has 15 decimal digits, by the way.
Edit: Here is more detail of the dynamic programming part:
Your rules from step 2 translate to an array of possible neighbours. Each element of the array corresponds to a row type, and holds that row type's possible neighbouring row types' indices.
0: (2)
1: (3)
2: (0)
3: (1)
In the case of 12×3, each row type has only a single possible neighbouring row type, but in general, it can be more.
The dynamic programming starts with a single row, where each row type has exactly one way of appearing:
1 1 1 1
Then, the next row is formed by adding for each row type the number of ways that possible neighbours could have formed on the previous row. In the case of a width of 12, the result is 1 1 1 1 again. At the end, just sum up the last row.
Complexity:
Finding the row types corresponds to enumerating the leaves of a tree; there are about w/3 levels in this tree, so this takes O(2^(w/3)) = O(2^w) time.
Checking whether two row types fit takes time proportional to their length, O(w/3). Building the cross table is proportional to the square of the number of row types. This makes step 2 O(w/3 · 2^(2w/3)) = O(2^w).
The dynamic programming takes height times the number of row types times the average number of neighbours (which I estimate to be logarithmic to the number of row types), O(h · 2^(w/3) · w/3) = O(2^w).
As you see, this is all dominated by the number of row types, which grow exponentially with the width. Fortunately, the constant factors are rather low, so that 48×10 can be solved in a few seconds.
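The three steps can be sketched in Java as follows. This is a sketch built on its own representation choices and is not the answerer's code. Widths are measured in 1.5" units, so the blocks are 2 and 3 units wide. Each row type is stored as a bitmask of its interior joints, which assumes the panel is at most 64 units (96") wide. Two row types fit on each other when their masks share no bit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WallCounter {
    static final int[] BLOCKS = {2, 3}; // 3" and 4.5" blocks, in units of 1.5"

    // Step 1: enumerate row types as bitmasks of interior joint positions.
    static void rowTypes(int pos, int width, long joints, List<Long> out) {
        if (pos == width) { out.add(joints); return; }
        for (int block : BLOCKS) {
            int next = pos + block;
            if (next > width) continue;
            // The right edge of the panel is not a joint, so only interior positions are marked.
            rowTypes(next, width, next < width ? joints | (1L << next) : joints, out);
        }
    }

    static long count(double widthInches, int height) {
        int width = (int) Math.round(widthInches / 1.5);
        if (height < 1 || Math.abs(width * 1.5 - widthInches) > 1e-9) return 0;
        List<Long> list = new ArrayList<>();
        rowTypes(0, width, 0L, list);
        int n = list.size();
        long[] types = new long[n];
        for (int i = 0; i < n; i++) types[i] = list.get(i);
        // Step 2: neighbours[i] lists the row types whose joints never line up with type i's joints.
        int[][] neighbours = new int[n][];
        for (int i = 0; i < n; i++) {
            int[] buf = new int[n];
            int k = 0;
            for (int j = 0; j < n; j++) if ((types[i] & types[j]) == 0) buf[k++] = j;
            neighbours[i] = Arrays.copyOf(buf, k);
        }
        // Step 3: ways[i] = number of valid stacks of the current height whose top row is type i.
        long[] ways = new long[n];
        Arrays.fill(ways, 1L);
        for (int row = 1; row < height; row++) {
            long[] next = new long[n];
            for (int i = 0; i < n; i++) for (int j : neighbours[i]) next[i] += ways[j];
            ways = next;
        }
        return Arrays.stream(ways).sum();
    }
}
```

It reproduces the counts given in the question (2, 2, 4 and 7958). `count(48, 10)` returns a `long`, since that result has 15 digits.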
This looks like the type of problem you could solve recursively. Here's a brief outline of an algorithm you could use, with a recursive method that accepts the previous layer and the number of remaining layers as arguments:
Start with the initial number of layers (e.g. 27x5 starts with remainingLayers = 5) and an empty previous layer
Test all possible layouts of the current layer
Try adding a 3x1 in the next available slot in the layer we are building. Check that (a) it doesn't go past the target width (e.g. doesn't go past 27 width in a 27x5) and (b) it doesn't violate the spacing condition given the previous layer
Keep trying to add 3x1s to the current layer until we have built a valid layer that is exactly (e.g.) 27 units wide
If we cannot use a 3x1 in the current slot, remove it and replace with a 4.5x1
Once we have a valid layer, decrement remainingLayers and pass it back into our recursive algorithm along with the layer we have just constructed
Once we reach remainingLayers = 0, we have constructed a valid panel, so increment our counter
The idea is that we build all possible combinations of valid layers. Once we have (in the 27x5 example) 5 valid layers on top of each other, we have constructed a complete valid panel. So the algorithm should find (and thus count) every possible valid panel exactly once.
This is a '2d bin packing' problem. Someone with decent mathematical knowledge will be able to help or you could try a book on computational algorithms. It is known as a "combinatorial NP-hard problem". I don't know what that means but the "hard" part grabs my attention :)
I have had a look at steel-cutting programs, and they mostly use a best guess. In this case, though, 2 x 4.5" stacked vertically can accommodate 3 x 3" stacked horizontally, so you could possibly get away with no waste. It gets rather tricky when you have to figure out the best solution, the one with minimal waste.
Here's a solution in Java. Some of the array-length checking etc. is a little messy, but I'm sure you can refine it pretty easily.
In any case, I hope this helps demonstrate how the algorithm works :-)
import java.util.Arrays;
public class Puzzle
{
    // Initial solve call (returns long: panel counts for larger sizes overflow an int)
    public static long solve(int width, int height)
    {
        // Double the widths so we can use integers (6x1 and 9x1)
        int[] prev = {-1}; // Make sure we don't get any collisions on the first layer
        return solve(prev, new int[0], width * 2, height);
    }
    // Build the current layer recursively given the previous layer and the current layer
    private static long solve(int[] prev, int[] current, int width, int remaining)
    {
        // Check whether we have a valid frame
        if (remaining == 0)
            return 1;
        if (current.length > 0)
        {
            // Check for overflows
            if (current[current.length - 1] > width)
                return 0;
            // Check for aligned gaps
            for (int i = 0; i < prev.length; i++)
                if (prev[i] < width)
                    if (current[current.length - 1] == prev[i])
                        return 0;
            // If we have a complete valid layer
            if (current[current.length - 1] == width)
                return solve(current, new int[0], width, remaining - 1);
        }
        // Try adding a 6x1
        long total = 0;
        int[] newCurrent = Arrays.copyOf(current, current.length + 1);
        if (current.length > 0)
            newCurrent[newCurrent.length - 1] = current[current.length - 1] + 6;
        else
            newCurrent[0] = 6;
        total += solve(prev, newCurrent, width, remaining);
        // Try adding a 9x1
        if (current.length > 0)
            newCurrent[newCurrent.length - 1] = current[current.length - 1] + 9;
        else
            newCurrent[0] = 9;
        total += solve(prev, newCurrent, width, remaining);
        return total;
    }
    // Main method
    public static void main(String[] args)
    {
        // e.g. 27x5, outputs 7958
        System.out.println(Puzzle.solve(27, 5));
    }
}

Buckets of Balls, Will one fill if I add another Ball? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I realize the title is a bit odd. This is a statistics problem I'm trying to figure out, and I'm stumped. (No, no, it's not homework; see the bottom for the real explanation.)
The premise is simple. You have N buckets. Each bucket can hold H balls. None of the buckets is full. You have D balls already in the buckets, but you don't know where the balls are (you forgot!). You choose a bucket at random and add 1 ball to it. What is the probability that this bucket will then be full?
Here are some possible arrangements with N = 4, H = 3, D = 4. Each is just one hypothetical arrangement of the balls out of many.
Scenario 1: 1 bucket could be filled.
|   |   |   |   |
+ - + - + - + - +
| B |   |   |   |
+ - + - + - + - +
| B | B |   | B |
+ - + - + - + - +
Scenario 2: 2 buckets could be filled.
|   |   |   |   |
+ - + - + - + - +
|   | B | B |   |
+ - + - + - + - +
|   | B | B |   |
+ - + - + - + - +
Scenario 3: 0 buckets could be filled.
|   |   |   |   |
+ - + - + - + - +
|   |   |   |   |
+ - + - + - + - +
| B | B | B | B |
+ - + - + - + - +
What I need is a general-purpose equation of the form P = f(N, H, D).
Alright, you've tuned in this far. The reason behind this math question is that I'm interested in having large battles between units. Each unit could belong to a brigade that contains many units of the same type. However, the battle will progress slowly over time. At each phase of the battle, the state will be saved to the DB. Instead of saving each unit and the health of each unit, I want to save the number of units and the total damage on the brigade. When damage is added to a brigade, f(N, H, D) is run and returns the % chance that a unit in the brigade is destroyed (all of its HP are used up). That unit is then removed from the brigade, decrementing N by 1 and D by H.
Before you launch into too much criticism of the idea, remember: if you have VAST, VAST armies, this sort of information cannot be stored efficiently in a small DB, and with the limitations of the web, I can't keep the data for all the units in memory at the same time. Anyway, thanks for the thoughts.
I believe this boils down to the probability that the first bucket holds H-1 balls (because your probability is really the probability that the bucket you pick to drop a ball into has H-1 balls).
I'm guessing this should be solvable with combinatorics, then, but that is not my strong point.
As a side note: this is not a statistics problem, but a probability problem.
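Following that idea, here is a sketch that makes one modelling assumption explicit: every placement of D indistinguishable balls into N buckets with no bucket full (at most H-1 balls each) is equally likely. Under that assumption, P = (arrangements in which the chosen bucket holds H-1 balls) / (all arrangements). Other models, such as dropping each ball into a uniformly random bucket, give different numbers. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.Arrays;

class BucketOdds {
    // Number of ways to put d indistinguishable balls into n buckets holding at most cap balls each.
    static BigInteger arrangements(int n, int d, int cap) {
        if (d < 0) return BigInteger.ZERO;
        BigInteger[] ways = new BigInteger[d + 1];
        Arrays.fill(ways, BigInteger.ZERO);
        ways[0] = BigInteger.ONE;                 // zero buckets hold zero balls in exactly one way
        for (int bucket = 0; bucket < n; bucket++) {
            BigInteger[] next = new BigInteger[d + 1];
            for (int t = 0; t <= d; t++) {        // t balls spread over the buckets so far
                BigInteger s = BigInteger.ZERO;
                for (int k = 0; k <= Math.min(cap, t); k++) s = s.add(ways[t - k]); // k balls in the new bucket
                next[t] = s;
            }
            ways = next;
        }
        return ways[d];
    }

    // P(the chosen bucket becomes full) = P(it currently holds H-1 balls), under the
    // equally-likely-arrangements assumption stated above.
    static double probability(int N, int H, int D) {
        BigInteger total = arrangements(N, D, H - 1);
        if (total.signum() == 0) return 0.0;      // no arrangement fits: inconsistent input
        BigInteger favourable = arrangements(N - 1, D - (H - 1), H - 1);
        return new BigDecimal(favourable).divide(new BigDecimal(total), MathContext.DECIMAL64).doubleValue();
    }
}
```

For the question's N = 4, H = 3, D = 4, there are 19 arrangements, and 6 of them put 2 balls in a given bucket, so this returns 6/19 ≈ 0.316.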
If you could afford to store, for each brigade, the number n[h] of units with h hits for each possible h, then the problem becomes straightforward: with probability n[h]/N you select a unit with h hits, and then increment n[h+1] and decrement n[h], or, if you've selected h = max-1, you decrement n[h] and N.
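Here is the histogram bookkeeping from the answer above as a minimal Java sketch. `Brigade`, `hit` and the int-sized counts are assumptions made for the example.

```java
import java.util.Random;

class Brigade {
    final int[] units; // units[h] = live units that have taken h hits, h = 0 .. H-1
    int total;         // N: number of live units

    Brigade(int[] units) {
        this.units = units.clone();
        for (int u : units) total += u;
    }

    // One hit lands on a uniformly random live unit; returns true if that unit is destroyed.
    boolean hit(Random rng) {
        if (total == 0) return false;
        int pick = rng.nextInt(total);
        int h = 0;
        while (pick >= units[h]) { pick -= units[h]; h++; } // lands in group h with probability units[h] / total
        units[h]--;
        if (h == units.length - 1) { total--; return true; } // it already had H-1 hits: destroyed
        units[h + 1]++;
        return false;
    }
}
```

The state per brigade is just H counters, independent of the number of units.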
If you can't afford the extra memory, a reasonable and tractable choice would be the maximum entropy distribution, see here for example
