Selecting evenly distributed points - algorithm

Suppose there are 25 points on a line segment, and these points may be unevenly distributed (spatially), as the following figure shows:
My question is: how can we select 10 points among these 25 so that the selected 10 are as spatially evenly distributed as possible? In the ideal situation, the selected points would look something like this:
EDIT:
It is true that this question would be more precise if I could state the criterion that defines "even distribution". What I can describe is my expectation for the selected points: if I divide the line segment into 10 equal sub-segments, I expect one selected point on each sub-segment. Of course, it may happen that some sub-segments contain no candidate points. In that case I will fall back to a neighboring sub-segment that does contain candidates: I divide that neighboring sub-segment into two halves, and if each half contains a candidate, the empty-sub-segment problem is solved. If one of the halves still contains no candidate, we can subdivide it further, or move on to the next neighboring sub-segment.
EDIT:
Using dynamic programming, a possible solution is implemented as follows:
#include <iostream>
#include <vector>
#include <algorithm> // std::reverse
#include <cmath>     // std::abs for doubles
using namespace std;

struct Note
{
    int previous_node;
    double cost;
};

int main()
{
    double dis[25] =
        {0.0344460805029088, 0.118997681558377, 0.162611735194631,
         0.186872604554379, 0.223811939491137, 0.276025076998578,
         0.317099480060861, 0.340385726666133, 0.381558457093008,
         0.438744359656398, 0.445586200710900, 0.489764395788231,
         0.498364051982143, 0.585267750979777, 0.646313010111265,
         0.655098003973841, 0.679702676853675, 0.694828622975817,
         0.709364830858073, 0.754686681982361, 0.765516788149002,
         0.795199901137063, 0.823457828327293, 0.950222048838355, 0.959743958516081};
    Note solutions[25];
    for (int i = 0; i < 25; i++)
    {
        solutions[i].cost = 1000000;
    }
    solutions[0].cost = 0;
    solutions[0].previous_node = 0;
    for (int i = 0; i < 25; i++)
    {
        for (int j = i - 1; j >= 0; j--)
        {
            double tempcost = solutions[j].cost + std::abs(dis[i] - dis[j] - 0.1);
            if (tempcost < solutions[i].cost)
            {
                solutions[i].previous_node = j;
                solutions[i].cost = tempcost;
            }
        }
    }
    vector<int> selected_points_index;
    int i = 24;
    selected_points_index.push_back(i);
    while (solutions[i].previous_node != 0)
    {
        i = solutions[i].previous_node;
        selected_points_index.push_back(i);
    }
    selected_points_index.push_back(0);
    std::reverse(selected_points_index.begin(), selected_points_index.end());
    for (int i = 0; i < (int)selected_points_index.size(); i++)
        cout << selected_points_index[i] << endl;
    return 0;
}
The results are shown in the following figure, where the selected points are shown in green:

Until a good (and probably O(n^2)) solution comes along, use this approximation:
Divide the range into 10 equal-sized bins. Choose the point in each bin closest to the centre of each bin. Job done.
If you find that any of the bins is empty choose a smaller number of bins and try again.
Without information about the scientific model that you are trying to implement it is difficult (a) to suggest a more appropriate algorithm and/or (b) to justify the computational effort of a more complicated algorithm.
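As an illustration, the bin-centre heuristic above might be sketched like this (a rough sketch with my own naming; the retry-with-fewer-bins step is replaced here by simply skipping empty bins):

```python
def pick_spread(points, k):
    # Split [min, max] into k equal bins; in each non-empty bin keep the
    # point closest to the bin centre. The answer suggests retrying with
    # fewer bins if any bin comes up empty; here empty bins are skipped.
    lo, hi = min(points), max(points)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for p in points:
        bins[min(k - 1, int((p - lo) / width))].append(p)
    centres = [lo + (b + 0.5) * width for b in range(k)]
    return [min(bs, key=lambda p: abs(p - c))
            for bs, c in zip(bins, centres) if bs]
```

For clustered input such as [1, 2, 3, 10, 11, 12, 20, 21, 22] with k = 3, this keeps one representative per cluster.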

Let {x[i]} be your set of ordered points. I guess what you need to do is find the subset of 10 points {y[i]} that minimizes Σ|y[i] - y[i-1] - 0.1|, with y[-1] = 0.
Now, if you view the configuration as a strongly connected directed graph, where each node is one of the 25 doubles and the cost of every edge is |y[i] - y[i-1] - 0.1|, you should be able to solve the problem in O(n^2 + n log n) time with Dijkstra's algorithm.
Another idea, which will probably lead to a better result, is dynamic programming: if the element x[i] is part of our solution, the total minimum is the minimum cost to get to the point x[i] plus the minimum cost to get from there to the final point. So you can compute a minimum solution for each point, starting from the smallest one, and for each next point take the minimum over its predecessors.
Note that you'll probably have to do some additional work to pick, from the solutions set, the subset of those with 10 points.
EDIT
I've written this in c#:
for (int i = 0; i < 25; i++)
{
    for (int j = i - 1; j >= 0; j--) // j >= 0, so the first point is considered as a predecessor
    {
        double tmpcost = solution[j].cost + Math.Abs(arr[i] - arr[j] - 0.1);
        if (tmpcost < solution[i].cost)
        {
            solution[i].previousNode = j;
            solution[i].cost = tmpcost;
        }
    }
}
I've not done a lot of testing, and there may be problems if the "holes" among the 25 elements are quite wide, leading to solutions with fewer than 10 elements ... but it's just to give you some ideas to work on :)

You can find approximate solution with Adaptive Non-maximal Suppression (ANMS) algorithm provided the points are weighted. The algorithm selects n best points while keeping them spatially well distributed (most spread across the space).
I guess you can assign point weights based on your distribution criterion - e.g. a distance from uniform lattice of your choice. I think the lattice should have n-1 bins for optimal result.
You can look up following papers discussing the 2D case (the algorithm can be easily realized in 1D):
Gauglitz, Steffen, Luca Foschini, Matthew Turk, and Tobias Höllerer. "Efficiently selecting spatially distributed keypoints for visual tracking."
Brown, Matthew, Richard Szeliski, and Simon Winder. "Multi-image matching using multi-scale oriented patches." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE, 2005.
The second paper is less related to your problem, but it describes the basic ANMS algorithm. The first paper provides a faster solution. I guess both will do in 1D for a moderate number of points (~10K).
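In 1D the basic ANMS idea can be sketched roughly as follows (my own simplification: a point's suppression radius is its distance to the nearest strictly heavier point; the papers use a more robust criterion):

```python
def anms_select(points, weights, n):
    # Suppression radius of each point: distance to the nearest point with
    # a strictly larger weight (infinite for the heaviest point).
    radii = []
    for p, w in zip(points, weights):
        r = min((abs(p - q) for q, v in zip(points, weights) if v > w),
                default=float('inf'))
        radii.append((r, w, p))
    # Keep the n points with the largest radii: strong AND well separated.
    radii.sort(reverse=True)
    return sorted(p for _, _, p in radii[:n])
```

Note how a heavy point suppresses its weak neighbours, so the survivors spread out across the range.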

Related

2D House Robber Algorithm

This seems to be a variation of the LeetCode House Robber problem, but I found it significantly harder to tackle:
There are houses laid out on an NxN grid. Each house is known to contain some amount of valuables. The robber's task is to rob houses so as to maximize the amount of loot. However, there is a security system in place: if you rob two adjacent houses (to the left, right, above, or below), an alarm will go off. Find the maximum loot the robber can rob.
Houses : alignment
10 20 10 0 1 0
20 40 20 => 1 0 1
10 20 10 0 1 0
This alignment results in the maximum of 80.
I've learned how to solve the optimum selection of houses for a single row with dynamic programming from https://shanzi.gitbooks.io/algorithm-notes/problem_solutions/house_robber.html:
public class HouseRobber {
    public int rob(int[] nums) {
        if (nums.length == 0) return 0;
        if (nums.length == 1) return nums[0];
        int[] dp = new int[nums.length];
        dp[0] = nums[0];
        int max = dp[0];
        for (int i = 1; i < dp.length; i++) {
            dp[i] = nums[i];
            // No need to look back further than i - 3.
            for (int j = 2; i - j >= 0 && j <= 3; j++) {
                dp[i] = Math.max(dp[i], dp[i - j] + nums[i]);
            }
            max = Math.max(dp[i], max);
        }
        return max;
    }
}
But once I select one row's optimum selection, it might not align with the optimum selections for the rows above and below it. Even if I find the optimum combination for two rows, the next row might have more valuables than the two rows combined and would require another adjustment, and so on.
This is difficult because there are many more variables to consider than for houses on a single row, and there can also be more than one optimum alignment that gives the robber the maximum loot (such as the example above).
I did find what seemed to be a poorly written solution in Python, but since I only understand C-based languages (Java, C#, C++), I couldn't get much out of it. Could anyone help me with a solution or at least some pointers?
Thanks for your time!
I went through the Python code you mentioned. The solution using flow looks plain wrong to me: there is no path from source to sink with finite weight. That solution basically colors the grid like a chessboard and chooses either all the black squares or all the white squares. That is not optimal in the following case:
1 500
300 1
1000 300
It is better to choose 1000 and 500 but that solution will choose 300+300+500.
The dynamic programming solution is exponential.
I don't know enough math to understand the LP solution.
Sorry to not answer your question.
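For what it's worth, one standard way to handle modest grid widths (not given in the thread; the code and names below are my own sketch) is a row-by-row DP over bitmasks of robbed columns, which is exponential only in the width:

```python
def max_loot(grid):
    # Row-by-row DP: a state is the bitmask of robbed columns in the current
    # row. A mask is valid if no two adjacent bits are set (no horizontal
    # neighbours); masks of consecutive rows must share no bit (no vertical
    # neighbours).
    if not grid:
        return 0
    width = len(grid[0])
    valid = [m for m in range(1 << width) if m & (m << 1) == 0]

    def row_value(row, mask):
        return sum(v for i, v in enumerate(row) if (mask >> i) & 1)

    best = {m: row_value(grid[0], m) for m in valid}
    for row in grid[1:]:
        best = {m: row_value(row, m) +
                   max(v for pm, v in best.items() if pm & m == 0)
                for m in valid}
    return max(best.values())
```

On the 3x3 example from the question this yields 80, matching the stated optimum.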

Divvying people into rooms by last name?

I often teach large introductory programming classes (400 - 600 students) and when exam time comes around, we often have to split the class up into different rooms in order to make sure everyone has a seat for the exam.
To keep things logistically simple, I usually break the class apart by last name. For example, I might send students with last names A - H to one room, last name I - L to a second room, M - S to a third room, and T - Z to a fourth room.
The challenge in doing this is that the rooms often have wildly different capacities and it can be hard to find a way to segment the class in a way that causes everyone to fit. For example, suppose that the distribution of last names is (for simplicity) the following:
Last name starts with A: 25
Last name starts with B: 150
Last name starts with C: 200
Last name starts with D: 50
Suppose that I have rooms with capacities 350, 50, and 50. A greedy algorithm for finding a room assignment might be to sort the rooms into descending order of capacity, then try to fill in the rooms in that order. This, unfortunately, doesn't always work. For example, in this case, the right option is to put last name A in one room of size 50, last names B - C into the room of size 350, and last name D into another room of size 50. The greedy algorithm would put last names A and B into the 350-person room, then fail to find seats for everyone else.
It's easy to solve this problem by just trying all possible permutations of the room orderings and then running the greedy algorithm on each ordering. This will either find an assignment that works or report that none exists. However, I'm wondering if there is a more efficient way to do this, given that the number of rooms might be between 10 and 20 and checking all permutations might not be feasible.
To summarize, the formal problem statement is the following:
You are given a frequency histogram of the last names of the students in a class, along with a list of rooms and their capacities. Your goal is to divvy up the students by the first letter of their last name so that each room is assigned a contiguous block of letters and does not exceed its capacity.
Is there an efficient algorithm for this, or at least one that is efficient for reasonable room sizes?
EDIT: Many people have asked about the contiguous condition. The rules are
Each room should be assigned at most a block of contiguous letters, and
No letter should be assigned to two or more rooms.
For example, you could not put A - E, H - N, and P - Z into the same room. You could also not put A - C in one room and B - D in another.
Thanks!
It can be solved with a DP over an [m, 2^n] state space, where m is the number of letters (26 for English) and n is the number of rooms. With m == 26 and n == 20 it will take about 100 MB of space and ~1 sec of time.
Below is a solution I have just implemented in C# (it will compile on C++ and Java too; just several minor changes will be needed):
int[] GetAssignments(int[] studentsPerLetter, int[] rooms)
{
    int numberOfRooms = rooms.Length;
    int numberOfLetters = studentsPerLetter.Length;
    int roomSets = 1 << numberOfRooms; // 2 ^ (number of rooms)
    int[,] map = new int[numberOfLetters + 1, roomSets];
    for (int i = 0; i <= numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            map[i, j] = -2;
    map[0, 0] = -1; // starting condition
    for (int i = 0; i < numberOfLetters; i++)
        for (int j = 0; j < roomSets; j++)
            if (map[i, j] > -2)
            {
                for (int k = 0; k < numberOfRooms; k++)
                    if ((j & (1 << k)) == 0)
                    {
                        // this room is empty yet.
                        int roomCapacity = rooms[k];
                        int t = i;
                        for (; t < numberOfLetters && roomCapacity >= studentsPerLetter[t]; t++)
                            roomCapacity -= studentsPerLetter[t];
                        // marking next state as good, also specifying index of just occupied room
                        // - it will help to construct solution backwards.
                        map[t, j | (1 << k)] = k;
                    }
            }
    // Constructing solution.
    int[] res = new int[numberOfLetters];
    int lastIndex = numberOfLetters - 1;
    for (int j = 0; j < roomSets; j++)
    {
        int roomMask = j;
        while (map[lastIndex + 1, roomMask] > -1)
        {
            int lastRoom = map[lastIndex + 1, roomMask];
            int roomCapacity = rooms[lastRoom];
            for (; lastIndex >= 0 && roomCapacity >= studentsPerLetter[lastIndex]; lastIndex--)
            {
                res[lastIndex] = lastRoom;
                roomCapacity -= studentsPerLetter[lastIndex];
            }
            roomMask ^= 1 << lastRoom; // Remove last room from set.
            j = roomSets; // Over outer loop.
        }
    }
    return lastIndex > -1 ? null : res;
}
Example from OP question:
int[] studentsPerLetter = { 25, 150, 200, 50 };
int[] rooms = { 350, 50, 50 };
int[] ans = GetAssignments(studentsPerLetter, rooms);
Answer will be:
2
0
0
1
This indicates the index of the room for each first letter of the students' last names. If an assignment is not possible, my solution returns null.
[Edit]
After thousands of auto-generated tests, a friend of mine found a bug in the code that constructs the solution backwards. It does not affect the main algorithm, so fixing this bug is left as an exercise for the reader.
The test case that reveals the bug is students = [13,75,21,49,3,12,27,7] and rooms = [6,82,89,6,56]. My solution returns no answer, but actually there is one. Note that the first part of the solution works properly; only the answer-construction part fails.
This problem is NP-complete, and thus there is no known polynomial-time (aka efficient) solution for it (as long as nobody proves P = NP). You can reduce an instance of the knapsack or bin-packing problem to your problem to prove it is NP-complete.
To solve this you can use 0-1 knapsack problem. Here is how:
First pick the biggest classroom and try to allocate as many groups of students as you can (using 0-1 knapsack), i.e., up to the size of the room. You are guaranteed not to split a group of students, as this is 0-1 knapsack. Once done, take the next biggest classroom and continue.
(You can use any known heuristic to solve the 0-1 knapsack problem.)
Here is the reduction --
You need to reduce a general instance of 0-1 knapsack to a specific instance of your problem.
So let's take a general instance of 0-1 knapsack: a sack of capacity W, and groups x_1, x_2, ..., x_n with corresponding weights w_1, w_2, ..., w_n.
Now the reduction: this general instance maps to your problem as follows.
You have one classroom with seating capacity W. Each x_i (i in 1..n) is a group of students whose last names begin with the i-th letter, and whose size is w_i.
Now you can prove that the 0-1 knapsack instance has a solution if and only if your problem instance has one.
Please remember the important point about reductions: you reduce a general instance of a known NP-complete problem to a specific instance of your problem.
Hope this helps :)
Here is an approach that should work reasonably well, given common assumptions about the distribution of last names by initial. Fill the rooms from smallest capacity to largest as compactly as possible within the constraints, with no backtracking.
It seems reasonable (to me at least) for the largest room to be listed last, as being for "everyone else" not already listed.
Is there any reason to make life so complicated? Why can't you assign registration numbers to each student and then use the numbers to allocate them whatever way you want? :) You do not need to write any code, the students are happy, everyone is happy.

Given a large database of over 50,000 points, how can I quickly search for desired points

I have a database of over 50,000 points. Each point has 3 dimensions. Let's label them [i, j, k].
I wish to find the points that are not dominated, i.e., those for which no other point is at least as good in every dimension.
For example, Object A [10 10 3], and Object B [1 1 4], Object C [1 1 1], Object D [1 1 10]
Then the desired output would be A and D (C is worse than all the others in every dimension, and although B beats A in dimension [k], D dominates B).
I've tried some basic comparison algorithms (i.e. if-else statements), which do work when I cut down the database size. But with 50,000 points it takes more than 10 minutes to produce the desired output, which of course is not a good solution.
Could somebody recommend me a method or two to do this the fastest possible way?
Thanks
EDIT:
Thanks, I think I've got it.
You can do many optimizations to your code:
vector<bool> isinterst(n, true);
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        if (isinterst[i]) {
            bool worseelsewhere = false;
            for (int k = 0; k < d; k++)
            {
                if (point[i][k] < point[j][k])
                {
                    worseelsewhere = true;
                    break; // you can exit the loop once worseelsewhere is set to true
                }
            }
            if (worseelsewhere == false)
            {
                continue; // skip the rest if worseelsewhere is false
            }
            bool worse = true;
            for (int k = 0; k < d; k++)
            {
                if (point[i][k] > point[j][k])
                {
                    worse = false;
                    break; // you can exit the loop once worse is set to false
                }
            }
            if (worseelsewhere && worse) {
                isinterst[i] = false;
                //cout << i << " Not desirable " << endl;
            }
        }
    }
}
You're looking for Pareto-optimal points. These form a staircase-like frontier, which is easiest to see in 2 dimensions. Use an incremental algorithm to determine the Pareto-optimal points of the first N points. For N=1, that's just the first point. For N=2, the next point is either dominated by the first (discard the 2nd), dominates the 1st (discard the 1st), or lies above and to the left or below and to the right (and so is also Pareto-optimal).
You can speed up classification by keeping simplified upper and lower bounds for the frontier, e.g. just the single points {minX, minY, minZ} and {maxX, maxY, maxZ}. If P = {x, y, z} is dominated by {minX, minY, minZ}, then it is dominated by all Pareto-optimal points so far and can be discarded. If P dominates {maxX, maxY, maxZ}, it also dominates all points that were Pareto-optimal so far, and you can discard all of those.
A quick O(N log N) initial step is to first sort the collection by X to find the point with max X, then by Y to find the point with max Y, and finally by Z. Finding the Pareto-optimal points in this N=3 subset is easy and can be hardcoded. You can then use this set as a first approximation.
A more refined start is to also sort by X+Y, X+Z, Y+Z and X+Y+Z and find those maxima as well. Again, this produces points which are good initial candidates because they dominate many other points.
E.g. in your case, sorting by X and sorting by Y would both produce point A; sorting by Z would produce point D; neither dominates the other, and you can then quickly discard B and C.
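A minimal illustration of the Pareto-maxima computation, assuming larger is better in every dimension (a naive scan over a descending sort, my own sketch, not the bounding-box pruning described above):

```python
def pareto_maxima(points):
    # Sort descending (lexicographically) so any point that dominates p
    # appears before p; then a single pass keeps only undominated points.
    pts = sorted(points, reverse=True)
    maxima = []
    for p in pts:
        dominated = any(all(q[k] >= p[k] for k in range(len(p)))
                        for q in maxima)
        if not dominated:
            maxima.append(p)
    return maxima
```

On the question's example (A, B, C, D) this keeps exactly A and D. The inner check is O(n·k) per point in the worst case; the pruning tricks above exist to cut that down.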
Without knowing your definition of "better" it's a bit hard to make concrete suggestions here. I note, however, that you appear to be working with spatial data. A data structure that is often used when working with spatial data is the R-tree (http://en.wikipedia.org/wiki/R-tree). This provides an efficient index for multidimensional information.
Perhaps the boost::geometry library has some tools that will assist: http://www.boost.org/doc/libs/1_53_0/libs/geometry/doc/html/geometry/introduction.html

Algorithm to find out all the possible positions

I need an algorithm to find out all the possible positions of a group of pieces in a chessboard. Like finding all the possible combinations of the positions of a number N of pieces.
For example in a chessboard numbered like cartesian coordinate systems any piece would be in a position
(x,y) where 1 <= x <= 8 and 1 <= y <= 8
I'd like an algorithm that can calculate, for example for 3 pieces, all the possible positions of the pieces on the board. But I don't know how to enumerate them in any systematic order. I can get all the possible positions of a single piece, but I don't know how to combine them for more pieces.
for (int i = 1; i <= 8; i++) {
    for (int j = 1; j <= 8; j++) {
        System.out.println("Position: x:" + i + ", y:" + j);
    }
}
How can I get a good algorithm to find all the possible positions of the pieces on a chessboard?
Thanks.
You've got an 8x8 board, so 64 squares in total.
Populate a list containing these 64 squares [call it list], and find all of the possibilities recursively: each step "guesses" one point and invokes the recursive call to find the other points.
Pseudo code:
choose(list, numPieces, sol):
    if (sol.length == numPieces): // base clause: print the possible solution
        print sol
        return
    for each point in list:
        sol.append(point)             // append the point to the end of sol
        list.remove(point)
        choose(list, numPieces, sol)  // recursive call
        list.add(point)               // clean up environment before next recursive call
        sol.removeLast()
Invoke with choose(list, numPieces, []), where list is the pre-populated list of 64 elements and numPieces is the number of pieces you are going to place.
Note: This solution assumes pieces are not identical, so [(1,2),(2,1)] and [(2,1),(1,2)] are both good different solutions.
EDIT:
Just a word about complexity: since there are (n^2)!/(n^2-k)! possible solutions to your problem, and you are looking for all of them, any algorithm will suffer exponential run time; trying to invoke it with just 10 pieces will take ~400 years.
[In the above notation, n is the width and length of the board, and k is the number of pieces]
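The same enumeration can also be had from a library routine; a sketch (Python here for brevity, though the thread uses Java, since the count (n^2)!/(n^2-k)! is what matters):

```python
from itertools import permutations

def all_positions(num_pieces, n=8):
    # Every ordered placement of num_pieces distinct pieces on distinct
    # squares of an n x n board, coordinates in 1..n as in the question.
    squares = [(x, y) for x in range(1, n + 1) for y in range(1, n + 1)]
    return permutations(squares, num_pieces)
```

For 2 pieces this yields 64 * 63 = 4032 placements, in line with the (n^2)!/(n^2-k)! count above.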
You can use a recursive algorithm to generate all possibilities:
void combine(String instr, StringBuffer outstr, int index)
{
    for (int i = index; i < instr.length(); i++)
    {
        outstr.append(instr.charAt(i));
        System.out.println(outstr);
        combine(instr, outstr, i + 1);
        outstr.deleteCharAt(outstr.length() - 1);
    }
}
combine("abc", new StringBuffer(), 0);
combine("abc", new StringBuffer(), 0);
As I understand it, you should consider that some figures may block positions that other figures could otherwise reach on an empty board. I guess that is the trickiest part.
So you should build a set of vertices (a set of board states) reachable from a single vertex (the initial board state).
The first algorithm that comes to my mind:
Pre-conditions:
Order the figures in some way to form a cycle.
Assume the initial set of board states (S0) contains a single element representing the initial board state.
Actions:
Choose the next figure with which to extend the set of possible positions.
For each board state in S(n), walk depth-first through all possible movements to obtain new board states; call this F(n) (the frame).
Form S(n+1) = S(n) ∪ F(n).
Repeat these steps until all frames produced during a whole pass around the cycle are empty.
This is a kind of mix of breadth-first and depth-first search.

Minimizing time in transit

[Updates at bottom (including solution source code)]
I have a challenging business problem that a computer can help solve.
Along a mountainous region flows a long winding river with strong currents. Along certain parts of the river are plots of environmentally sensitive land suitable for growing a particular type of rare fruit that is in very high demand. Once field laborers harvest the fruit, the clock starts ticking to get the fruit to a processing plant. It's very costly to try and send the fruits upstream or over land or air. By far the most cost effective mechanism to ship them to the plant is downstream in containers powered solely by the river's constant current. We have the capacity to build 10 processing plants and need to locate these along the river to minimize the total time the fruits spend in transit. The fruits can take however long before reaching the nearest downstream plant but that time directly hurts the price at which they can be sold. Effectively, we want to minimize the sum of the distances to the nearest respective downstream plant. A plant can be located as little as 0 meters downstream from a fruit access point.
The question is: In order to maximize profits, how far up the river should we build the 10 processing plants if we have found 32 fruit growing regions, where the regions' distances upstream from the base of the river are (in meters):
10, 40, 90, 160, 250, 360, 490, ... (n^2)*10 ... 9000, 9610, 10320?
[It is hoped that all work going towards solving this problem and towards creating similar problems and usage scenarios can help raise awareness about and generate popular resistance towards the damaging and stifling nature of software/business method patents (to whatever degree those patents might be believed to be legal within a locality).]
UPDATES
Update1: Forgot to add: I believe this question is a special case of this one.
Update2: One algorithm I wrote gives an answer in a fraction of a second, and I believe it is rather good (but it's not yet stable across sample values). I'll give more details later, but the short version is as follows. Place the plants at equal spacings. Cycle over all the inner plants, where at each plant you recalculate its position by testing every location between its two neighbors until the problem is solved within that space (a greedy algorithm). So you optimize plant 2 holding 1 and 3 fixed, then plant 3 holding 2 and 4 fixed... When you reach the end, you cycle back and repeat until you complete a full cycle in which every processing plant's recalculated position stops varying. Also, at the end of each cycle, you try to move processing plants that are crowded next to each other, and are all near each other's fruit dumps, into a region whose fruit dumps are far from any plant. There are many ways to vary the details and hence the exact answer produced. I have other candidate algorithms, but all have glitches. [I'll post code later.] Just as Mike Dunlavey mentioned below, we likely just want "good enough".
To give an idea of what might be a "good enough" result:
10010 total length of travel from 32 locations to plants at
{10,490,1210,1960,2890,4000,5290,6760,8410,9610}
Update3: mhum gave the correct exact solution first but did not (yet) post a program or algorithm, so I wrote one up that yields the same values.
/************************************************************
This program can be compiled and run (e.g., on Linux):
$ gcc -std=c99 processing-plants.c -o processing-plants
$ ./processing-plants
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//a: Data set of values. Add extra large number at the end.
int a[] = {
    10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999
};
//numofa: size of data set
int numofa = sizeof(a) / sizeof(int);
//a2: will hold (point to) unique data from a, in sorted order.
int *a2;
//max: size of a2
int max;
//num_fixed_loc: at 10, gives the solution for 10 plants
int num_fixed_loc;
//xx: holds index values of a2 from the lowest-error winner of each cycle, memoized.
//Accessed via a memoized offset value. The winner is based on the lowest error sum
//from the left boundary up to the right ending boundary.
//FIX: to be dynamically sized.
int xx[1000000];
//xx_last: how much of xx has been used up
int xx_last = 0;
//SavedBundle: data type to "hold" memoized values needed (total travel distance and plant locations)
typedef struct _SavedBundle {
    long e;
    int xx_offset;
} SavedBundle;
//sb: (points to) lookup table of all calculated values, memoized
SavedBundle *sb; //holds winning values being memoized

//Sort in increasing order.
int sortfunc(const void *a, const void *b) {
    return (*(int *)a - *(int *)b);
}

/****************************
Most interesting code in here
****************************/
long full_memh(int l, int n) {
    long e;
    long e_min = -1;
    int ti;
    if (sb[l*max+n].e) {
        return sb[l*max+n].e; //convenience passing
    }
    for (int i = l+1; i < max-1; i++) {
        e = 0;
        //sum first part
        for (int j = l+1; j < i; j++) {
            e += a2[j] - a2[l];
        }
        //sum second part
        if (n != 1) //general case, recursively
            e += full_memh(i, n-1);
        else //base case, iteratively
            for (int j = i+1; j < max-1; j++) {
                e += a2[j] - a2[i];
            }
        if (e_min == -1) {
            e_min = e;
            ti = i;
        }
        if (e < e_min) {
            e_min = e;
            ti = i;
        }
    }
    sb[l*max+n].e = e_min;
    sb[l*max+n].xx_offset = xx_last;
    xx[xx_last] = ti; //later add a test or a realloc, etc., if appropriate
    for (int i = 0; i < n-1; i++) {
        xx[xx_last+(i+1)] = xx[sb[ti*max+(n-1)].xx_offset + i];
    }
    xx_last += n;
    return e_min;
}

/*************************************************************
Call to calculate and print results for given number of plants
*************************************************************/
int full_memoization(int num_fixed_loc) {
    char *str;
    long errorsum; //for convenience
    //Call recursive workhorse
    errorsum = full_memh(0, num_fixed_loc-2);
    //Now print
    str = (char *) malloc(num_fixed_loc*20 + 100);
    sprintf(str, "\n%4d %6ld {%d,", num_fixed_loc-1, errorsum, a2[0]);
    for (int i = 0; i < num_fixed_loc-2; i++)
        sprintf(str + strlen(str), "%d%c",
                a2[ xx[ sb[0*max+(num_fixed_loc-2)].xx_offset + i ] ],
                (i < num_fixed_loc-3) ? ',' : '}');
    printf("%s", str);
    return 0;
}

/**************************************************
Initialize and call for plant numbers of many sizes
**************************************************/
int main(void) {
    int t;
    int i2;
    qsort(a, numofa, sizeof(int), sortfunc);
    t = 1;
    for (int i = 1; i < numofa; i++)
        if (a[i] != a[i-1])
            t++;
    max = t;
    i2 = 1;
    a2 = (int *)malloc(sizeof(int) * t);
    a2[0] = a[0];
    for (int i = 1; i < numofa; i++)
        if (a[i] != a[i-1]) {
            a2[i2++] = a[i];
        }
    sb = (SavedBundle *)calloc(max * max, sizeof(SavedBundle));
    for (int i = 3; i <= max; i++) {
        full_memoization(i);
    }
    free(sb);
    return 0;
}
Let me give you a simple example of a Metropolis-Hastings algorithm.
Suppose you have a state vector x, and a goodness-of-fit function P(x), which can be any function you care to write.
Suppose you have a random distribution Q that you can use to modify the vector, such as x' = x + N(0, 1) * sigma, where N is a simple normal distribution about 0, and sigma is a standard deviation of your choosing.
p = P(x);
for (/* a lot of iterations */) {
    // add x to a sample array
    // get the next sample
    x' = x + N(0,1) * sigma;
    p' = P(x');
    // if it is better, accept it
    if (p' > p) {
        x = x';
        p = p';
    }
    // if it is not better
    else {
        // maybe accept it anyway
        if (Uniform(0,1) < (p' / p)) {
            x = x';
            p = p';
        }
    }
}
Usually it is done with a burn-in time of maybe 1000 cycles, after which you start collecting samples. After another maybe 10,000 cycles, the average of the samples is what you take as an answer.
It requires diagnostics and tuning. Typically the samples are plotted, and what you are looking for is a "fuzzy caterpillar" plot that is stable (doesn't move around much) and has a high acceptance rate (very fuzzy). The main parameter you can play with is sigma.
If sigma is too small, the plot will be fuzzy but it will wander around.
If it is too large, the plot will not be fuzzy - it will have horizontal segments.
Often the starting vector x is chosen at random, and often multiple starting vectors are chosen, to see if they end up in the same place.
It is not necessary to vary all components of the state vector x at the same time. You can cycle through them, varying one at a time, or some such method.
Also, if you don't need the diagnostic plot, it may not be necessary to save the samples, but just calculate the average and variance on the fly.
In the applications I'm familiar with, P(x) is a measure of probability, and it is typically in log-space, so it can vary from 0 to negative infinity.
Then the "maybe accept" test becomes Uniform(0,1) < exp(logp' - logp).
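A runnable 1-D version of the pseudocode above (my own toy target: an unnormalized Gaussian centred at 5, so the sample mean should land near 5):

```python
import math
import random

def metropolis_hastings(P, x0, sigma, iters, burn_in, seed=0):
    # Generic 1-D Metropolis-Hastings: propose x' = x + N(0,1)*sigma, accept
    # improvements outright, accept worsenings with probability P(x')/P(x).
    rng = random.Random(seed)
    x, p = x0, P(x0)
    samples = []
    for i in range(iters):
        xp = x + rng.gauss(0, 1) * sigma
        pp = P(xp)
        if pp > p or rng.random() < pp / p:
            x, p = xp, pp
        if i >= burn_in:  # collect samples only after the burn-in period
            samples.append(x)
    return sum(samples) / len(samples)

# Toy target: unnormalized Gaussian centred at 5.
mean = metropolis_hastings(lambda x: math.exp(-(x - 5) ** 2 / 2), 0.0, 1.0, 20000, 1000)
```

Sigma here plays exactly the tuning role described above: too small and the chain wanders, too large and proposals are mostly rejected.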
Unless I've made an error, here are exact solutions (obtained through a dynamic programming approach):
N Dist Sites
2 60950 {10,4840}
3 40910 {10,2890,6760}
4 30270 {10,2250,4840,7840}
5 23650 {10,1690,3610,5760,8410}
6 19170 {10,1210,2560,4410,6250,8410}
7 15840 {10,1000,2250,3610,5290,7290,9000}
8 13330 {10,810,1960,3240,4410,5760,7290,9000}
9 11460 {10,810,1690,2890,4000,5290,6760,8410,9610}
10 9850 {10,640,1440,2250,3240,4410,5760,7290,8410,9610}
11 8460 {10,640,1440,2250,3240,4410,5290,6250,7290,8410,9610}
12 7350 {10,490,1210,1960,2890,3610,4410,5290,6250,7290,8410,9610}
13 6470 {10,490,1000,1690,2250,2890,3610,4410,5290,6250,7290,8410,9610}
14 5800 {10,360,810,1440,1960,2560,3240,4000,4840,5760,6760,7840,9000,10240}
15 5190 {10,360,810,1440,1960,2560,3240,4000,4840,5760,6760,7840,9000,9610,10240}
16 4610 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,7290,8410,9000,9610,10240}
17 4060 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,7290,7840,8410,9000,9610,10240}
18 3550 {10,360,810,1210,1690,2250,2890,3610,4410,5290,6250,6760,7290,7840,8410,9000,9610,10240}
19 3080 {10,360,810,1210,1690,2250,2890,3610,4410,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
20 2640 {10,250,640,1000,1440,1960,2560,3240,4000,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
21 2230 {10,250,640,1000,1440,1960,2560,3240,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
22 1860 {10,250,640,1000,1440,1960,2560,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
23 1520 {10,250,490,810,1210,1690,2250,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
24 1210 {10,250,490,810,1210,1690,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
25 940 {10,250,490,810,1210,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
26 710 {10,160,360,640,1000,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
27 500 {10,160,360,640,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
28 330 {10,160,360,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
29 200 {10,160,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
30 100 {10,90,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
31 30 {10,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
32 0 {10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240}
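The table can be cross-checked with a compact memoized recurrence equivalent to full_memh above (a sketch; the function and variable names are mine):

```python
from functools import lru_cache

def min_transit(sites, plants):
    # sites: sorted upstream distances of the fruit regions. A plant is
    # forced onto the first (most downstream) site; cost(i, k) is the
    # minimal total travel for sites[i:] given a plant at sites[i] and k
    # plants still to be placed upstream of it.
    n = len(sites)

    @lru_cache(maxsize=None)
    def cost(i, k):
        if k == 0:
            # everything upstream of i floats down to the plant at sites[i]
            return sum(sites[j] - sites[i] for j in range(i + 1, n))
        if n - 1 - i < k:
            return float('inf')  # not enough sites left for k more plants
        return min(sum(sites[j] - sites[i] for j in range(i + 1, nxt))
                   + cost(nxt, k - 1)
                   for nxt in range(i + 1, n))

    return cost(0, plants - 1)

sites = [10 * k * k for k in range(1, 33)]  # 10, 40, 90, ..., 10240
```

For example, min_transit(sites, 2) reproduces the N=2 row (60950 with plants {10, 4840}) and min_transit(sites, 10) the N=10 row (9850).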
