HMM Localization in 2D maze, trouble applying smoothing (backward algorithm)

HMM Localization in 2D maze, trouble applying smoothing (backward algorithm) - algorithm

We use HMM (Hidden Markov Model) to localize a robot in a windy maze with damaged sensors. If he attempts to move in a direction, he will do so with a high probability, and a low chance to accidentally go to either side. If his movement would make him go over an obstacle, he will bounce back to the original tile.
From any given position, he can sense in all four directions. He will notice an obstacle if it is there with high certainty, and see an obstacle when there is none with low certainty.
We have a probability map for all possible places the robot might be in the maze, since he knows what the maze looks like. Initially it all starts evenly distributed.
I have completed the motion and sensing aspect of this and am getting the proper answers, but I am stuck on smoothing (backward algorithm).
Assume that the robot performs the following sequence of actions: senses, moves, senses, moves, senses. This gives us 3 states in our HMM model. Assume that the results I have at each step of the way so far are correct.
I am having a lot of trouble performing smoothing (backward algorithm), given that there are four conditional probabilities (one for each direction).
Assume SP is for smoothing probability, BP is for backward probability
Assume Sk is for a state, and Zk is for an observation at that state. The problem for me is figuring out how to construct my backwards equation given that each Zk is only for a single direction.
I know the algorithm for smoothing is: SP(k) is proportional to BP(k+1) * P(Sk | Z1:k)
Where BP(k+1) is defined as :
if (k == n) return 1 else return Sum(s) of BP(k+1) * P(Zk+1|Sk+1) * P(Sk+1=s | Sk)
This is where I am having my trouble. Mainly in the Conditional Probability portion of this equation. Because each spot has four different directions that it observed! In other words, each state has four different evidence variables as opposed to just one! Do I average these values? Do I do a separate summation for them? How do I account for multiple observations at a given state and properly condense it into this equation which only has room for one conditional probability?
Here is the code I have performing the smoothing:
public static void Smoothing(List<int[]> observations) {
int n = observations.Count; //n is Total length of evidence sequence
int k = n - 1; //k is the state we are trying to smooth. start with n-1
for (; k >= 1; k--) { //Smooth all the way back to the first state
for (int dir = 0; dir < 4; dir++) {
//We must smooth each direction separately
SmoothDirection(dir, observations, k, n);
}
Console.WriteLine($"Smoothing for k = {k}\n");
UpdateMapMotion(mapHistory[k]);
PrintMap();
}
}
public static void SmoothDirection(int dir, List<int[]> observations, int k, int n) {
var alphas = new double[ROWS, COLS];
var normalizer = 0.0;
int row, col;
foreach (var t in map) {
if (t.isObstacle) continue;
row = t.pos.y;
col = t.pos.x;
alphas[row, col] = mapHistory[k][row, col]
* Backwards(k, n, t, dir, observations, moves[^(n - k)]);
normalizer += alphas[row, col];
}
UpdateHistory(k, alphas, normalizer);
}
public static void UpdateHistory(int index, double[,] alphas, double normalizer) {
for (int r = 0; r < ROWS; r++) {
for (int c = 0; c < COLS; c++) {
mapHistory[index][r, c] = alphas[r, c] / normalizer;
}
}
}
public static double Backwards(int k, int n, Tile t, int dir, List<int[]> observations, int moveDir) {
if (k == n) return 1;
double p = 0;
var nextStates = GetPossibleNextStates(t, moveDir);
foreach (var s in nextStates) {
p += Cond_Prob(s.hasObstacle[dir], observations[^(n - k)][dir] == 1) * Trans_Prob(t, s, moveDir)
* Backwards(k+1, n, s, dir, observations, moves[^(n - k)]);
}
return p;
}
public static List<Tile> GetPossibleNextStates(Tile t, int direction) {
var tiles = new List<Tile>(); //Next States
var perpDirs = GetPerpendicularDir(direction); //Perpendicular Directions
//If obstacle in front of Tile t or on the sides, Tile t is a possible next state.
if (t.hasObstacle[direction] || t.hasObstacle[perpDirs[0]] || t.hasObstacle[perpDirs[1]])
tiles.Add(t);
//If there is no obstacle in front of Tile t, then that tile is a possible next state.
if (!t.hasObstacle[direction])
tiles.Add(GetTileAtPos(t.pos + directions[direction]));
//If there are no obstacles on the sides of Tile t, then those are possible next states.
foreach (var dir in perpDirs) {
if (!t.hasObstacle[dir])
tiles.Add(GetTileAtPos(t.pos + directions[dir]));
}
return tiles;
}
TL;DR : How do I perform smoothing (backward algorithm) in a Hidden Markov Model when there are 4 evidences at each state as opposed to just 1?

SOLVED!
It was actually rather much more simple than I imagined.
I don't actually need to each iteration separately in each direction.
I just need to replace the Cond_Prob() function with Joint_Cond_Prob() which finds the joint probability of all directional observations at a given state.
So P(Zk|Sk) is actually P(Zk1:Zk4|Sk) which is just P(Zk1|Sk)P(Zk2|Sk)P(Zk3|Sk)P(Zk4|Sk)

Related

Find largest circle not overlapping with others using genetic algorithm

I'm using GA, so I took example from this page (http://www.ai-junkie.com/ga/intro/gat3.html) and tried to do on my own.
The problem is, it doesn't work. For example, maximum fitness does not always grow in the next generation, but becomes smallest. Also, after some number of generations, it just stops getting better. For example, in first 100 generations, it found the largest circle with radius 104. And in next 900 largest radius is 107. And after drawing it, I see that it can grow much more.
Here is my code connected with GA. I leave out generating random circles, decoding and drawing.
private Genome ChooseParent(Genome[] population, Random r)
{
double sumFitness = 0;
double maxFitness = 0;
for (int i = 0; i < population.Length; i++)
{
sumFitness += population[i].fitness;
if (i == 0 || maxFitness < population[i].fitness)
{
maxFitness = population[i].fitness;
}
}
sumFitness = population.Length * maxFitness - sumFitness;
double randNum = r.NextDouble() *sumFitness;
double acumulatedSum = 0;
for(int i=0;i<population.Length;i++)
{
acumulatedSum += population[i].fitness;
if(randNum<acumulatedSum)
{
return population[i];
}
}
return population[0];
}
private void Crossover(Genome parent1, Genome parent2, Genome child1, Genome child2, Random r)
{
double d=r.NextDouble();
if(d>this.crossoverRate || child1.Equals(child2))
{
for (int i = 0; i < parent1.bitNum; i++)
{
child1.bit[i] = parent1.bit[i];
child2.bit[i] = parent2.bit[i];
}
}
else
{
int cp = r.Next(parent1.bitNum - 1);
for (int i = 0; i < cp; i++)
{
child1.bit[i] = parent1.bit[i];
child2.bit[i] = parent2.bit[i];
}
for (int i = cp; i < parent1.bitNum; i++)
{
child1.bit[i] = parent2.bit[i];
child2.bit[i] = parent1.bit[i];
}
}
}
private void Mutation(Genome child, Random r)
{
for(int i=0;i<child.bitNum;i++)
{
if(r.NextDouble()<=this.mutationRate)
{
child.bit[i] = (byte)(1 - child.bit[i]);
}
}
}
public void Run()
{
for(int generation=0;generation<1000;generation++)
{
CalculateFitness(population);
System.Diagnostics.Debug.WriteLine(maxFitness);
population = population.OrderByDescending(x => x).ToArray();
//ELITIZM
Copy(population[0], newpopulation[0]);
Copy(population[1], newpopulation[1]);
for(int i=1;i<this.populationSize/2;i++)
{
Genome parent1 = ChooseParent(population, r);
Genome parent2 = ChooseParent(population, r);
Genome child1 = newpopulation[2 * i];
Genome child2 = newpopulation[2 * i + 1];
Crossover(parent1, parent2, child1, child2, r);
Mutation(child1, r);
Mutation(child2, r);
}
Genome[] tmp = population;
population = newpopulation;
newpopulation = tmp;
DekodePopulation(population); //decoding and fitness calculation for each member of population
}
}
If someone can point on potential problem that caused such behaviour and ways to fix it, I'll be grateful.

Welcome to the world of genetic algorithms!
I'll go through your issues and suggest a potential problem. Here we go:
maximum fitness does not always grow in the next generation, but becomes smallest - You probably meant smaller. This is weird since you employed elitism, so each generation's best individual should be at least as good as in the previous one. I suggest you check your code for mistakes because this really should not happen. However, the fitness does not need to always grow. It is impossible to achieve this in GA - it's a stochastic algorithm, working with randomness - suppose that, by chance, no mutation nor crossover happens in a generation - then the fitness cannot improve to the next generation since there is no change.
after some number of generations, it just stops getting better. For example, in first 100 generations, it found the largest circle with radius 104. And in next 900 largest radius is 107. And after drawing it, I see that it can grow much more. - this is (probably) a sign of a phenomenon called premature convergence and it's, unfortunately, a "normal" thing in genetic algorithm. Premature convergence is a situation when the whole population converges to a single solution or to a set of solutions which are near each other and which is/are sub-optimal (i.e. it is not the best possible soluion). When this happens, the GA has a very hard time escaping this local optimum. You can try to tweak the parameters, especially the mutation probability, to force more exploration.
Also, another very important thing that can cause problems is the encoding, i.e. how is the bit string mapped to the circle. If the encoding is much too indirect, it can lead to poor performance of the GA. GAs work when there are some building blocks in the genotype which can be exchanged between among the population. If there are no such blocks, the performance of a GA is usually going to be poor.

I have implemented this exercise and achieved good results. Here is the link:
https://github.com/ManhTruongDang/ai-junkie
Hope this can be of use to you.

The Maximum Volume of Trapped Rain Water in 3D

A classic algorithm question in 2D version is typically described as
Given n non-negative integers representing an elevation map where the width of each bar is 1, compute how much water it is able to trap after raining.
For example, Given the input
[0,1,0,2,1,0,1,3,2,1,2,1]
the return value would be
6
The algorithm that I used to solve the above 2D problem is
int trapWaterVolume2D(vector<int> A) {
int n = A.size();
vector<int> leftmost(n, 0), rightmost(n, 0);
//left exclusive scan, O(n), the highest bar to the left each point
int leftMaxSoFar = 0;
for (int i = 0; i < n; i++){
leftmost[i] = leftMaxSoFar;
if (A[i] > leftMaxSoFar) leftMaxSoFar = A[i];
}
//right exclusive scan, O(n), the highest bar to the right each point
int rightMaxSoFar = 0;
for (int i = n - 1; i >= 0; i--){
rightmost[i] = rightMaxSoFar;
if (A[i] > rightMaxSoFar) rightMaxSoFar = A[i];
}
// Summation, O(n)
int vol = 0;
for (int i = 0; i < n; i++){
vol += max(0, min(leftmost[i], rightmost[i]) - A[i]);
}
return vol;
}
My Question is how to make the above algorithm extensible to the 3D version of the problem, to compute the maximum of water trapped in real-world 3D terrain. i.e. To implement
int trapWaterVolume3D(vector<vector<int> > A);
Sample graph:
We know the elevation at each (x, y) point and the goal is to compute the maximum volume of water that can be trapped in the shape. Any thoughts and references are welcome.

For each point on the terrain consider all paths from that point to the border of the terrain. The level of water would be the minimum of the maximum heights of the points of those paths. To find it we need to perform a slightly modified Dijkstra's algorithm, filling the water level matrix starting from the border.
For every point on the border set the water level to the point height
For every point not on the border set the water level to infinity
Put every point on the border into the set of active points
While the set of active points is not empty:
Select the active point P with minimum level
Remove P from the set of active points
For every point Q adjacent to P:
Level(Q) = max(Height(Q), min(Level(Q), Level(P)))
If Level(Q) was changed:
Add Q to the set of active points

user3290797's "slightly modified Dijkstra algorithm" is closer to Prim's algorithm than Dijkstra's. In minimum spanning tree terms, we prepare a graph with one vertex per tile, one vertex for the outside, and edges with weights equal to the maximum height of their two adjoining tiles (the outside has height "minus infinity").
Given a path in this graph to the outside vertex, the maximum weight of an edge in the path is the height that the water has to reach in order to escape along that path. The relevant property of a minimum spanning tree is that, for every pair of vertices, the maximum weight of an edge in the path in the spanning tree is the minimum possible among all paths between those vertices. The minimum spanning tree thus describes the most economical escape paths for water, and the water heights can be extracted in linear time with one traversal.
As a bonus, since the graph is planar, there's a linear-time algorithm for computing the minimum spanning tree, consisting of alternating Boruvka passes and simplifications. This improves on the O(n log n) running time of Prim.

This problem can be solved using the Priority-Flood algorithm. It's been discovered and published a number of times over the past few decades (and again by other people answering this question), though the specific variant you're looking for is not, to my knowledge, in the literature.
You can find a review paper of the algorithm and its variants here. Since that paper was published an even faster variant has been discovered (link), as well as methods to perform this calculation on datasets of trillions of cells (link). A method for selectively breaching low/narrow divides is discussed here. Contact me if you'd like copies of any of these papers.
I have a repository here with many of the above variants; additional implementations can be found here.
A simple script to calculate volume using the RichDEM library is as follows:
#include "richdem/common/version.hpp"
#include "richdem/common/router.hpp"
#include "richdem/depressions/Lindsay2016.hpp"
#include "richdem/common/Array2D.hpp"
/**
#brief Calculates the volume of depressions in a DEM
#author Richard Barnes (rbarnes#umn.edu)
Priority-Flood starts on the edges of the DEM and then works its way inwards
using a priority queue to determine the lowest cell which has a path to the
edge. The neighbours of this cell are added to the priority queue if they
are higher. If they are lower, then they are members of a depression and the
elevation of the flooding minus the elevation of the DEM times the cell area
is the flooded volume of the cell. The cell is flooded, total volume
tracked, and the neighbors are then added to a "depressions" queue which is
used to flood depressions. Cells which are higher than a depression being
filled are added to the priority queue. In this way, depressions are filled
without incurring the expense of the priority queue.
#param[in,out] &elevations A grid of cell elevations
#pre
1. **elevations** contains the elevations of every cell or a value _NoData_
for cells not part of the DEM. Note that the _NoData_ value is assumed to
be a negative number less than any actual data value.
#return
Returns the total volume of the flooded depressions.
#correctness
The correctness of this command is determined by inspection. (TODO)
*/
template <class elev_t>
double improved_priority_flood_volume(const Array2D<elev_t> &elevations){
GridCellZ_pq<elev_t> open;
std::queue<GridCellZ<elev_t> > pit;
uint64_t processed_cells = 0;
uint64_t pitc = 0;
ProgressBar progress;
std::cerr<<"\nPriority-Flood (Improved) Volume"<<std::endl;
std::cerr<<"\nC Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024"<<std::endl;
std::cerr<<"p Setting up boolean flood array matrix..."<<std::endl;
//Used to keep track of which cells have already been considered
Array2D<int8_t> closed(elevations.width(),elevations.height(),false);
std::cerr<<"The priority queue will require approximately "
<<(elevations.width()*2+elevations.height()*2)*((long)sizeof(GridCellZ<elev_t>))/1024/1024
<<"MB of RAM."
<<std::endl;
std::cerr<<"p Adding cells to the priority queue..."<<std::endl;
//Add all cells on the edge of the DEM to the priority queue
for(int x=0;x<elevations.width();x++){
open.emplace(x,0,elevations(x,0) );
open.emplace(x,elevations.height()-1,elevations(x,elevations.height()-1) );
closed(x,0)=true;
closed(x,elevations.height()-1)=true;
}
for(int y=1;y<elevations.height()-1;y++){
open.emplace(0,y,elevations(0,y) );
open.emplace(elevations.width()-1,y,elevations(elevations.width()-1,y) );
closed(0,y)=true;
closed(elevations.width()-1,y)=true;
}
double volume = 0;
std::cerr<<"p Performing the improved Priority-Flood..."<<std::endl;
progress.start( elevations.size() );
while(open.size()>0 || pit.size()>0){
GridCellZ<elev_t> c;
if(pit.size()>0){
c=pit.front();
pit.pop();
} else {
c=open.top();
open.pop();
}
processed_cells++;
for(int n=1;n<=8;n++){
int nx=c.x+dx[n];
int ny=c.y+dy[n];
if(!elevations.inGrid(nx,ny)) continue;
if(closed(nx,ny))
continue;
closed(nx,ny)=true;
if(elevations(nx,ny)<=c.z){
if(elevations(nx,ny)<c.z){
++pitc;
volume += (c.z-elevations(nx,ny))*std::abs(elevations.getCellArea());
}
pit.emplace(nx,ny,c.z);
} else
open.emplace(nx,ny,elevations(nx,ny));
}
progress.update(processed_cells);
}
std::cerr<<"t Succeeded in "<<std::fixed<<std::setprecision(1)<<progress.stop()<<" s"<<std::endl;
std::cerr<<"m Cells processed = "<<processed_cells<<std::endl;
std::cerr<<"m Cells in pits = " <<pitc <<std::endl;
return volume;
}
template<class T>
int PerformAlgorithm(std::string analysis, Array2D<T> elevations){
elevations.loadData();
std::cout<<"Volume: "<<improved_priority_flood_volume(elevations)<<std::endl;
return 0;
}
int main(int argc, char **argv){
std::string analysis = PrintRichdemHeader(argc,argv);
if(argc!=2){
std::cerr<<argv[0]<<" <Input>"<<std::endl;
return -1;
}
return PerformAlgorithm(argv[1],analysis);
}
It should be straight-forward to adapt this to whatever 2d array format you are using
In pseudocode, the following is equivalent to the foregoing:
Let PQ be a priority-queue which always pops the cell of lowest elevation
Let Closed be a boolean array initially set to False
Let Volume = 0
Add all the border cells to PQ.
For each border cell, set the cell's entry in Closed to True.
While PQ is not empty:
Select the top cell from PQ, call it C.
Pop the top cell from PQ.
For each neighbor N of C:
If Closed(N):
Continue
If Elevation(N)<Elevation(C):
Volume += (Elevation(C)-Elevation(N))*Area
Add N to PQ, but with Elevation(C)
Else:
Add N to PQ with Elevation(N)
Set Closed(N)=True

This problem is very close to the construction of the morphological watershed of a grayscale image.
One approach is as follows (flooding process):
sort all pixels by increasing elevation.
work incrementally, by increasing elevations, assigning labels to the pixels per catchment basin.
for a new elevation level, you need to label a new set of pixels:
Some have no labeled
neighbor, they form a local minimum configuration and begin a new catchment basin.
Some have only neighbors with the same label, they can be labeled similarly (they extend a catchment basin).
Some have neighbors with different labels. They do not belong to a specific catchment basin and they define the watershed lines.
You will need to enhance the standard watershed algorithm to be able to compute the volume of water. You can do that by determining the maximum water level in each basin and deduce the ground height on every pixel. The water level in a basin is given by the elevation of the lowest watershed pixel around it.
You can act every time you discover a watershed pixel: if a neighboring basin has not been assigned a level yet, that basin can stand the current level without leaking.

In order to accomplish tapping water problem in 3D i.e., to calculate the maximum volume of trapped rain water you can do something like this:
#include<bits/stdc++.h>
using namespace std;
#define MAX 10
int new2d[MAX][MAX];
int dp[MAX][MAX],visited[MAX][MAX];
int dx[] = {1,0,-1,0};
int dy[] = {0,-1,0,1};
int boundedBy(int i,int j,int k,int in11,int in22)
{
if(i<0 || j<0 || i>=in11 || j>=in22)
return 0;
if(new2d[i][j]>k)
return new2d[i][j];
if(visited[i][j]) return INT_MAX;
visited[i][j] = 1;
int r = INT_MAX;
for(int dir = 0 ; dir<4 ; dir++)
{
int nx = i + dx[dir];
int ny = j + dy[dir];
r = min(r,boundedBy(nx,ny,k,in11,in22));
}
return r;
}
void mark(int i,int j,int k,int in1,int in2)
{
if(i<0 || j<0 || i>=in1 || j>=in2)
return;
if(new2d[i][j]>=k)
return;
if(visited[i][j]) return ;
visited[i][j] = 1;
for(int dir = 0;dir<4;dir++)
{
int nx = i + dx[dir];
int ny = j + dy[dir];
mark(nx,ny,k,in1,in2);
}
dp[i][j] = max(dp[i][j],k);
}
struct node
{
int i,j,key;
node(int x,int y,int k)
{
i = x;
j = y;
key = k;
}
};
bool compare(node a,node b)
{
return a.key>b.key;
}
vector<node> store;
int getData(int input1, int input2, int input3[])
{
int row=input1;
int col=input2;
int temp=0;
int count=0;
for(int i=0;i<row;i++)
{
for(int j=0;j<col;j++)
{
if(count==(col*row))
break;
new2d[i][j]=input3[count];
count++;
}
}
store.clear();
for(int i = 0;i<input1;i++)
{
for(int j = 0;j<input2;j++)
{
store.push_back(node(i,j,new2d[i][j]));
}
}
memset(dp,0,sizeof(dp));
sort(store.begin(),store.end(),compare);
for(int i = 0;i<store.size();i++)
{
memset(visited,0,sizeof(visited));
int aux = boundedBy(store[i].i,store[i].j,store[i].key,input1,input2);
if(aux>store[i].key)
{
memset(visited,0,sizeof(visited));
mark(store[i].i,store[i].j,aux,input1,input2);
}
}
long long result =0 ;
for(int i = 0;i<input1;i++)
{
for(int j = 0;j<input2;j++)
{
result = result + max(0,dp[i][j]-new2d[i][j]);
}
}
return result;
}
int main()
{
cin.sync_with_stdio(false);
cout.sync_with_stdio(false);
int n,m;
cin>>n>>m;
int inp3[n*m];
store.clear();
for(int j = 0;j<n*m;j++)
{
cin>>inp3[j];
}
int k = getData(n,m,inp3);
cout<<k;
return 0;
}

class Solution(object):
def trapRainWater(self, heightMap):
"""
:type heightMap: List[List[int]]
:rtype: int
"""
m = len(heightMap)
if m == 0:
return 0
n = len(heightMap[0])
if n == 0:
return 0
visited = [[False for i in range(n)] for j in range(m)]
from Queue import PriorityQueue
q = PriorityQueue()
for i in range(m):
visited[i][0] = True
q.put([heightMap[i][0],i,0])
visited[i][n-1] = True
q.put([heightMap[i][n-1],i,n-1])
for j in range(1, n-1):
visited[0][j] = True
q.put([heightMap[0][j],0,j])
visited[m-1][j] = True
q.put([heightMap[m-1][j],m-1,j])
S = 0
while not q.empty():
cell = q.get()
for (i, j) in [(1,0), (-1,0), (0,1), (0,-1)]:
x = cell[1] + i
y = cell[2] + j
if x in range(m) and y in range(n) and not visited[x][y]:
S += max(0, cell[0] - heightMap[x][y]) # how much water at the cell
q.put([max(heightMap[x][y],cell[0]),x,y])
visited[x][y] = True
return S

Here is the simple code for the same-
#include<iostream>
using namespace std;
int main()
{
int n,count=0,a[100];
cin>>n;
for(int i=0;i<n;i++)
{
cin>>a[i];
}
for(int i=1;i<n-1;i++)
{
///computing left most largest and Right most largest element of array;
int leftmax=0;
int rightmax=0;
///left most largest
for(int j=i-1;j>=1;j--)
{
if(a[j]>leftmax)
{
leftmax=a[j];
}
}
///rightmost largest
for(int k=i+1;k<=n-1;k++)
{
if(a[k]>rightmax)
{
rightmax=a[k];
}
}
///computing hight of the water contained-
int x=(min(rightmax,leftmax)-a[i]);
if(x>0)
{
count=count+x;
}
}
cout<<count;
return 0;
}

Markov Decision Process: value iteration, how does it work?

I've been reading a lot about Markov Decision Processes (using value iteration) lately but I simply can't get my head around them. I've found a lot of resources on the Internet / books, but they all use mathematical formulas that are way too complex for my competencies.
Since this is my first year at college, I've found that the explanations and formulas provided on the web use notions / terms that are way too complicated for me and they assume that the reader knows certain things that I've simply never heard of.
I want to use it on a 2D grid (filled with walls(unattainable), coins(desirable) and enemies that move(which must be avoided at all costs)). The whole goal is to collect all the coins without touching the enemies, and I want to create an AI for the main player using a Markov Decision Process (MDP). Here is how it partially looks like (note that the game-related aspect is not so much of a concern here. I just really want to understand MDPs in general):
From what I understand, a rude simplification of MDPs is that they can create a grid which holds in which direction we need to go (kind of a grid of "arrows" pointing where we need to go, starting at a certain position on the grid) to get to certain goals and avoid certain obstacles. Specific to my situation, that would mean that it allows the player to know in which direction to go to collect the coins and avoid the enemies.
Now, using the MDP terms, it would mean that it creates a collection of states(the grid) which holds certain policies(the action to take -> up, down, right, left) for a certain state(a position on the grid). The policies are determined by the "utility" values of each state, which themselves are calculated by evaluating how much getting there would be beneficial in the short and long term.
Is this correct? Or am I completely on the wrong track?
I'd at least like to know what the variables from the following equation represent in my situation:
(taken from the book "Artificial Intelligence - A Modern Approach" from Russell & Norvig)
I know that s would be a list of all the squares from the grid, a would be a specific action (up / down / right / left), but what about the rest?
How would the reward and utility functions be implemented?
It would be really great if someone knew a simple link which shows pseudo-code to implement a basic version with similarities to my situation in a very slow way, because I don't even know where to start here.
Thank you for your precious time.
(Note: feel free to add / remove tags or tell me in the comments if I should give more details about something or anything like that.)

Yes, the mathematical notation can make it seem much more complicated than it is. Really, it is a very simple idea. I have a implemented a value iteration demo applet that you can play with to get a better idea.
Basically, lets say you have a 2D grid with a robot in it. The robot can try to move North, South, East, West (those are the actions a) but, because its left wheel is slippery, when it tries to move North there is only a .9 probability that it will end up at the square North of it while there is a .1 probability that it will end up at the square West of it (similarly for the other 3 actions). These probabilities are captured by the T() function. Namely, T(s,A,s') will look like:
s A s' T //x=0,y=0 is at the top-left of the screen
x,y North x,y+1 .9 //we do move north
x,y North x-1,y .1 //wheels slipped, so we move West
x,y East x+1,y .9
x,y East x,y-1 .1
x,y South x,y+1 .9
x,y South x-1,y .1
x,y West x-1,y .9
x,y West x,y+1 .1
You then set the Reward to be 0 for all states, but 100 for the goal state, that is, the location you want the robot to get to.
What value-iteration does is its starts by giving a Utility of 100 to the goal state and 0 to all the other states. Then on the first iteration this 100 of utility gets distributed back 1-step from the goal, so all states that can get to the goal state in 1 step (all 4 squares right next to it) will get some utility. Namely, they will get a Utility equal to the probability that from that state we can get to the goal stated. We then continue iterating, at each step we move the utility back 1 more step away from the goal.
In the example above, say you start with R(5,5)= 100 and R(.) = 0 for all other states. So the goal is to get to 5,5.
On the first iteration we set
R(5,6) = gamma * (.9 * 100) + gamma * (.1 * 100)
because on 5,6 if you go North there is a .9 probability of ending up at 5,5, while if you go West there is a .1 probability of ending up at 5,5.
Similarly for (5,4), (4,5), (6,5).
All other states remain with U = 0 after the first iteration of value iteration.

Not a complete answer, but a clarifying remark.
The state is not a single cell. The state contains the information what is in each cell for all concerned cells at once. This means one state element contains the information which cells are solid and which are empty; which ones contain monsters; where are coins; where is the player.
Maybe you could use a map from each cell to its content as state. This does ignore the movement of monsters and player, which are probably very important, too.
The details depend on how you want to model your problem (deciding what belongs to the state and in which form).
Then a policy maps each state to an action like left, right, jump, etc.
First you must understand the problem that is expressed by a MDP before thinking about how algorithms like value iteration work.

I would recommend using Q-learning for your implementation.
Maybe you can use this post I wrote as an inspiration. This is a Q-learning demo with Java source code. This demo is a map with 6 fields and the AI learns where it should go from every state to get to the reward.
Q-learning is a technique for letting the AI learn by itself by giving it reward or punishment.
This example shows the Q-learning used for path finding. A robot learns where it should go from any state.
The robot starts at a random place, it keeps memory of the score while it explores the area, whenever it reaches the goal, we repeat with a new random start. After enough repetitions the score values will be stationary (convergence).
In this example the action outcome is deterministic (transition probability is 1) and the action selection is random. The score values are calculated by the Q-learning algorithm Q(s,a).
The image shows the states (A,B,C,D,E,F), possible actions from the states and the reward given.
Result Q*(s,a)
Policy Π*(s)
Qlearning.java
import java.text.DecimalFormat;
import java.util.Random;
/**
* #author Kunuk Nykjaer
*/
public class Qlearning {
final DecimalFormat df = new DecimalFormat("#.##");
// path finding
final double alpha = 0.1;
final double gamma = 0.9;
// states A,B,C,D,E,F
// e.g. from A we can go to B or D
// from C we can only go to C
// C is goal state, reward 100 when B->C or F->C
//
// _______
// |A|B|C|
// |_____|
// |D|E|F|
// |_____|
//
final int stateA = 0;
final int stateB = 1;
final int stateC = 2;
final int stateD = 3;
final int stateE = 4;
final int stateF = 5;
final int statesCount = 6;
final int[] states = new int[]{stateA,stateB,stateC,stateD,stateE,stateF};
// http://en.wikipedia.org/wiki/Q-learning
// http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning.htm
// Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(next state, all actions) - Q(s,a))
int[][] R = new int[statesCount][statesCount]; // reward lookup
double[][] Q = new double[statesCount][statesCount]; // Q learning
int[] actionsFromA = new int[] { stateB, stateD };
int[] actionsFromB = new int[] { stateA, stateC, stateE };
int[] actionsFromC = new int[] { stateC };
int[] actionsFromD = new int[] { stateA, stateE };
int[] actionsFromE = new int[] { stateB, stateD, stateF };
int[] actionsFromF = new int[] { stateC, stateE };
int[][] actions = new int[][] { actionsFromA, actionsFromB, actionsFromC,
actionsFromD, actionsFromE, actionsFromF };
String[] stateNames = new String[] { "A", "B", "C", "D", "E", "F" };
public Qlearning() {
init();
}
public void init() {
R[stateB][stateC] = 100; // from b to c
R[stateF][stateC] = 100; // from f to c
}
public static void main(String[] args) {
long BEGIN = System.currentTimeMillis();
Qlearning obj = new Qlearning();
obj.run();
obj.printResult();
obj.showPolicy();
long END = System.currentTimeMillis();
System.out.println("Time: " + (END - BEGIN) / 1000.0 + " sec.");
}
void run() {
/*
1. Set parameter , and environment reward matrix R
2. Initialize matrix Q as zero matrix
3. For each episode: Select random initial state
Do while not reach goal state o
Select one among all possible actions for the current state o
Using this possible action, consider to go to the next state o
Get maximum Q value of this next state based on all possible actions o
Compute o Set the next state as the current state
*/
// For each episode
Random rand = new Random();
for (int i = 0; i < 1000; i++) { // train episodes
// Select random initial state
int state = rand.nextInt(statesCount);
while (state != stateC) // goal state
{
// Select one among all possible actions for the current state
int[] actionsFromState = actions[state];
// Selection strategy is random in this example
int index = rand.nextInt(actionsFromState.length);
int action = actionsFromState[index];
// Action outcome is set to deterministic in this example
// Transition probability is 1
int nextState = action; // data structure
// Using this possible action, consider to go to the next state
double q = Q(state, action);
double maxQ = maxQ(nextState);
int r = R(state, action);
double value = q + alpha * (r + gamma * maxQ - q);
setQ(state, action, value);
// Set the next state as the current state
state = nextState;
}
}
}
double maxQ(int s) {
int[] actionsFromState = actions[s];
double maxValue = Double.MIN_VALUE;
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[s][nextState];
if (value > maxValue)
maxValue = value;
}
return maxValue;
}
// get policy from state
int policy(int state) {
int[] actionsFromState = actions[state];
double maxValue = Double.MIN_VALUE;
int policyGotoState = state; // default goto self if not found
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[state][nextState];
if (value > maxValue) {
maxValue = value;
policyGotoState = nextState;
}
}
return policyGotoState;
}
double Q(int s, int a) {
return Q[s][a];
}
void setQ(int s, int a, double value) {
Q[s][a] = value;
}
int R(int s, int a) {
return R[s][a];
}
void printResult() {
System.out.println("Print result");
for (int i = 0; i < Q.length; i++) {
System.out.print("out from " + stateNames[i] + ": ");
for (int j = 0; j < Q[i].length; j++) {
System.out.print(df.format(Q[i][j]) + " ");
}
System.out.println();
}
}
// policy is maxQ(states)
void showPolicy() {
System.out.println("\nshowPolicy");
for (int i = 0; i < states.length; i++) {
int from = states[i];
int to = policy(from);
System.out.println("from "+stateNames[from]+" goto "+stateNames[to]);
}
}
}
Print result
out from A: 0 90 0 72,9 0 0
out from B: 81 0 100 0 81 0
out from C: 0 0 0 0 0 0
out from D: 81 0 0 0 81 0
out from E: 0 90 0 72,9 0 90
out from F: 0 0 100 0 81 0
showPolicy
from a goto B
from b goto C
from c goto C
from d goto A
from e goto B
from f goto C
Time: 0.025 sec.

I know this is a fairly old post, but i came across it when looking for MDP related questions, I did want to note (for folks coming in here) a few more comments about when you stated what "s" and "a" were.
I think for a you are absolutely correct it's your list of [up,down,left,right].
However for s it's really the location in the grid and s' is the location you can go to.
What that means is that you pick a state, and then you pick a particular s' and go through all the actions that can take you to that sprime, which you use to figure out those values. (pick a max out of those). Finally you go for the next s' and do the same thing, when you've exhausted all the s' values then you find the max of what you just finished searching on.
Suppose you picked a grid cell in the corner, you'd only have 2 states you could possibly move to (assuming bottom left corner), depending on how you choose to "name" your states, we could in this case assume a state is an x,y coordinate, so your current state s is 1,1 and your s' (or s prime) list is x+1,y and x,y+1 (no diagonal in this example) (The Summation part that goes over all s')
Also you don't have it listed in your equation, but the max is of a or the action that gives you the max, so first you pick the s' that gives you the max and then within that you pick the action (at least this is my understanding of the algorithm).
So if you had
x,y+1 left = 10
x,y+1 right = 5
x+1,y left = 3
x+1,y right 2
You'll pick x,y+1 as your s', but then you'll need to pick an action that is maximized which is in this case left for x,y+1. I'm not sure if there is a subtle difference between just finding the maximum number and finding the state then the maximum number though so maybe someone someday can clarify that for me.
If your movements are deterministic (meaning if you say go forward, you go forward with 100% certainty), then it's pretty easy you have one action, However if they are non deterministic, you have a say 80% certainty then you should consider the other actions which could get you there. This is the context of the slippery wheel that Jose mentioned above.
I don't want to detract what others have said, but just to give some additional information.

Optimized TSP Algorithms

I am interested in ways to improve or come up with algorithms that are able to solve the Travelling salesman problem for about n = 100 to 200 cities.
The wikipedia link I gave lists various optimizations, but it does so at a pretty high level, and I don't know how to go about actually implementing them in code.
There are industrial strength solvers out there, such as Concorde, but those are way too complex for what I want, and the classic solutions that flood the searches for TSP all present randomized algorithms or the classic backtracking or dynamic programming algorithms that only work for about 20 cities.
So, does anyone know how to implement a simple (by simple I mean that an implementation doesn't take more than 100-200 lines of code) TSP solver that works in reasonable time (a few seconds) for at least 100 cities? I am only interested in exact solutions.
You may assume that the input will be randomly generated, so I don't care for inputs that are aimed specifically at breaking a certain algorithm.

200 lines and no libraries is a tough constraint. The advanced solvers use branch and bound with the Held–Karp relaxation, and I'm not sure if even the most basic version of that would fit into 200 normal lines. Nevertheless, here's an outline.
Held Karp
One way to write TSP as an integer program is as follows (Dantzig, Fulkerson, Johnson). For all edges e, constant we denotes the length of edge e, and variable xe is 1 if edge e is on the tour and 0 otherwise. For all subsets S of vertices, ∂(S) denotes the edges connecting a vertex in S with a vertex not in S.
minimize sumedges e we xe
subject to
1. for all vertices v, sumedges e in ∂({v}) xe = 2
2. for all nonempty proper subsets S of vertices, sumedges e in ∂(S) xe ≥ 2
3. for all edges e in E, xe in {0, 1}
Condition 1 ensures that the set of edges is a collection of tours. Condition 2 ensures that there's only one. (Otherwise, let S be the set of vertices visited by one of the tours.) The Held–Karp relaxation is obtained by making this change.
3. for all edges e in E, xe in {0, 1}
3. for all edges e in E, 0 ≤ xe ≤ 1
Held–Karp is a linear program but it has an exponential number of constraints. One way to solve it is to introduce Lagrange multipliers and then do subgradient optimization. That boils down to a loop that computes a minimum spanning tree and then updates some vectors, but the details are sort of involved. Besides "Held–Karp" and "subgradient (descent|optimization)", "1-tree" is another useful search term.
(A slower alternative is to write an LP solver and introduce subtour constraints as they are violated by previous optima. This means writing an LP solver and a min-cut procedure, which is also more code, but it might extend better to more exotic TSP constraints.)
Branch and bound
By "partial solution", I mean an partial assignment of variables to 0 or 1, where an edge assigned 1 is definitely in the tour, and an edge assigned 0 is definitely out. Evaluating Held–Karp with these side constraints gives a lower bound on the optimum tour that respects the decisions already made (an extension).
Branch and bound maintains a set of partial solutions, at least one of which extends to an optimal solution. The pseudocode for one variant, depth-first search with best-first backtracking is as follows.
let h be an empty minheap of partial solutions, ordered by Held–Karp value
let bestsolsofar = null
let cursol be the partial solution with no variables assigned
loop
while cursol is not a complete solution and cursol's H–K value is at least as good as the value of bestsolsofar
choose a branching variable v
let sol0 be cursol union {v -> 0}
let sol1 be cursol union {v -> 1}
evaluate sol0 and sol1
let cursol be the better of the two; put the other in h
end while
if cursol is better than bestsolsofar then
let bestsolsofar = cursol
delete all heap nodes worse than cursol
end if
if h is empty then stop; we've found the optimal solution
pop the minimum element of h and store it in cursol
end loop
The idea of branch and bound is that there's a search tree of partial solutions. The point of solving Held–Karp is that the value of the LP is at most the length OPT of the optimal tour but also conjectured to be at least 3/4 OPT (in practice, usually closer to OPT).
The one detail in the pseudocode I've left out is how to choose the branching variable. The goal is usually to make the "hard" decisions first, so fixing a variable whose value is already near 0 or 1 is probably not wise. One option is to choose the closest to 0.5, but there are many, many others.
EDIT
Java implementation. 198 nonblank, noncomment lines. I forgot that 1-trees don't work with assigning variables to 1, so I branch by finding a vertex whose 1-tree has degree >2 and delete each edge in turn. This program accepts TSPLIB instances in EUC_2D format, e.g., eil51.tsp and eil76.tsp and eil101.tsp and lin105.tsp from http://www2.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/.
// simple exact TSP solver based on branch-and-bound/Held--Karp
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class TSP {
// number of cities
private int n;
// city locations
private double[] x;
private double[] y;
// cost matrix
private double[][] cost;
// matrix of adjusted costs
private double[][] costWithPi;
Node bestNode = new Node();
public static void main(String[] args) throws IOException {
// read the input in TSPLIB format
// assume TYPE: TSP, EDGE_WEIGHT_TYPE: EUC_2D
// no error checking
TSP tsp = new TSP();
tsp.readInput(new InputStreamReader(System.in));
tsp.solve();
}
public void readInput(Reader r) throws IOException {
BufferedReader in = new BufferedReader(r);
Pattern specification = Pattern.compile("\\s*([A-Z_]+)\\s*(:\\s*([0-9]+))?\\s*");
Pattern data = Pattern.compile("\\s*([0-9]+)\\s+([-+.0-9Ee]+)\\s+([-+.0-9Ee]+)\\s*");
String line;
while ((line = in.readLine()) != null) {
Matcher m = specification.matcher(line);
if (!m.matches()) continue;
String keyword = m.group(1);
if (keyword.equals("DIMENSION")) {
n = Integer.parseInt(m.group(3));
cost = new double[n][n];
} else if (keyword.equals("NODE_COORD_SECTION")) {
x = new double[n];
y = new double[n];
for (int k = 0; k < n; k++) {
line = in.readLine();
m = data.matcher(line);
m.matches();
int i = Integer.parseInt(m.group(1)) - 1;
x[i] = Double.parseDouble(m.group(2));
y[i] = Double.parseDouble(m.group(3));
}
// TSPLIB distances are rounded to the nearest integer to avoid the sum of square roots problem
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
double dx = x[i] - x[j];
double dy = y[i] - y[j];
cost[i][j] = Math.rint(Math.sqrt(dx * dx + dy * dy));
}
}
}
}
}
public void solve() {
bestNode.lowerBound = Double.MAX_VALUE;
Node currentNode = new Node();
currentNode.excluded = new boolean[n][n];
costWithPi = new double[n][n];
computeHeldKarp(currentNode);
PriorityQueue<Node> pq = new PriorityQueue<Node>(11, new NodeComparator());
do {
do {
boolean isTour = true;
int i = -1;
for (int j = 0; j < n; j++) {
if (currentNode.degree[j] > 2 && (i < 0 || currentNode.degree[j] < currentNode.degree[i])) i = j;
}
if (i < 0) {
if (currentNode.lowerBound < bestNode.lowerBound) {
bestNode = currentNode;
System.err.printf("%.0f", bestNode.lowerBound);
}
break;
}
System.err.printf(".");
PriorityQueue<Node> children = new PriorityQueue<Node>(11, new NodeComparator());
children.add(exclude(currentNode, i, currentNode.parent[i]));
for (int j = 0; j < n; j++) {
if (currentNode.parent[j] == i) children.add(exclude(currentNode, i, j));
}
currentNode = children.poll();
pq.addAll(children);
} while (currentNode.lowerBound < bestNode.lowerBound);
System.err.printf("%n");
currentNode = pq.poll();
} while (currentNode != null && currentNode.lowerBound < bestNode.lowerBound);
// output suitable for gnuplot
// set style data vector
System.out.printf("# %.0f%n", bestNode.lowerBound);
int j = 0;
do {
int i = bestNode.parent[j];
System.out.printf("%f\t%f\t%f\t%f%n", x[j], y[j], x[i] - x[j], y[i] - y[j]);
j = i;
} while (j != 0);
}
private Node exclude(Node node, int i, int j) {
Node child = new Node();
child.excluded = node.excluded.clone();
child.excluded[i] = node.excluded[i].clone();
child.excluded[j] = node.excluded[j].clone();
child.excluded[i][j] = true;
child.excluded[j][i] = true;
computeHeldKarp(child);
return child;
}
private void computeHeldKarp(Node node) {
node.pi = new double[n];
node.lowerBound = Double.MIN_VALUE;
node.degree = new int[n];
node.parent = new int[n];
double lambda = 0.1;
while (lambda > 1e-06) {
double previousLowerBound = node.lowerBound;
computeOneTree(node);
if (!(node.lowerBound < bestNode.lowerBound)) return;
if (!(node.lowerBound < previousLowerBound)) lambda *= 0.9;
int denom = 0;
for (int i = 1; i < n; i++) {
int d = node.degree[i] - 2;
denom += d * d;
}
if (denom == 0) return;
double t = lambda * node.lowerBound / denom;
for (int i = 1; i < n; i++) node.pi[i] += t * (node.degree[i] - 2);
}
}
private void computeOneTree(Node node) {
// compute adjusted costs
node.lowerBound = 0.0;
Arrays.fill(node.degree, 0);
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) costWithPi[i][j] = node.excluded[i][j] ? Double.MAX_VALUE : cost[i][j] + node.pi[i] + node.pi[j];
}
int firstNeighbor;
int secondNeighbor;
// find the two cheapest edges from 0
if (costWithPi[0][2] < costWithPi[0][1]) {
firstNeighbor = 2;
secondNeighbor = 1;
} else {
firstNeighbor = 1;
secondNeighbor = 2;
}
for (int j = 3; j < n; j++) {
if (costWithPi[0][j] < costWithPi[0][secondNeighbor]) {
if (costWithPi[0][j] < costWithPi[0][firstNeighbor]) {
secondNeighbor = firstNeighbor;
firstNeighbor = j;
} else {
secondNeighbor = j;
}
}
}
addEdge(node, 0, firstNeighbor);
Arrays.fill(node.parent, firstNeighbor);
node.parent[firstNeighbor] = 0;
// compute the minimum spanning tree on nodes 1..n-1
double[] minCost = costWithPi[firstNeighbor].clone();
for (int k = 2; k < n; k++) {
int i;
for (i = 1; i < n; i++) {
if (node.degree[i] == 0) break;
}
for (int j = i + 1; j < n; j++) {
if (node.degree[j] == 0 && minCost[j] < minCost[i]) i = j;
}
addEdge(node, node.parent[i], i);
for (int j = 1; j < n; j++) {
if (node.degree[j] == 0 && costWithPi[i][j] < minCost[j]) {
minCost[j] = costWithPi[i][j];
node.parent[j] = i;
}
}
}
addEdge(node, 0, secondNeighbor);
node.parent[0] = secondNeighbor;
node.lowerBound = Math.rint(node.lowerBound);
}
private void addEdge(Node node, int i, int j) {
double q = node.lowerBound;
node.lowerBound += costWithPi[i][j];
node.degree[i]++;
node.degree[j]++;
}
}
class Node {
public boolean[][] excluded;
// Held--Karp solution
public double[] pi;
public double lowerBound;
public int[] degree;
public int[] parent;
}
class NodeComparator implements Comparator<Node> {
public int compare(Node a, Node b) {
return Double.compare(a.lowerBound, b.lowerBound);
}
}

If your graph satisfy the triangle inequality and you want a guarantee of 3/2 within the optimum I suggest the christofides algorithm. I've wrote an implementation in php at phpclasses.org.

As of 2013, It is possible to solve for 100 cities using only the exact formulation in Cplex. Add degree equations for each vertex, but include subtour-avoiding constraints only as they appear. Most of them are not necessary. Cplex has an example on this.
You should be able to solve for 100 cities. You will have to iterate every time a new subtour is found. I ran an example here and in a couple of minutes and 100 iterations later I got my results.

I took Held-Karp algorithm from concorde library and 25 cities are solved in 0.15 seconds. This performance is perfectly good for me! You can extract the code (writen in ANSI C) of held-karp from concorde library: http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm. If the download has the extension gz, it should be tgz. You might need to rename it. Then you should make little ajustments to port in in VC++. First take the file heldkarp h and c (rename it cpp) and other about 5 files, make adjustments and it should work calling CCheldkarp_small(...) with edgelen: euclid_ceiling_edgelen.

TSP is an NP-hard problem. (As far as we know) there is no algorithm for NP-hard problems which runs in polynomial time, so you ask for something that doesn't exist.
It's either fast enough to finish in a reasonable time and then it's not exact, or exact but won't finish in your lifetime for 100 cities.

To give a dumb answer: me too. Everyone is interrested in such algorithm, but as others already stated: I does not (yet?) exist. Esp your combination of exact, 200 nodes, few seconds runtime and just 200 lines of code is impossible. You already know that is it NP hard and if you got the slightest impression of asymptotic behaviour you should know that there is no way of achieving this (except you prove that NP=P, and even that I would say thats not possible). Even the exact commercial solvers need for such instances far more than some seconds and as you can imagine they have far more than 200 lines of code (even when you just consider their kernels).
EDIT: The wiki algorithms are the "usual suspects" of the field: Linear Programming and branch-and-bound. Their solutions for the instances with thousands of nodes took Years to solve (they just did it with very very much CPUs parallel, so they can do it faster). Some even use for the branch-and-bound problem specific knowledge for the bounding, so they are no general approaches.
Branch and bound just enumerates all possible paths (e.g. with backtracking) and applies once it has a solution this for to stop a started recursion when it can prove that the result is not better than the already found solution (e.g. if you just visited 2 of your cities and the path is already longer than a found 200 city tour. You can discard all tours that start with that 2 city combination). Here you can invest very much problem specific knowledge in the function that tells you, that the path is not going to be better than the already found solution. The better it is, the less paths you have to look at, the faster is your algorithm.
Linear Programming is an optimization method so solve linear inequality problems. It works in polynomial time (simplex just practically, but that doesnt matter here), but the solution is real. When you have the additional constraint that the solution must be integer, it gets NP-complete. For small instances it is possible, e.g. one method to solve it, then look which variable of the solution violates the integer part and add addition inequalities to change it (this is called cutting-plane, the name cames from the fact that the inequalities define (higher-dimensional) plane, the solution space is a polytop and by adding additional inequalities you cut something with a plane from the polytop). The topic is very complex and even a general simple simplex is hard to understand when you dont want dive deep into the math. There are several good books about, one of the betters is from Chvatal, Linear Programming, but there are several more.

I have a theory, but I've never had the time to pursue it:
The TSP is a bounding problem (single shape where all points lie on the perimeter) where the optimal solution is that solution that has the shortest perimeter.
There are plenty of simple ways to get all the points that lie on a minimum bounding perimeter (imagine a large elastic band stretched around a bunch of nails in a large board.)
My theory is that if you start pushing in on the elastic band so that the length of band increases by the same amount between adjacent points on the perimeter, and each segment remains in the shape of an eliptical arc, the stretched elastic will cross points on the optimal path before crossing points on non-optimal paths. See this page on mathopenref.com on drawing ellipses--particularly steps 5 and 6. Points on the bounding perimeter can be viewed as focal points of the ellipse (F1, F2) in the images below.
What I don't know is if the "bubble stretching" process needs to be reset after each new point is added, or if the existing "bubbles" continue to grow and each new point on the perimeter causes only the localized "bubble" to turn into two line segments. I'll leave that for you to figure out.

ACM Problem: Coin-Flipping, help me identify the type of problem this is

I'm practicing for the upcoming ACM programming competition in a week and I've gotten stumped on this programming problem.
The problem is as follows:
You have a puzzle consisting of a square grid of size 4. Each grid square holds a single coin; each coin is showing either heads (H) and tails (T). One such puzzle is shown here:
H H H H
T T T T
H T H T
T T H T
Any coin that is current showing Tails (T) can be flipped to Heads (H). However, any time we flip a coin, we must also flip the adjacent coins direct above, below and to the left and right in the same row. Thus if we flip the second coin in the second row we must also flip 4 other coins, giving us this arrangment (coins that changed are shown in bold).
H T H H
H H H T
H H H T
T T H T
If a coin is at the edge of the puzzle, so there is no coin on one side or the other, then we flip fewer coins. We do not "wrap around" to the other side. For example, if we flipped the bottom right coin of the arragnement above we would get:
H T H H
H H H T
H H H H
T T T H
Note: Only coins showing (T) tails can be selected for flipping. However, anytime we flip such a coin, adjacent coins are also flipped, regardless of their state.
The goal of the puzzle is to have all coins show heads. While it is possible for some arragnements to not have solutions, all the problems given will have solutions. The answer we are looking for is, for any given 4x4 grid of coins what is the least number of flips in order to make the grid entirely heads.
For Example the grid:
H T H H
T T T H
H T H T
H H T T
The answer to this grid is: 2 flips.
What I have done so far:
I'm storing our grids as two-dimensional array of booleans. Heads = true, tails = false.
I have a flip(int row, int col) method that will flip the adjacent coins according the rules above and I have a isSolved() method that will determine if the puzzle is in a solved state (all heads). So we have our "mechanics" in place.
The part we are having problems with is how should we loop through, going an the least amount of times deep?

Your puzzle is a classic Breadth-First Search candidate. This is because you're looking for a solution with the fewest possible 'moves'.
If you knew the number of moves to the goal, then that would be ideal for a Depth-First Search.
Those Wikipedia articles contain plenty of information about the way the searches work, they even contain code samples in several languages.
Either search can be recursive, if you're sure you won't run out of stack space.

EDIT: I hadn't noticed that you can't use a coin as the primary move unless it's showing tails. That does indeed make order important. I'll leave this answer here, but look into writing another one as well.
No pseudo-code here, but think about this: can you ever imagine yourself flipping a coin twice? What would be the effect?
Alternative, write down some arbitrary board (literally, write it down). Set up some real world coins, and pick two arbitrary ones, X and Y. Do an "X flip", then a "Y flip" then another "X flip". Write down the result. Now reset the board to the starting version, and just do a "Y flip". Compare the results, and think about what's happened. Try it a few times, sometimes with X and Y close together, sometimes not. Become confident in your conclusion.
That line of thought should lead you to a way of determining a finite set of possible solutions. You can test all of them fairly easily.
Hope this hint wasn't too blatant - I'll keep an eye on this question to see if you need more help. It's a nice puzzle.
As for recursion: you could use recursion. Personally, I wouldn't in this case.
EDIT: Actually, on second thoughts I probably would use recursion. It could make life a lot simpler.
Okay, perhaps that wasn't obvious enough. Let's label the coins A-P, like this:
ABCD
EFGH
IJKL
MNOP
Flipping F will always involve the following coins changing state: BEFGJ.
Flipping J will always involve the following coins changing state: FIJKN.
What happens if you flip a coin twice? The two flips cancel each other out, no matter what other flips occur.
In other words, flipping F and then J is the same as flipping J and then F. Flipping F and then J and then F again is the same as just flipping J to start with.
So any solution isn't really a path of "flip A then F then J" - it's "flip <these coins>; don't flip <these coins>". (It's unfortunate that the word "flip" is used for both the primary coin to flip and the secondary coins which change state for a particular move, but never mind - hopefully it's clear what I mean.)
Each coin will either be used as a primary move or not, 0 or 1. There are 16 coins, so 2^16 possibilities. So 0 might represent "don't do anything"; 1 might represent "just A"; 2 might represent "just B"; 3 "A and B" etc.
Test each combination. If (somehow) there's more than one solution, count the number of bits in each solution to find the least number.
Implementation hint: the "current state" can be represented as a 16 bit number as well. Using a particular coin as a primary move will always XOR the current state with a fixed number (for that coin). This makes it really easy to work out the effect of any particular combination of moves.
Okay, here's the solution in C#. It shows how many moves were required for each solution it finds, but it doesn't keep track of which moves those were, or what the least number of moves is. That's a SMOP :)
The input is a list of which coins are showing tails to start with - so for the example in the question, you'd start the program with an argument of "BEFGJLOP". Code:
using System;
public class CoinFlip
{
// All ints could really be ushorts, but ints are easier
// to work with
static readonly int[] MoveTransitions = CalculateMoveTransitions();
static int[] CalculateMoveTransitions()
{
int[] ret = new int[16];
for (int i=0; i < 16; i++)
{
int row = i / 4;
int col = i % 4;
ret[i] = PositionToBit(row, col) +
PositionToBit(row-1, col) +
PositionToBit(row+1, col) +
PositionToBit(row, col-1) +
PositionToBit(row, col+1);
}
return ret;
}
static int PositionToBit(int row, int col)
{
if (row < 0 || row > 3 || col < 0 || col > 3)
{
// Makes edge detection easier
return 0;
}
return 1 << (row * 4 + col);
}
static void Main(string[] args)
{
int initial = 0;
foreach (char c in args[0])
{
initial += 1 << (c-'A');
}
Console.WriteLine("Initial = {0}", initial);
ChangeState(initial, 0, 0);
}
static void ChangeState(int current, int nextCoin, int currentFlips)
{
// Reached the end. Success?
if (nextCoin == 16)
{
if (current == 0)
{
// More work required if we want to display the solution :)
Console.WriteLine("Found solution with {0} flips", currentFlips);
}
}
else
{
// Don't flip this coin
ChangeState(current, nextCoin+1, currentFlips);
// Or do...
ChangeState(current ^ MoveTransitions[nextCoin], nextCoin+1, currentFlips+1);
}
}
}

I would suggest a breadth first search, as someone else already mentioned.
The big secret here is to have multiple copies of the game board. Don't think of "the board."
I suggest creating a data structure that contains a representation of a board, and an ordered list of moves that got to that board from the starting position. A move is the coordinates of the center coin in a set of flips. I'll call an instance of this data structure a "state" below.
My basic algorithm would look something like this:
Create a queue.
Create a state that contains the start position and an empty list of moves.
Put this state into the queue.
Loop forever:
Pull first state off of queue.
For each coin showing tails on the board:
Create a new state by flipping that coin and the appropriate others around it.
Add the coordinates of that coin to the list of moves in the new state.
If the new state shows all heads:
Rejoice, you are done.
Push the new state into the end of the queue.
If you like, you could add a limit to the length of the queue or the length of move lists, to pick a place to give up. You could also keep track of boards that you have already seen in order to detect loops. If the queue empties and you haven't found any solutions, then none exist.
Also, a few of the comments already made seem to ignore the fact that the problem only allows coins that show tails to be in the middle of a move. This means that order very much does matter. If the first move flips a coin from heads to tails, then that coin can be the center of the second move, but it could not have been the center of the first move. Similarly, if the first move flips a coin from tails to heads, then that coin cannot be the center of the second move, even though it could have been the center of the first move.

The grid, read in row-major order, is nothing more than a 16 bit integer. Both the grid given by the problem and the 16 possible moves (or "generators") can be stored as 16 bit integers, thus the problems amounts to find the least possible number of generators which, summed by means of bitwise XOR, gives the grid itself as the result. I wonder if there's a smarter alternative than trying all the 65536 possibilities.
EDIT: Indeed there is a convenient way to do bruteforcing. You can try all the 1-move patterns, then all the 2-moves patterns, and so on. When a n-moves pattern matches the grid, you can stop, exhibit the winning pattern and say that the solution requires at least n moves. Enumeration of all the n-moves patterns is a recursive problem.
EDIT2: You can bruteforce with something along the lines of the following (probably buggy) recursive pseudocode:
// Tries all the n bit patterns with k bits set to 1
tryAllPatterns(unsigned short n, unsigned short k, unsigned short commonAddend=0)
{
if(n == 0)
tryPattern(commonAddend);
else
{
// All the patterns that have the n-th bit set to 1 and k-1 bits
// set to 1 in the remaining
tryAllPatterns(n-1, k-1, (2^(n-1) xor commonAddend) );
// All the patterns that have the n-th bit set to 0 and k bits
// set to 1 in the remaining
tryAllPatterns(n-1, k, commonAddend );
}
}

To elaborate on Federico's suggestion, the problem is about finding a set of the 16 generators that xor'ed together gives the starting position.
But if we consider each generator as a vector of integers modulo 2, this becomes finding a linear combination of vectors, that equal the starting position.
Solving this should just be a matter of gaussian elimination (mod 2).
EDIT:
After thinking a bit more, I think this would work:
Build a binary matrix G of all the generators, and let s be the starting state. We are looking for vectors x satisfying Gx=s (mod 2). After doing gaussian elimination, we either end up with such a vector x or we find that there are no solutions.
The problem is then to find the vector y such that Gy = 0 and x^y has as few bits set as possible, and I think the easiest way to find this would be to try all such y. Since they only depend on G, they can be precomputed.
I admit that a brute-force search would be a lot easier to implement, though. =)

Okay, here's an answer now that I've read the rules properly :)
It's a breadth-first search using a queue of states and the moves taken to get there. It doesn't make any attempt to prevent cycles, but you have to specify a maximum number of iterations to try, so it can't go on forever.
This implementation creates a lot of strings - an immutable linked list of moves would be neater on this front, but I don't have time for that right now.
using System;
using System.Collections.Generic;
public class CoinFlip
{
struct Position
{
readonly string moves;
readonly int state;
public Position(string moves, int state)
{
this.moves = moves;
this.state = state;
}
public string Moves { get { return moves; } }
public int State { get { return state; } }
public IEnumerable<Position> GetNextPositions()
{
for (int move = 0; move < 16; move++)
{
if ((state & (1 << move)) == 0)
{
continue; // Not allowed - it's already heads
}
int newState = state ^ MoveTransitions[move];
yield return new Position(moves + (char)(move+'A'), newState);
}
}
}
// All ints could really be ushorts, but ints are easier
// to work with
static readonly int[] MoveTransitions = CalculateMoveTransitions();
static int[] CalculateMoveTransitions()
{
int[] ret = new int[16];
for (int i=0; i < 16; i++)
{
int row = i / 4;
int col = i % 4;
ret[i] = PositionToBit(row, col) +
PositionToBit(row-1, col) +
PositionToBit(row+1, col) +
PositionToBit(row, col-1) +
PositionToBit(row, col+1);
}
return ret;
}
static int PositionToBit(int row, int col)
{
if (row < 0 || row > 3 || col < 0 || col > 3)
{
return 0;
}
return 1 << (row * 4 + col);
}
static void Main(string[] args)
{
int initial = 0;
foreach (char c in args[0])
{
initial += 1 << (c-'A');
}
int maxDepth = int.Parse(args[1]);
Queue<Position> queue = new Queue<Position>();
queue.Enqueue(new Position("", initial));
while (queue.Count != 0)
{
Position current = queue.Dequeue();
if (current.State == 0)
{
Console.WriteLine("Found solution in {0} moves: {1}",
current.Moves.Length, current.Moves);
return;
}
if (current.Moves.Length == maxDepth)
{
continue;
}
// Shame Queue<T> doesn't have EnqueueRange :(
foreach (Position nextPosition in current.GetNextPositions())
{
queue.Enqueue(nextPosition);
}
}
Console.WriteLine("No solutions");
}
}

If you are practicing for the ACM, I would consider this puzzle also for non-trivial boards, say 1000x1000. Brute force / greedy may still work, but be careful to avoid exponential blow-up.

The is the classic "Lights Out" problem. There is actually an easy O(2^N) brute force solution, where N is either the width or the height, whichever is smaller.
Let's assume the following works on the width, since you can transpose it.
One observation is that you don't need to press the same button twice - it just cancels out.
The key concept is just that you only need to determine if you want to press the button for each item on the first row. Every other button press is uniquely determined by one thing - whether the light above the considered button is on. If you're looking at cell (x,y), and cell (x,y-1) is on, there's only one way to turn it off, by pressing (x,y). Iterate through the rows from top to bottom and if there are no lights left on at the end, you have a solution there. You can then take the min of all the tries.

It's a finite state machine, where each "state" is the 16 bit integer corresponding the the value of each coin.
Each state has 16 outbound transitions, corresponding to the state after you flip each coin.
Once you've mapped out all the states and transitions, you have to find the shortest path in the graph from your beginning state to state 1111 1111 1111 1111,

I sat down and attempted my own solution to this problem (based on the help I received in this thread). I'm using a 2d array of booleans, so it isn't as nice as the people using 16bit integers with bit manipulation.
In any case, here is my solution in Java:
import java.util.*;
class Node
{
public boolean[][] Value;
public Node Parent;
public Node (boolean[][] value, Node parent)
{
this.Value = value;
this.Parent = parent;
}
}
public class CoinFlip
{
public static void main(String[] args)
{
boolean[][] startState = {{true, false, true, true},
{false, false, false, true},
{true, false, true, false},
{true, true, false, false}};
List<boolean[][]> solutionPath = search(startState);
System.out.println("Solution Depth: " + solutionPath.size());
for(int i = 0; i < solutionPath.size(); i++)
{
System.out.println("Transition " + (i+1) + ":");
print2DArray(solutionPath.get(i));
}
}
public static List<boolean[][]> search(boolean[][] startState)
{
Queue<Node> Open = new LinkedList<Node>();
Queue<Node> Closed = new LinkedList<Node>();
Node StartNode = new Node(startState, null);
Open.add(StartNode);
while(!Open.isEmpty())
{
Node nextState = Open.remove();
System.out.println("Considering: ");
print2DArray(nextState.Value);
if (isComplete(nextState.Value))
{
System.out.println("Solution Found!");
return constructPath(nextState);
}
else
{
List<Node> children = generateChildren(nextState);
Closed.add(nextState);
for(Node child : children)
{
if (!Open.contains(child))
Open.add(child);
}
}
}
return new ArrayList<boolean[][]>();
}
public static List<boolean[][]> constructPath(Node node)
{
List<boolean[][]> solutionPath = new ArrayList<boolean[][]>();
while(node.Parent != null)
{
solutionPath.add(node.Value);
node = node.Parent;
}
Collections.reverse(solutionPath);
return solutionPath;
}
public static List<Node> generateChildren(Node parent)
{
System.out.println("Generating Children...");
List<Node> children = new ArrayList<Node>();
boolean[][] coinState = parent.Value;
for(int i = 0; i < coinState.length; i++)
{
for(int j = 0; j < coinState[i].length; j++)
{
if (!coinState[i][j])
{
boolean[][] child = arrayDeepCopy(coinState);
flip(child, i, j);
children.add(new Node(child, parent));
}
}
}
return children;
}
public static boolean[][] arrayDeepCopy(boolean[][] original)
{
boolean[][] r = new boolean[original.length][original[0].length];
for(int i=0; i < original.length; i++)
for (int j=0; j < original[0].length; j++)
r[i][j] = original[i][j];
return r;
}
public static void flip(boolean[][] grid, int i, int j)
{
//System.out.println("Flip("+i+","+j+")");
// if (i,j) is on the grid, and it is tails
if ((i >= 0 && i < grid.length) && (j >= 0 && j <= grid[i].length))
{
// flip (i,j)
grid[i][j] = !grid[i][j];
// flip 1 to the right
if (i+1 >= 0 && i+1 < grid.length) grid[i+1][j] = !grid[i+1][j];
// flip 1 down
if (j+1 >= 0 && j+1 < grid[i].length) grid[i][j+1] = !grid[i][j+1];
// flip 1 to the left
if (i-1 >= 0 && i-1 < grid.length) grid[i-1][j] = !grid[i-1][j];
// flip 1 up
if (j-1 >= 0 && j-1 < grid[i].length) grid[i][j-1] = !grid[i][j-1];
}
}
public static boolean isComplete(boolean[][] coins)
{
boolean complete = true;
for(int i = 0; i < coins.length; i++)
{
for(int j = 0; j < coins[i].length; j++)
{
if (coins[i][j] == false) complete = false;
}
}
return complete;
}
public static void print2DArray(boolean[][] array)
{
for (int row=0; row < array.length; row++)
{
for (int col=0; col < array[row].length; col++)
{
System.out.print((array[row][col] ? "H" : "T") + " ");
}
System.out.println();
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio