Getting Wrong Answer in "Binary Tree Cameras" LeetCode Hard - data-structures

I get a Wrong Answer in LeetCode question 968. Binary Tree Cameras:
You are given the root of a binary tree. We install cameras on the tree nodes where each camera at a node can monitor its parent, itself, and its immediate children.
Return the minimum number of cameras needed to monitor all nodes of the tree.
My approach
I count the number of cameras required at odd or even levels and the minimum of them will be the answer. If any of the even or odd level camera's count is zero then we need at least 1 camera in case there's a node present in the tree.
My Code
class Solution
{
public:
int minCameraCover(TreeNode *root)
{
int ans = 0;
if (root)
{
// cameras at odd or even level
int odd = 0, even = 0;
bool isEvenLevel = false;
queue<TreeNode*> q;
q.push(root);
while (q.size())
{
int sz = q.size();
// adding the count of cameras required at each level
if (isEvenLevel) even += sz;
else odd += sz;
while (sz--)
{
root = q.front(), q.pop();
if (root->left) q.push(root->left);
if (root->right) q.push(root->right);
}
isEvenLevel = !isEvenLevel;
}
// we're adding minimum no. of cameras either it be on odd levels or even levels
ans = min(odd, even);
// for a single root we have to add atleast 1 camera
if (!ans) ans = max(odd, even);
}
return ans;
}
};
Test case for which my solution failed
Input: [0, 0, null, null, 0, 0, null, null, 0, 0]
My Output: 3
Expected Output: 2
Question
Why is my answer not correct? What have I to change without changing my approach? -- I think my approach can work.

Why is my answer not correct?
The tree for which your code gives the wrong answer, looks like this:
    0
   /
  0
   \
    0
   /
  0
   \
    0
   /
  0
Your code will place cameras like so:
    0 camera
   /
  0
   \
    0 camera
   /
  0
   \
    0 camera
   /
  0
But the optimal placement is:
    0
   /
  0 camera
   \
    0
   /
  0
   \
    0 camera
   /
  0
What have I to change without changing my approach?
Your approach has these characteristics:
It assumes that the vertical distance between two cameras on the same path should be 2, but that is not true, as that generally means that the node between two such cameras is monitored by both cameras. In some cases this can be avoided, by making the vertical distance between cameras 3. This is the case in the above test case.
It assumes that the optimal solution with cameras on a particular level, will not have cameras on the levels just above and below that level. This is not generally true. Take for example the following tree:
        0
       / \
      1   2
         / \
        3   4
       /   / \
      5   6   7
     / \
    8   9
Your code will return 5 for this tree, but an optimal solution can do it with just 3 cameras, placed at the nodes 0, 4 and 5. An optimal solution needs to place cameras at both levels 3 and 4.
It assumes that if a camera is positioned on a certain level, that all nodes on that level need a camera. Again, this is not true, as can be seen in the example above.
In conclusion: your algorithm cannot be tuned to work in all cases. The approach is based on assumptions that are not true. If this were the solution, the problem would not have been marked "hard". You'll have to go back to the drawing board and think of an entirely different approach.
Hint 1:
Apply a bottom up approach
Hint 2:
When going upwards through the tree, delay the placing of cameras as much as possible
Hint 3:
A node can be in three states: it has a camera, it is monitored by a camera, it is not monitored by a camera
Hint 4:
If we know the state of the child/children of a node, we can decide what the state should be of the current node (potentially placing a camera)
Spoiler: Solution in C++
#define NOT_MONITORED 0
#define CAMERA 1
#define MONITORED 2
class Solution {
public:
    int dfs(TreeNode* node, int &camCount) {
        if (node == nullptr) {
            return MONITORED;
        }
        int stateLeft = dfs(node->left, camCount);
        int stateRight = dfs(node->right, camCount);
        // state depends on states of children
        int state = (min(stateLeft, stateRight) + 1) % 3;
        if (state == CAMERA) {
            camCount++;
        }
        return state;
    }
    int minCameraCover(TreeNode* root) {
        int camCount = 0;
        int state = dfs(root, camCount);
        return camCount + (int)(state == NOT_MONITORED);
    }
};
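The modulo expression in the spoiler packs three rules into one line. As a hedged sketch (not the answerer's code), the same bottom-up greedy can be spelled out with explicit conditions. `Node` is a stand-in for LeetCode's `TreeNode`, and `visit` and `minCameras` are names made up for this illustration.

```cpp
enum State { NOT_MONITORED, CAMERA, MONITORED };

// Stand-in for LeetCode's TreeNode (assumption: only the child links matter here).
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the state of `node` and increments `cameras` for each camera placed in its subtree.
State visit(const Node* node, int& cameras) {
    if (node == nullptr) return MONITORED;   // a missing child never needs a camera
    State l = visit(node->left, cameras);
    State r = visit(node->right, cameras);
    if (l == NOT_MONITORED || r == NOT_MONITORED) {
        ++cameras;                            // a child is uncovered: this node must hold a camera
        return CAMERA;
    }
    if (l == CAMERA || r == CAMERA) return MONITORED;  // covered from below
    return NOT_MONITORED;                     // delay: leave the decision to the parent
}

int minCameras(const Node* root) {
    int cameras = 0;
    // If the root ends up uncovered there is no parent left to help, so it gets a camera.
    if (visit(root, cameras) == NOT_MONITORED) ++cameras;
    return cameras;
}
```

On the six-node zigzag tree from the failing test case this returns 2, and on the ten-node example tree it returns 3.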

Related

A algorithm problem of complete two binary tree [closed]

A complete binary tree with a maximum depth of 16 is given, with all leaf nodes at the same depth. A small ball is placed at the root node and starts to fall down through the tree. There is a switch on each node of the tree, and by default all switches are off. Whenever a ball reaches a node, the state of that node's switch changes. When the ball reaches a node, if the switch on the node is closed, the ball goes to the left, otherwise it goes to the right, until it reaches a leaf node. Please help me find the number of the leaf node that the 12345th ball ends up in.
You can simulate the given problem and notice that the leaf node at which the ball ends repeats after a while. For example, for a binary tree of depth 3, the leaf nodes at which successive balls end are 1 3 2 4 1 3 2 4 1 3 2 4 . . . (assuming the leaf nodes are numbered starting from 1). As you can see, a sequence of length 2^(3-1) = 4 keeps repeating itself. We can store this sequence in an array and answer the query for the nth ball by looking up the entry at (0-based) index (n-1) mod 2^(depth-1) in this array.
Since our depth is up to 16, the total number of operations required to generate the recurring sequence is 2^(16-1) * 16 = 524288 operations.
Sharing the code for the same https://ideone.com/uuNV2g
#include <iostream>
#include <map>
#include <vector>
using namespace std;
map<int, bool> states; // default value is False
int MAX_DEPTH = 16;
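int dfs(int cur, int depth = 0) {
// a tree with MAX_DEPTH levels has its leaves at depth MAX_DEPTH-1 (the root is depth 0);
// this matches the depth-3 example (4 leaves) and the period 2^(MAX_DEPTH-1) used in main
if(depth == MAX_DEPTH - 1) {
return cur - (1<<(MAX_DEPTH-1)) + 1;
}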
if(states[cur] == 0) {
states[cur] = !states[cur];
return dfs(2*cur, depth+1);
}
else {
states[cur] = !states[cur];
return dfs(2*cur+1, depth+1);
}
}
int main() {
int until = (1LL<<(MAX_DEPTH-1));
vector<int> pos; // 0 indexed
for(int i = 1; i <= until; i++) {
// cout << dfs(1) << ' ';
pos.push_back(dfs(1));
}
cout << pos[(12344%until)];
// 12344 instead of 12345 since the sequence is 0 indexed
}
Hope it works out.
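As an alternative to simulating every ball, a single ball can be followed directly: if a ball is the k-th one to arrive at a node, it goes left when k is odd (as the ((k+1)/2)-th arrival at the left child) and right when k is even (as the (k/2)-th arrival at the right child). The sketch below is a swapped-in technique, not the answerer's code. It assumes "depth d" means d levels, as in the depth-3 example, and that the first ball to reach a node goes left, as in the simulation; `leafForBall` is a made-up name.

```cpp
// Leaf (numbered from 1, left to right) reached by the `ball`-th ball (1-based)
// in a complete binary tree with `levels` levels.
long long leafForBall(int levels, long long ball) {
    long long node = 1;                        // heap-style numbering, root = 1
    for (int step = 1; step < levels; ++step) {
        if (ball % 2 == 1) {                   // odd arrival: switch was off, go left
            node = 2 * node;
            ball = (ball + 1) / 2;
        } else {                               // even arrival: switch was on, go right
            node = 2 * node + 1;
            ball = ball / 2;
        }
    }
    return node - (1LL << (levels - 1)) + 1;   // convert node number to leaf number
}
```

Under these assumptions `leafForBall(3, n)` for n = 1..4 gives 1, 3, 2, 4, and `leafForBall(16, 12345)` gives 3591, using only 15 steps instead of a full table.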

How to get minimum number of moves to solve `game of fifteen`?

I was reading about this and thought to form an algorithm to find the minimum number of moves to solve this.
Constraints I made: An N X N matrix with one empty slot, say 0, is filled with the numbers 0 to N*N-1.
Now we have to rearrange this matrix so that the numbers are in increasing order from left to right, beginning from the top row, with 0 as the last, i.e. (N X N)th, element.
For example,
Input :
8 4 0
7 2 5
1 3 6
Output:
1 2 3
4 5 6
7 8 0
Now the problem is how to do this in minimum number of steps possible.
As in the game (link provided), you can move left, right, up or down, shifting the 0 (empty slot) to the corresponding position, to make the final matrix.
The output to be printed for this algorithm is the number of steps, say M, followed by the direction of each tile move: 1 for swapping with the upper adjacent element, 2 for the lower adjacent element, 3 for the left adjacent element and 4 for the right adjacent element.
Like, for
2 <--- order of N X N matrix
3 1
0 2
Answer should be: 3 4 1 2 where 3 is M and 4 1 2 are steps to tile movement.
So I want to minimise the complexity of this algorithm and find the minimum number of moves. Please suggest the most efficient approach to solve this problem.
Edit:
What I coded in c++, Please see the algorithm rather than pointing out other issues in code .
#include <bits/stdc++.h>
using namespace std;
int inDex=0,shift[100000],N,initial[500][500],final[500][500];
struct Node
{
Node* parent;
int mat[500][500];
int x, y;
int cost;
int level;
};
Node* newNode(int mat[500][500], int x, int y, int newX,
int newY, int level, Node* parent)
{
Node* node = new Node;
node->parent = parent;
memcpy(node->mat, mat, sizeof node->mat);
swap(node->mat[x][y], node->mat[newX][newY]);
node->cost = INT_MAX;
node->level = level;
node->x = newX;
node->y = newY;
return node;
}
int row[] = { 1, 0, -1, 0 };
int col[] = { 0, -1, 0, 1 };
int calculateCost(int initial[500][500], int final[500][500])
{
int count = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
if (initial[i][j] && initial[i][j] != final[i][j])
count++;
return count;
}
int isSafe(int x, int y)
{
return (x >= 0 && x < N && y >= 0 && y < N);
}
struct comp
{
bool operator()(const Node* lhs, const Node* rhs) const
{
return (lhs->cost + lhs->level) > (rhs->cost + rhs->level);
}
};
void solve(int initial[500][500], int x, int y,
int final[500][500])
{
priority_queue<Node*, std::vector<Node*>, comp> pq;
Node* root = newNode(initial, x, y, x, y, 0, NULL);
Node* prev = newNode(initial,x,y,x,y,0,NULL);
root->cost = calculateCost(initial, final);
pq.push(root);
while (!pq.empty())
{
Node* min = pq.top();
if(min->x > prev->x)
{
shift[inDex] = 4;
inDex++;
}
else if(min->x < prev->x)
{
shift[inDex] = 3;
inDex++;
}
else if(min->y > prev->y)
{
shift[inDex] = 2;
inDex++;
}
else if(min->y < prev->y)
{
shift[inDex] = 1;
inDex++;
}
prev = pq.top();
pq.pop();
if (min->cost == 0)
{
cout << min->level << endl;
return;
}
for (int i = 0; i < 4; i++)
{
if (isSafe(min->x + row[i], min->y + col[i]))
{
Node* child = newNode(min->mat, min->x,
min->y, min->x + row[i],
min->y + col[i],
min->level + 1, min);
child->cost = calculateCost(child->mat, final);
pq.push(child);
}
}
}
}
int main()
{
cin >> N;
int i,j,k=1;
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
cin >> initial[j][i];
}
}
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
final[j][i] = k;
k++;
}
}
final[N-1][N-1] = 0;
int x = 0, y = 1,a[100][100];
solve(initial, x, y, final);
for(i=0;i<inDex;i++)
{
cout << shift[i] << endl;
}
return 0;
}
In the above code I am checking, for each child node, which one has the minimum cost (how many numbers are misplaced compared to the final matrix).
I want to make this algorithm more efficient and reduce its time complexity. Any suggestions would be appreciated.
While this sounds a lot like a homework problem, I'll lend a bit of help.
For sufficiently small problems, like your 2x2 or 3x3, you can just brute force it. Basically, you try every possible sequence of moves, track how many turns each took, and then print out the smallest.
To improve on this, maintain a list of states you have already seen, and any time a move leads to a state that is already in the list, stop pursuing that branch, since it can't possibly be the shortest.
Example, say I'm in this state (flattening your matrix to a string for ease of display):
5736291084
6753291084
5736291084
Notice that we're back to a state we've seen before. That means it can't possibly be part of the shortest solution, because the shortest solution never returns to a previous state.
You'll want to create a tree doing this, so you'd have something like:
                 134
                 529
                 870
                /   \
               /     \
              /       \
             /         \
          134           134
          529           520
          807           879
         / | \         / | \
        /  |  X       /  X  \
     134  134  134  134  134  130
     509  529  529  502  529  524
     827  087  870  879  870  879
You'd just keep repeating this until you've tried all possible solutions (i.e., all non-stopped leaves reach a solution), then you just see which was the shortest. You could also do it in parallel so you stop once any one has found a solution, saving you time.
This brute force approach won't be effective against large matrices. To solve those, you're looking at some serious software engineering. One approach you could take with it would be to break it into smaller matrices and solve that way, but that may not be the best path.
This is a tricky problem to solve at larger sizes: finding a shortest solution for the generalized N X N puzzle is known to be NP-hard.
Start from solution, determine ranks of permutation
The reverse of above would be how you can pre-generate a list of all possible values.
Start with the solution. That has a rank of permutation of 0 (as in, zero moves):
          012
     0    345
          678
         /   \
        /     \
       /       \
     102       312
  1  345       045
     678       678
Repeat that as above. Each new level all has the same rank of permutation. Generate all possible moves (in this case, until all of your branches are killed off as duplicates).
You can then store all of them into an object. Flattening the matrix would make this easy (using JavaScript syntax just for example):
{
'012345678': 0,
'102345678': 1,
'312045678': 1,
'142305678': 2,
// and so on
}
Then, to solve your question "minimum number of moves", just find the entry that is the same as your starting point. The rank of permutation is the answer.
This would be a good solution if you are in a scenario where you can pre-generate the entire solution. It would take time to generate, but lookups would be lightning fast (this is similar to "rainbow tables" for cracking hashes).
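A minimal sketch of that pre-generation for the 3x3 case, assuming states are flattened to 9-character strings with '0' as the blank, exactly as in the object above. `buildTable` is a made-up name, and the breadth-first expansion from the solved state is one straightforward way to fill the table, not necessarily the answerer's.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>

// Maps every state reachable from `goal` to its minimum number of moves from `goal`.
std::unordered_map<std::string, int> buildTable(const std::string& goal) {
    std::unordered_map<std::string, int> dist;
    std::queue<std::string> pending;
    dist[goal] = 0;
    pending.push(goal);
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    while (!pending.empty()) {
        const std::string state = pending.front();
        pending.pop();
        const int moves = dist[state];
        const int blank = static_cast<int>(state.find('0'));
        const int r = blank / 3, c = blank % 3;
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= 3 || nc < 0 || nc >= 3) continue;
            std::string next = state;
            std::swap(next[blank], next[nr * 3 + nc]);   // slide the neighbouring tile into the blank
            // emplace only succeeds for unseen states, which kills duplicate branches
            if (dist.emplace(next, moves + 1).second) pending.push(next);
        }
    }
    return dist;
}
```

With `auto table = buildTable("012345678");` a lookup such as `table.at("142305678")` returns 2. The table holds 181440 entries, half of the 9! arrangements, because the other half cannot be reached from the solved state.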
If you must solve on the fly (without pre-generation), then the first approach would be better: search move by move until you connect your starting position and the solution.
As for complexity: a board with n cells has n! arrangements, and only half of them (n!/2) can be reached from the solved position. Because duplicates are chopped off the tree as you go, the search visits each reachable arrangement at most once, so the work is bounded by O(n!) states rather than by the much larger number of move sequences.
You can use BFS.
Each state is one vertex, and there is an edge between two vertices if one state can be turned into the other with a single move.
For example
8 4 0
7 2 5
1 3 6
and
8 0 4
7 2 5
1 3 6
are connected.
Usually, you may want to use some numbers to represent your current state. For small grid, you can just follow the sequence of the number. For example,
8 4 0
7 2 5
1 3 6
is just 840725136.
If the grid is large, you may consider using the rank of the permutation of the numbers as your representation of the state. For example,
0 1 2
3 4 5
6 7 8
should be 0, as it is the first permutation in lexicographic order.
And
0 1 2
3 4 5
6 7 8
(which is represented by 0)
and
1 0 2
3 4 5
6 7 8
(which is represented by some other number X)
Saying that these two states are connected is the same as saying that 0 and X are connected in the graph.
The complexity of the algorithm should be O(n!), where n is the number of cells, as there are at most n! vertices/permutations.
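The rank mentioned above can be computed without enumerating permutations. The following is a sketch that assumes "rank" means the 0-based position in lexicographic order and that the grid is flattened row by row; `permutationRank` is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// 0-based lexicographic rank of a permutation of 0..n-1.
// The result fits in a long long for n up to 20.
long long permutationRank(const std::vector<int>& perm) {
    const std::size_t n = perm.size();
    long long rank = 0;
    for (std::size_t i = 0; i < n; ++i) {
        long long smaller = 0;                 // later entries smaller than perm[i]
        for (std::size_t j = i + 1; j < n; ++j) {
            if (perm[j] < perm[i]) ++smaller;
        }
        // Horner form of sum(smaller_i * (n-1-i)!)
        rank = rank * static_cast<long long>(n - i) + smaller;
    }
    return rank;
}
```

For the two grids above this gives 0 for 0 1 2 3 4 5 6 7 8 and X = 40320 (that is, 1 * 8!) for 1 0 2 3 4 5 6 7 8. The quadratic inner loop is fine for a 3x3 or 4x4 board.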

Covering segments by points

I did search and looked at these below links but it didn't help .
Point covering problem
Segments poked (covered) with points - any tricky test cases?
Need effective greedy for covering a line segment
Problem Description:
You are given a set of segments on a line and your goal is to mark as
few points on a line as possible so that each segment contains at least
one marked point
Task.
Given a set of n segments {[a_0,b_0],[a_1,b_1],...,[a_(n-1),b_(n-1)]} with integer
coordinates on a line, find the minimum number 'm' of points such that
each segment contains at least one point. That is, find a set of
integers X of the minimum size such that for any segment [a_i,b_i] there
is a point x in X such that a_i <= x <= b_i
Output Description:
Output the minimum number m of points on the first line and the integer
coordinates of m points (separated by spaces) on the second line
Sample Input - I
3
1 3
2 5
3 6
Output - I
1
3
Sample Input - II
4
4 7
1 3
2 5
5 6
Output - II
2
3 6
I didn't understand the question itself. I need the explanation, on how to solve this above problem, but i don't want the code. Examples would be greatly helpful
Maybe this formulation of the problem will be easier to understand. You have n people who can each tolerate a different range of temperatures [ai, bi]. You want to find the minimum number of rooms to make them all happy, i.e. you can set each room to a certain temperature so that each person can find a room within his/her temperature range.
As for how to solve the problem, you said you didn't want code, so I'll just roughly describe an approach. Think about the coldest room you have. If making it one degree warmer won't cause anyone to no longer be able to tolerate that room, you might as well make the increase, since that can only allow more people to use that room. So the first temperature you should set is the warmest one that the most cold-loving person can still tolerate. In other words, it should be the smallest of the bi. Now this room will satisfy some subset of your people, so you can remove them from consideration. Then repeat the process on the remaining people.
Now, to implement this efficiently, you might not want to literally do what I said above. I suggest sorting the people according to bi first, and for the ith person, try to use an existing room to satisfy them. If you can't, try to create a new one with the highest temperature possible to satisfy them, which is bi.
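For readers who do want to see it spelled out, the sort-by-b_i procedure can be sketched as follows. This is a minimal illustration, with `coverPoints` as a made-up name; after sorting by right endpoint, a segment is already served exactly when its left endpoint is not past the most recently placed point.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Segment = std::pair<long long, long long>;   // {a_i, b_i}

// Returns a minimum-size set of points such that every segment contains one of them.
std::vector<long long> coverPoints(std::vector<Segment> segments) {
    std::sort(segments.begin(), segments.end(),
              [](const Segment& x, const Segment& y) { return x.second < y.second; });
    std::vector<long long> points;
    for (const Segment& s : segments) {
        // The last point is the largest so far and is <= s.second because of the sort,
        // so it lies inside s exactly when s.first <= points.back().
        if (points.empty() || s.first > points.back()) {
            points.push_back(s.second);   // "warmest temperature this person tolerates"
        }
    }
    return points;
}
```

On the two samples from the question this yields the point 3 for the first input and the points 3 and 6 for the second.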
Yes the description is pretty vague and the only meaning that makes sense to me is this:
1. You got some line
2. Segment on a line is defined by l,r
Where one parameter is the distance from the start of the line and the second is the segment's length. Which one is which is hard to tell, as the letters are not very usual for such a description. My bet is:
l length of segment
r distance of (start?) of segment from start of line
3. You want to find min set of points
So that each segment has at least one point in it. That means for 2 overlapped segments you need just one point ...
Surely there are more options for how to solve this; the obvious one is generate & test with some heuristics, like generating combinations only for segments that are overlapped more than once. So I would attack this task in this manner (using assumed terminology from #2):
sort segments by r
add number of overlaps to your segment set data
so the segment will be { r,l,n } and set the n=0 for all segments for now.
scan segments for overlaps
something like
for (i=0;i<segments;i++) // loop all segments
for (j=i+1;j<segments;j++) // loop all latter segments until they are still overlapped
if ( segment[i] and segment [j] are overlapped )
{
segment[i].n++; // update overlap counters
segment[j].n++;
}
else break;
Now if the r-sorted segments are overlapped then
segment[i].r <=segment[j].r
segment[i].r+segment[i].l>=segment[j].r
scan segments handling non overlapped segments
for each segment such that segment[i].n==0 add to the solution point list its point (middle) defined by distance from start of line.
points.add(segment[i].r+0.5*segment[i].l);
And after that remove segment from the list (or tag it as used or what ever you do for speed boost...).
scan segments that are overlapped just once
So if segment[i].n==1 then you need to determine if it is overlapped with i-1 or i+1. So add the mid point of the overlap to the solution points and remove segment i from the list. Then decrement the n of the overlapped segment (i+1 or i-1) and if it is zero remove it too.
points.add(0.5*( segment[j].r + min(segment[i].r+segment[i].l , segment[j].r+segment[j].l )));
Loop this whole scanning until there is no new point added to the solution.
now you got only multiple overlaps left
From this point I will be a bit vague for 2 reasons:
I do not have this tested and I do not have any test data to validate it, not to mention I am lazy.
This smells like an assignment, so there is some work/fun left for you.
From the start I would scan all segments and remove all of those which have any point from the solution inside. You should perform this step after any change in the solution.
Now you can experiment with generating combination of points for each overlapped group of segments and remember the minimal number of points covering all segments in group. (simply by brute force).
There are more heuristics possible like handling all twice overlapped segments (in similar manner as the single overlaps) but in the end you will have to do brute force on the rest of data ...
[edit1] as you added new info
The l,r mean the distances of the left and right ends from the start of the line. So if you want to convert between the other formulation { r',l' } and this one (l<=r) then
l=r'
r=r'+l'
and back
r'=l
l'=r-l
Sorry too lazy to rewrite the whole thing ...
Here is the working solution in C, please refer to it partially and try to fix your code before reading the whole. Happy coding :) Spoiler alert
#include <stdio.h>
#include <stdlib.h>
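int cmp_func(const void *ptr_a, const void *ptr_b)
{
    /* each array element is a long* pointing at a {start, end} pair */
    const long *a = *(const long * const *)ptr_a;
    const long *b = *(const long * const *)ptr_b;
    /* sort by end point, then by start point; compare instead of subtracting
       so the result cannot overflow or be truncated to int */
    if (a[1] != b[1])
        return (a[1] > b[1]) - (a[1] < b[1]);
    return (a[0] > b[0]) - (a[0] < b[0]);
}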
int main()
{
int i, j, n, num_val;
long **arr;
scanf("%d", &n);
long values[n];
arr = malloc(n * sizeof(long *));
for (i = 0; i < n; ++i) {
*(arr + i) = malloc(2 * sizeof(long));
scanf("%ld %ld", &arr[i][0], &arr[i][1]);
}
qsort(arr, n, sizeof(long *), cmp_func);
i = j = 0;
num_val = 0;
while (i < n) {
int skip = 0;
values[num_val] = arr[i][1];
for (j = i + 1; j < n; ++j) {
int condition;
condition = arr[i][1] <= arr[j][1] ? arr[j][0] <= arr[i][1] : 0;
if (condition) {
skip++;
} else {
break;
}
}
num_val++;
i += skip + 1;
}
printf("%d\n", num_val);
for (int k = 0; k < num_val; ++k) {
printf("%ld ", values[k]);
}
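for (i = 0; i < n; ++i)
free(arr[i]); /* release each {start, end} pair before the pointer array */
free(arr);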
return 0;
}
Here's the working code in C++ for anyone searching :)
#include <bits/stdc++.h>
#define ll long long
#define double long double
#define vi vector<int>
#define endl "\n"
#define ff first
#define ss second
#define pb push_back
#define all(x) (x).begin(),(x).end()
#define mp make_pair
using namespace std;
bool cmp(const pair<ll,ll> &a, const pair<ll,ll> &b)
{
return (a.second < b.second);
}
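vector<ll> MinSig(vector<pair<ll,ll>>&vec)
{
    vector<ll> points;
    // loop while x < vec.size() (not size()-1): otherwise a single segment, or a last
    // segment that is not covered by the previous point, never gets its own point
    for(size_t x=0;x<vec.size();)
    {
        bool found=false;
        points.pb(vec[x].ss);
        // find the first segment that starts after the point just placed
        for(size_t y=x+1;y<vec.size();y++)
        {
            if(vec[y].ff>vec[x].ss)
            {
                x=y;
                found=true;
                break;
            }
        }
        if(!found)
            break;
    }
    return points;
}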
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin>>n;
vector<pair<ll,ll>>v;
for(int x=0;x<n;x++)
{
ll temp1,temp2;
cin>>temp1>>temp2;
v.pb(mp(temp1,temp2));
}
sort(v.begin(),v.end(),cmp);
vector<ll>res=MinSig(v);
cout<<res.size()<<endl;
for(auto it:res)
cout<<it<<" ";
}

Improve the solution to monkey grid puzzle

I was trying to solve the following problem:
There is a monkey which can walk around on a planar grid. The monkey
can move one space at a time left, right, up or down. That is, from
(x, y) the monkey can go to (x+1, y), (x-1, y), (x, y+1), and (x,
y-1). Points where the sum of the digits of the absolute value of the
x coordinate plus the sum of the digits of the absolute value of the y
coordinate are lesser than or equal to 19 are accessible to the
monkey. For example, the point (59, 79) is inaccessible because 5 + 9
+ 7 + 9 = 30, which is greater than 19. Another example: the point (-5, -7) is accessible because abs(-5) + abs(-7) = 5 + 7 = 12, which
is less than 19. How many points can the monkey access if it starts at
(0, 0), including (0, 0) itself?
I came up with the following brute force solution (pseudo code):
/*
legitPoints = {}; // all the allowed points that the monkey can go to
list.push( Point(0,0) ); // start exploring from the origin
while(!list.empty()){
    Point p = list.pop_front(); // remove point
    // if p has been seen before; ignore p => continue;
    // else mark it and proceed further
    if(legit(p)){
        // since we are only exploring points in one quadrant,
        // we don't need to check for -x direction and -y direction
        // hence explore the following: this is like Breadth First Search
        list.push(Point(p.x+1, p.y)); // explore x+1, y
        list.push(Point(p.x, p.y+1)); // explore x, y+1
        legitPoints.insert(p); // during insertion, ignore duplicates
        // (although no duplicates should come through after above check)
        // count properly using multipliers
        // Origin => count once x == 0 && y == 0 => mul : 1
        // Y axis => count twice x == 0 && y != 0 => mul : 2
        // X axis => count twice x != 0 && y == 0 => mul : 2
        // All others => mul : 4
    }
}
return legitPoints.count();
*/
This is a very brute force solution. One of the optimizations I used was to scan only one quadrant instead of looking at all four. Another one was to ignore the points that we've already seen before.
However, looking at the final points, I was trying to find a pattern, perhaps a mathematical solution or a different approach that would be better than what I came up with.
Any thoughts ?
PS: If you want, I can post the data somewhere. It is interesting to look at it with any one of the axis sorted.
First quadrant visual:
Here's what the whole grid looks like as an image:
The black squares are inaccessible, white accessible, gray accessible and reachable by movement from the center. There's a 600x600 bounding box of black because the digits of 299 add to 20, so we only have to consider that.
This exercise is basically a "flood fill", with a shape which is just about the worst case possible for a flood fill. You can do the symmetry speedup if you like, though that's not really where the meat of the issue is--my solution runs in 160 ms without it (under 50ms with it).
The big speed wins are (1) do a line-filling flood so you don't have to put every point on the stack, and (2) manage your own stack instead of doing recursion. I built my stack as two dynamically-allocated vectors of ints (for x and y), and they grow to about 16k, so building whole stack frames that deep would definitely be a huge loss.
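To illustrate point (2), here is a hedged sketch of a flood fill that manages its own stack and marks visited cells in a flat array. It is a plain four-neighbour fill, not the line-filling variant, and it is not the answerer's code. `countReachable` and `digitSum` are made-up names, and the fixed bound of 299 relies on the observation above that the digits of 299 add to 20, so it is only valid for limits up to 19.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

int digitSum(int v) {
    v = std::abs(v);
    int sum = 0;
    while (v > 0) { sum += v % 10; v /= 10; }
    return sum;
}

// Counts the points reachable from (0,0), including (0,0) itself.
// Assumes 0 <= limit <= 19, so the wall at |coordinate| = 299 encloses the search.
int countReachable(int limit) {
    const int R = 299;
    const int W = 2 * R + 1;
    std::vector<char> seen(W * W, 0);            // flat visited grid, offset by R
    std::vector<std::pair<int, int>> stack;      // explicit stack instead of recursion
    seen[R * W + R] = 1;
    stack.push_back({0, 0});
    const int dx[] = {1, -1, 0, 0};
    const int dy[] = {0, 0, 1, -1};
    int count = 0;
    while (!stack.empty()) {
        const std::pair<int, int> p = stack.back();
        stack.pop_back();
        ++count;
        for (int k = 0; k < 4; ++k) {
            const int x = p.first + dx[k];
            const int y = p.second + dy[k];
            if (std::abs(x) > R || std::abs(y) > R) continue;    // safety guard
            if (digitSum(x) + digitSum(y) > limit) continue;     // inaccessible point
            char& mark = seen[(y + R) * W + (x + R)];
            if (mark) continue;
            mark = 1;                            // mark when pushing, so each point is pushed once
            stack.push_back({x, y});
        }
    }
    return count;
}
```

The puzzle as stated corresponds to `countReachable(19)`. Small limits are easy to check by hand: a limit of 2 gives the 13 points with |x| + |y| <= 2.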
Without looking for the ideal solution I had something similar. For each point the monkey is, I added the next 4 possibilities to a list and did the same for the next four recursively only if they had not been visited. This can be also done with multiprocessing to speed up the process.
Here is my solution, more like a BFS:
#include <map>
#include <queue>
#include <vector>
using namespace std;
int DigitSum(int num)
{
int sum = 0;
num = (num >= 0) ? num : -num;
while(num) {
sum += num % 10;
num /= 10;
}
return sum;
}
struct Point {
int x,y;
Point(): x(0), y(0) {}
Point(int x1, int y1): x(x1), y(y1) {}
friend bool operator<(const Point& p1, const Point& p2)
{
if (p1.x < p2.x) {
return true;
} else if (p1.x == p2.x) {
return (p1.y < p2.y);
} else {
return false;
}
}
};
void neighbor(vector<Point>& n, const Point& p)
{
if (n.size() < 4) n.resize(4);
n[0] = Point(p.x-1, p.y);
n[1] = Point(p.x+1, p.y);
n[2] = Point(p.x, p.y-1);
n[3] = Point(p.x, p.y+1);
}
int numMoves(const Point& start)
{
map<Point, bool> m;
queue<Point> q;
int count = 1; // the start point itself counts too ("including (0, 0) itself")
vector<Point> neigh;
q.push(start);
m[start] = true;
while (! q.empty()) {
Point c = q.front();
neighbor(neigh, c);
for (auto p: neigh) {
if ((!m[p]) && (DigitSum(p.x) + DigitSum(p.y) <= 19)) {
count++;
m[p] = true;
q.push(p);
}
}
q.pop();
}
return count;
}
I'm not sure how different this may be from brainydexter's idea... roaming the one quadrant, I instituted a single array hash (index = 299 * y + x) and built the result with another array, each index storing only the points that expand from its previous index, for example:
first iteration, result = [[(0,0)]]
second iteration, result = [[(0,0)],[(0,1),(1,0)]]
...
On an old IBM Thinkpad in JavaScript, the speed seemed to vary from 35-120 milliseconds (fiddle here).

Markov Decision Process: value iteration, how does it work?

I've been reading a lot about Markov Decision Processes (using value iteration) lately but I simply can't get my head around them. I've found a lot of resources on the Internet / books, but they all use mathematical formulas that are way too complex for my competencies.
Since this is my first year at college, I've found that the explanations and formulas provided on the web use notions / terms that are way too complicated for me and they assume that the reader knows certain things that I've simply never heard of.
I want to use it on a 2D grid (filled with walls(unattainable), coins(desirable) and enemies that move(which must be avoided at all costs)). The whole goal is to collect all the coins without touching the enemies, and I want to create an AI for the main player using a Markov Decision Process (MDP). Here is how it partially looks like (note that the game-related aspect is not so much of a concern here. I just really want to understand MDPs in general):
From what I understand, a rude simplification of MDPs is that they can create a grid which holds in which direction we need to go (kind of a grid of "arrows" pointing where we need to go, starting at a certain position on the grid) to get to certain goals and avoid certain obstacles. Specific to my situation, that would mean that it allows the player to know in which direction to go to collect the coins and avoid the enemies.
Now, using the MDP terms, it would mean that it creates a collection of states(the grid) which holds certain policies(the action to take -> up, down, right, left) for a certain state(a position on the grid). The policies are determined by the "utility" values of each state, which themselves are calculated by evaluating how much getting there would be beneficial in the short and long term.
Is this correct? Or am I completely on the wrong track?
I'd at least like to know what the variables from the following equation represent in my situation:
(taken from the book "Artificial Intelligence - A Modern Approach" from Russell & Norvig)
I know that s would be a list of all the squares from the grid, a would be a specific action (up / down / right / left), but what about the rest?
How would the reward and utility functions be implemented?
It would be really great if someone knew a simple link which shows pseudo-code to implement a basic version with similarities to my situation in a very slow way, because I don't even know where to start here.
Thank you for your precious time.
(Note: feel free to add / remove tags or tell me in the comments if I should give more details about something or anything like that.)
Yes, the mathematical notation can make it seem much more complicated than it is. Really, it is a very simple idea. I have implemented a value iteration demo applet that you can play with to get a better idea.
Basically, let's say you have a 2D grid with a robot in it. The robot can try to move North, South, East, West (those are the actions a) but, because its left wheel is slippery, when it tries to move North there is only a .9 probability that it will end up at the square North of it while there is a .1 probability that it will end up at the square West of it (similarly for the other 3 actions). These probabilities are captured by the T() function. Namely, T(s,A,s') will look like:
s    A      s'       T    //x=0,y=0 is at the top-left of the screen
x,y  North  x,y-1    .9   //we do move north
x,y  North  x-1,y    .1   //wheels slipped, so we move West
x,y  East   x+1,y    .9
x,y  East   x,y-1    .1
x,y  South  x,y+1    .9
x,y  South  x+1,y    .1
x,y  West   x-1,y    .9
x,y  West   x,y+1    .1
You then set the Reward to be 0 for all states, but 100 for the goal state, that is, the location you want the robot to get to.
What value iteration does is it starts by giving a Utility of 100 to the goal state and 0 to all the other states. Then on the first iteration this 100 of utility gets distributed back 1 step from the goal, so all states that can get to the goal state in 1 step (all 4 squares right next to it) will get some utility. Namely, they will get a Utility proportional to the probability that from that state we can get to the goal state (scaled by the goal's utility and the discount gamma). We then continue iterating; at each step we move the utility back 1 more step away from the goal.
In the example above, say you start with R(5,5) = 100 and R(.) = 0 for all other states. So the goal is to get to 5,5.
On the first iteration we set
U(5,6) = gamma * (.9 * 100 + .1 * 0) = 90 * gamma
because on 5,6 the best action is North: there is a .9 probability of ending up at 5,5 (utility 100) and a .1 probability of slipping West to 4,6 (utility 0). Value iteration takes the maximum over the actions, not the sum: going East, for example, only reaches 5,5 through the .1 slip, so it is worth just 10 * gamma.
Similarly for (5,4), (4,5), (6,5).
All other states remain with U = 0 after the first iteration of value iteration.
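One sweep of this process can be sketched as below. The update used is U'(s) = R(s) + gamma * max_a sum_s' T(s,a,s') * U(s'), the usual Bellman update, which is assumed here to be the equation the question refers to. Further assumptions, all labeled in the code: the slip always goes to the heading on the left of the intended one, bumping into the edge of the grid leaves the robot in place, and the goal is treated as a terminal state. `Grid` and `sweep` are made-up names.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

using Grid = std::vector<std::vector<double>>;   // indexed [y][x], y = 0 is the top row

// One value-iteration sweep over the whole grid.
Grid sweep(const Grid& U, const Grid& R, int goalX, int goalY, double gamma) {
    const int H = static_cast<int>(U.size());
    const int W = static_cast<int>(U[0].size());
    // Headings in counter-clockwise order: North, West, South, East.
    const int dx[] = {0, -1, 0, 1};
    const int dy[] = {-1, 0, 1, 0};
    // Utility of the square reached from (x, y) by stepping (sx, sy);
    // assumption: stepping off the grid means staying at (x, y).
    auto utilityAfter = [&](int x, int y, int sx, int sy) {
        const int nx = x + sx, ny = y + sy;
        if (nx < 0 || nx >= W || ny < 0 || ny >= H) return U[y][x];
        return U[ny][nx];
    };
    Grid next = U;
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            if (x == goalX && y == goalY) {      // assumption: goal is terminal
                next[y][x] = R[y][x];
                continue;
            }
            double best = -std::numeric_limits<double>::infinity();
            for (int a = 0; a < 4; ++a) {
                const int slip = (a + 1) % 4;    // assumption: slip to the left of heading a
                const double expected = 0.9 * utilityAfter(x, y, dx[a], dy[a])
                                      + 0.1 * utilityAfter(x, y, dx[slip], dy[slip]);
                best = std::max(best, expected);
            }
            next[y][x] = R[y][x] + gamma * best;
        }
    }
    return next;
}
```

Starting from U equal to R (100 at the goal 5,5 and 0 elsewhere) with gamma = 0.9, one sweep gives 81 at each of the four squares next to the goal and leaves every other non-goal square at 0. Calling `sweep` repeatedly until the values stop changing yields the utilities, and the policy is the maximizing action in each square.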
Not a complete answer, but a clarifying remark.
The state is not a single cell. The state describes the contents of all relevant cells at once: which cells are solid and which are empty, which ones contain monsters, where the coins are, and where the player is.
Maybe you could use a map from each cell to its content as the state. This ignores the movement of the monsters and the player, though, which is probably very important, too.
The details depend on how you want to model your problem (deciding what belongs to the state and in which form).
Then a policy maps each state to an action like left, right, jump, etc.
First you must understand the problem that is expressed by an MDP before thinking about how algorithms like value iteration work.
I would recommend using Q-learning for your implementation.
Maybe you can use this post I wrote as an inspiration. This is a Q-learning demo with Java source code. This demo is a map with 6 fields and the AI learns where it should go from every state to get to the reward.
Q-learning is a technique for letting the AI learn by itself by giving it reward or punishment.
This example shows the Q-learning used for path finding. A robot learns where it should go from any state.
The robot starts at a random place and keeps a memory of the score while it explores the area. Whenever it reaches the goal, we repeat with a new random start. After enough repetitions the score values become stationary (convergence).
In this example the action outcome is deterministic (transition probability is 1) and the action selection is random. The score values are calculated by the Q-learning algorithm Q(s,a).
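To make the update rule concrete, here is a small sketch of a single Q-learning step, using the same alpha = 0.1 and gamma = 0.9 as the listing further down. The class `QUpdateDemo` and its `update` helper are hypothetical names for this illustration only.

```java
// Sketch of one Q-learning update:
// Q(s,a) = Q(s,a) + alpha * (r + gamma * maxQ(s') - Q(s,a))
class QUpdateDemo {
    static double update(double q, double reward, double maxNextQ,
                         double alpha, double gamma) {
        return q + alpha * (reward + gamma * maxNextQ - q);
    }

    public static void main(String[] args) {
        double alpha = 0.1, gamma = 0.9;
        double qBC = 0.0; // Q(B,C) starts at zero
        // Moving B -> C gives reward 100; the goal C has no future value.
        for (int visit = 1; visit <= 3; visit++) {
            qBC = update(qBC, 100, 0.0, alpha, gamma);
            System.out.println("Q(B,C) after visit " + visit + ": " + qBC);
        }
    }
}
```

Each visit closes 10% of the remaining gap to 100 (roughly 10, then 19, then 27.1), so Q(B,C) approaches 100. In the same way Q(A,B) settles near 0 + 0.9 * 100 = 90 once Q(B,C) has converged.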
[Image: the states (A,B,C,D,E,F), the possible actions from each state, and the reward given]
[Image: result Q*(s,a)]
[Image: policy Π*(s)]
Qlearning.java
import java.text.DecimalFormat;
import java.util.Random;
/**
* @author Kunuk Nykjaer
*/
public class Qlearning {
final DecimalFormat df = new DecimalFormat("#.##");
// path finding
final double alpha = 0.1;
final double gamma = 0.9;
// states A,B,C,D,E,F
// e.g. from A we can go to B or D
// from C we can only go to C
// C is goal state, reward 100 when B->C or F->C
//
// _______
// |A|B|C|
// |_____|
// |D|E|F|
// |_____|
//
final int stateA = 0;
final int stateB = 1;
final int stateC = 2;
final int stateD = 3;
final int stateE = 4;
final int stateF = 5;
final int statesCount = 6;
final int[] states = new int[]{stateA,stateB,stateC,stateD,stateE,stateF};
// http://en.wikipedia.org/wiki/Q-learning
// http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning.htm
// Q(s,a)= Q(s,a) + alpha * (R(s,a) + gamma * Max(next state, all actions) - Q(s,a))
int[][] R = new int[statesCount][statesCount]; // reward lookup
double[][] Q = new double[statesCount][statesCount]; // Q learning
int[] actionsFromA = new int[] { stateB, stateD };
int[] actionsFromB = new int[] { stateA, stateC, stateE };
int[] actionsFromC = new int[] { stateC };
int[] actionsFromD = new int[] { stateA, stateE };
int[] actionsFromE = new int[] { stateB, stateD, stateF };
int[] actionsFromF = new int[] { stateC, stateE };
int[][] actions = new int[][] { actionsFromA, actionsFromB, actionsFromC,
actionsFromD, actionsFromE, actionsFromF };
String[] stateNames = new String[] { "A", "B", "C", "D", "E", "F" };
public Qlearning() {
init();
}
public void init() {
R[stateB][stateC] = 100; // from b to c
R[stateF][stateC] = 100; // from f to c
}
public static void main(String[] args) {
long BEGIN = System.currentTimeMillis();
Qlearning obj = new Qlearning();
obj.run();
obj.printResult();
obj.showPolicy();
long END = System.currentTimeMillis();
System.out.println("Time: " + (END - BEGIN) / 1000.0 + " sec.");
}
void run() {
/*
1. Set parameters alpha and gamma, and the environment reward matrix R
2. Initialize matrix Q as a zero matrix
3. For each episode: select a random initial state
Do while the goal state is not reached:
- Select one among all possible actions for the current state
- Using this possible action, consider going to the next state
- Get the maximum Q value of this next state based on all possible actions
- Compute the new Q(state, action) from the Q-learning update rule
- Set the next state as the current state
*/
// For each episode
Random rand = new Random();
for (int i = 0; i < 1000; i++) { // train episodes
// Select random initial state
int state = rand.nextInt(statesCount);
while (state != stateC) // goal state
{
// Select one among all possible actions for the current state
int[] actionsFromState = actions[state];
// Selection strategy is random in this example
int index = rand.nextInt(actionsFromState.length);
int action = actionsFromState[index];
// Action outcome is set to deterministic in this example
// Transition probability is 1
int nextState = action; // data structure
// Using this possible action, consider to go to the next state
double q = Q(state, action);
double maxQ = maxQ(nextState);
int r = R(state, action);
double value = q + alpha * (r + gamma * maxQ - q);
setQ(state, action, value);
// Set the next state as the current state
state = nextState;
}
}
}
double maxQ(int s) {
int[] actionsFromState = actions[s];
// Double.MIN_VALUE is the smallest positive double, so start from negative infinity
double maxValue = Double.NEGATIVE_INFINITY;
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[s][nextState];
if (value > maxValue)
maxValue = value;
}
return maxValue;
}
// get policy from state
int policy(int state) {
int[] actionsFromState = actions[state];
// Double.MIN_VALUE is the smallest positive double, so start from negative infinity
double maxValue = Double.NEGATIVE_INFINITY;
int policyGotoState = state; // default goto self if not found
for (int i = 0; i < actionsFromState.length; i++) {
int nextState = actionsFromState[i];
double value = Q[state][nextState];
if (value > maxValue) {
maxValue = value;
policyGotoState = nextState;
}
}
return policyGotoState;
}
double Q(int s, int a) {
return Q[s][a];
}
void setQ(int s, int a, double value) {
Q[s][a] = value;
}
int R(int s, int a) {
return R[s][a];
}
void printResult() {
System.out.println("Print result");
for (int i = 0; i < Q.length; i++) {
System.out.print("out from " + stateNames[i] + ": ");
for (int j = 0; j < Q[i].length; j++) {
System.out.print(df.format(Q[i][j]) + " ");
}
System.out.println();
}
}
// policy is maxQ(states)
void showPolicy() {
System.out.println("\nshowPolicy");
for (int i = 0; i < states.length; i++) {
int from = states[i];
int to = policy(from);
System.out.println("from "+stateNames[from]+" goto "+stateNames[to]);
}
}
}
Print result
out from A: 0 90 0 72,9 0 0
out from B: 81 0 100 0 81 0
out from C: 0 0 0 0 0 0
out from D: 81 0 0 0 81 0
out from E: 0 90 0 72,9 0 90
out from F: 0 0 100 0 81 0
showPolicy
from a goto B
from b goto C
from c goto C
from d goto A
from e goto B
from f goto C
Time: 0.025 sec.
I know this is a fairly old post, but I came across it when looking for MDP-related questions. I did want to add (for folks coming in here) a few more comments about what you stated "s" and "a" were.
I think for a you are absolutely correct: it's your list of [up,down,left,right].
However, s is really the location in the grid, and s' is a location you can go to.
What that means is that you pick a state, then you pick a particular s' and go through all the actions that can take you to that s', which you use to figure out those values (picking the max out of those). Then you move on to the next s' and do the same thing. When you've exhausted all the s' values, you find the max of what you just finished searching on.
Suppose you picked a grid cell in the corner: you'd only have 2 states you could possibly move to (assuming the bottom-left corner). Depending on how you choose to "name" your states, we could in this case assume a state is an x,y coordinate, so your current state s is 1,1 and your s' (or s prime) list is x+1,y and x,y+1 (no diagonals in this example). This is the summation part that goes over all s'.
Also, you don't have it listed in your equation, but the max is over a, the action that gives you the max, so first you pick the s' that gives you the max and then within that you pick the action (at least this is my understanding of the algorithm).
So if you had
x,y+1 left = 10
x,y+1 right = 5
x+1,y left = 3
x+1,y right = 2
You'll pick x,y+1 as your s', but then you'll need to pick the action that is maximized, which in this case is left for x,y+1. I'm not sure if there is a subtle difference between just finding the maximum number and finding the state and then the maximum number, though, so maybe someone someday can clarify that for me.
If your movements are deterministic (meaning if you say go forward, you go forward with 100% certainty), then it's pretty easy: you have one action. However, if they are non-deterministic, say you have 80% certainty, then you should consider the other actions which could get you there. This is the context of the slippery wheel that Jose mentioned above.
I don't want to detract from what others have said, but just to give some additional information.
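On the ordering question: in the usual Bellman update for value iteration, the sum over s' sits on the inside and the max over actions on the outside, so for each action you first compute its expected utility over all s' and only then take the best action. Below is a small sketch of one such backup. The class name, the 80/20 probabilities and the utilities 10 and 3 are made-up numbers for illustration only.

```java
// Sketch of a single Bellman backup for one state:
// U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') * U(s')
class BellmanBackupDemo {

    // probs[a][i] is the probability that action a leads to successor i;
    // nextValues[i] is the current utility of successor i.
    static double backup(double reward, double gamma,
                         double[][] probs, double[] nextValues) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] row : probs) {
            double expected = 0.0; // inner sum over s'
            for (int i = 0; i < row.length; i++) {
                expected += row[i] * nextValues[i];
            }
            best = Math.max(best, expected); // outer max over actions
        }
        return reward + gamma * best;
    }

    public static void main(String[] args) {
        double[] nextValues = {10.0, 3.0}; // hypothetical U(x,y+1), U(x+1,y)
        double[][] probs = {
            {0.8, 0.2}, // action 0 mostly reaches the first successor
            {0.2, 0.8}  // action 1 mostly reaches the second successor
        };
        System.out.println(backup(0.0, 0.9, probs, nextValues));
    }
}
```

Here action 0 has expected utility 0.8 * 10 + 0.2 * 3 = 8.6 and action 1 has 4.4, so the backup uses 8.6 and yields about 0.9 * 8.6 = 7.74. With deterministic moves each row would hold a single 1.0, and the result reduces to picking the neighbour with the highest utility.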

Resources