N-sided die MDP problem Value Iteration Solution Needed

I'm working on a problem for one of my classes. The problem is this: a person starts with $0 and rolls an N-sided die (N could range from 1 to 30) and wins money according to the side they roll. X sides of the die (the ones marked with a 1) result in the person losing all their money (current balance) and the game ending; for instance, if the die is [0,0,0,1,1,1,1], a person would receive $1 if they roll 1, $2 if they roll 2, or $3 if they roll 3, but they would lose everything if they rolled 4, 5, 6, or 7.
What is the expected value for this N-sided die problem? I tried value iteration but can't seem to get it right.
So for the die [1,1,1,0,0,0,0] (sides 1 to 3 lose, sides 4 to 7 pay out), our first state (1 roll) expected value is 1/7*(4)+1/7*(5)+1/7*(6)+1/7*(7) = 3.1428
For the value iteration, next we have to calculate the value of state 4 (balance=$4), state 5 (balance=$5), state 6 (balance=$6), state 7 (balance=$7)
V(s) = Max_actions [ Sum_probabilities [ R(s) + V(s') ] ]
V(4) = Max($4 {quit the game}, 1/7*(4+4)+1/7*(4+5)+1/7*(4+6)+1/7*(4+7) {keep playing}) -> 5.428
V(5) = Max($5 {quit the game}, 1/7*(5+4)+1/7*(5+5)+1/7*(5+6)+1/7*(5+7){keep playing}) -> 6
V(6) = Max($6 {quit the game}, 1/7*(6+4)+1/7*(6+5)+1/7*(6+6)+1/7*(6+7){keep playing}) -> 6.57
V(7) = Max($7 {quit the game}, 1/7*(7+4)+1/7*(7+5)+1/7*(7+6)+1/7*(7+7){keep playing}) -> 7.14
Now these V(4), V(5), V(6), and V(7)'s will branch out to their next states. So V(4) will become V(8), V(9), V(10), V(11), so on and so forth.
V(8) ($8 current > $7.71 expected), V(9) ($9 current > $8.28 expected), V(10) ($10 current > $8.85 expected), V(11) ($11 current > $9.42 expected), V(12) ($12 current > $10 expected), V(13) ($13 current > $10.57 expected), V(14) ($14 current > $11.14 expected).
So, that suggests that V(8), V(9), V(10), V(11), V(12), V(13), V(14) are terminal states --> V(4), V(5), V(6), V(7) do not need to be changed.
Finally, we re-calculate the value of V(0) because the values of V(4), V(5), V(6), and V(7) were changed --> V(0) = 1/7* V(4)+1/7* V(5)+1/7* V(6)+1/7* V(7) => 3.59 ... This is the final expected reward for this game.
Does this make sense? I'm not looking for code to solve the problem, just some advice on whether this approach is correct.
Thanks
Edited based on comments below to make the post more concise.

Yes, your approach is complex, but essentially correct. The expected winnings are 3 + 29/49 = 3.591836734693877551...
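In general, keep rolling while the expected winnings from one more roll exceed the expected losses.
Expected losses if you have y money are y * X / N.
Expected winnings are the average payout of one roll, counting the losing sides as 0: (sum of the winning sides) / N.
For your die that means rolling while 3y/7 < 22/7, i.e. up to a balance of $7, which matches the terminal states you found.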
I would suggest using a dynamic programming approach for efficiency.
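As a rough illustration of the dynamic programming idea, here is a minimal Python sketch. It assumes the die is given as a list of 0/1 flags where 1 marks a losing side (as in the question) and that a winning side pays its face value. The names `expected_value`, `is_bad` and `value` are made up for this example. It applies the stopping rule above and caches the value of each balance so that it is computed only once.

```python
from fractions import Fraction
from functools import lru_cache


def expected_value(is_bad):
    """Expected winnings under the stopping rule; is_bad[i] == 1 means side i+1 loses everything."""
    n = len(is_bad)
    good = [side for side, bad in enumerate(is_bad, start=1) if not bad]
    num_bad = n - len(good)
    if num_bad == 0:
        raise ValueError("with no losing side the game never has to stop")
    gain = Fraction(sum(good), n)  # expected winnings of one more roll

    @lru_cache(maxsize=None)
    def value(balance):
        # Stop once the expected loss, balance * X / N, reaches the expected gain.
        if Fraction(balance * num_bad, n) >= gain:
            return Fraction(balance)
        # Otherwise roll: a losing side pays 0, a winning side moves to a higher balance.
        return sum(value(balance + side) for side in good) / n

    return value(0)


print(expected_value([1, 1, 1, 0, 0, 0, 0]))  # 176/49, i.e. 3 + 29/49
```

Exact fractions are used only to make the result easy to compare with 3 + 29/49; floats would work just as well for a numeric answer.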

Related

Running a function multiple times and tracking results of the fight simulation

I've made a function to run a fight simulation. It's got a random element, so I would like to run it 100 times to check the results.
I've learnt that Ruby can't have functions inside functions.
$p1_skill = 10
$p1_health = 10
$p2_skill = 10
$p2_health = 10
def hp_check
  if $p2_health >= 1 && $p1_health == 0
    return "p2_wins"
  elsif $p1_health >= 1 && $p2_health == 0
    return "p1_wins"
  else
    battle
  end
end
def battle
  p1_fight = $p1_skill + rand(2..12)
  p2_fight = $p2_skill + rand(2..12)
  if p1_fight > p2_fight
    $p2_health -= 2
    hp_check
  elsif p2_fight > p1_fight
    $p1_health -= 2
    hp_check
  else
    battle
  end
end
battle
Right now this accurately produces a winner. It rolls two dice and adds them to a player's skill. If the total is higher than the other player's, the other player loses 2 health.
The skills and hp of players will change throughout the game; this is for a project assignment.
I'd like this to produce win odds so that I can check the game's balance.

I have several suggestions regarding your implementation. Note that since this is homework I'm providing the answer in pieces rather than just giving you an entire program. In no particular order...
Don't use global variables. I suspect this is the major hurdle you're running into with trying to achieve multiple runs of your model. The model state should be contained within the model methods, and initial state can be passed to it as arguments. Example:
def battle(p1_skill, p1_health, p2_skill, p2_health)
Unless your instructor has mandated that you use recursion, a simple loop structure will serve you much better. There's no need to check who won until one player or the other drops down to zero (or lower). There's also no need for an else to recursively call battle: the loop will iterate to the next round of the fight if both are still in the running, even if neither player took a hit.
while p1_health > 0 && p2_health > 0
  # roll the dice and update health
end
# check who won and return that answer
hp_check really isn't needed. Once you lose the recursive calls, it becomes a one-liner if you perform the check after breaking out of the loop. Also, it would be more useful to return just the winner, so whoever gets that return value can decide whether they want to print it, use it to update a tally, both, or something else entirely. After you break out of the loop outlined above:
# determine which player won, since somebody's health dropped to 0 or less
p1_health > 0 ? 1 : 2
When you're incrementing or decrementing a quantity, don't do equality testing. p1_health <= 0 is much safer than p1_health == 0, because some day you or somebody else is going to start from an odd number while decrementing by 2's, or decrement by some other (random?) amount.
Generating a number uniformly between 2 and 12 is not the same as summing two 6-sided dice. There are 36 possible outcomes for the two dice. Only one of the 36 yields a 2, only one yields a 12, and at the other extreme, there are six ways to get a sum of 7. I created a little die-roll method which takes the number of dice as an argument:
def roll_dice(n)
  n.times.inject(0) { |total| total + rand(1..6) }
end
so, for example, determining player 1's fight score becomes p1_fight = p1_skill + roll_dice(2).
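To make the difference between the two distributions concrete, here is a small Python sketch (not part of the Ruby solution; `dice_sum_counts` is a name made up for this illustration). It enumerates every equally likely outcome of the dice and counts how often each total occurs, which is what a uniform rand(2..12) fails to reproduce.

```python
from collections import Counter
from itertools import product


def dice_sum_counts(num_dice=2, sides=6):
    """Count how many of the sides**num_dice equally likely outcomes give each total."""
    return Counter(sum(roll) for roll in product(range(1, sides + 1), repeat=num_dice))


counts = dice_sum_counts()
for total in sorted(counts):
    print(total, counts[total], "of 36")
```

A total of 7 shows up in 6 of the 36 outcomes, while 2 and 12 show up in only one each, so close fights are much more common with real dice than with a uniform draw.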
After making these sorts of changes, tallying up the statistics is pretty straightforward:
n = 10000
number_of_p1_wins = 0
n.times { number_of_p1_wins += 1 if battle(10, 10, 10, 10) == 1 }
proportion = number_of_p1_wins.to_f / n
puts "p1 won #{"%5.2f" % (100.0 * proportion)}% of the time"
If you replace the constant 10's in the call to battle by getting user input or iterating over ranges, you can explore a rich set of other scenarios.

Minimum number of train station stops

I received this interview question and got stuck on it:
There are an infinite number of train stops starting from station number 0.
There are an infinite number of trains. The nth train stops at all of the k * 2^(n - 1) stops where k is between 0 and infinity.
When n = 1, the first train stops at stops 0, 1, 2, 3, 4, 5, 6, etc.
When n = 2, the second train stops at stops 0, 2, 4, 6, 8, etc.
When n = 3, the third train stops at stops 0, 4, 8, 12, etc.
Given a start station number and end station number, return the minimum number of stops between them. You can use any of the trains to get from one stop to another stop.
For example, the minimum number of stops between start = 1 and end = 4 is 3 because we can get from 1 to 2 to 4.
I'm thinking about a dynamic programming solution that would store in dp[start][end] the minimum number of steps between start and end. We'd build up the array using start...mid1, mid1...mid2, mid2...mid3, ..., midn...end. But I wasn't able to get it to work. How do you solve this?
Clarifications:
Trains can only move forward from a lower number stop to a higher number stop.
A train can start at any station where it makes a stop at.
Trains can be boarded in any order. The n = 1 train can be boarded before or after boarding the n = 3 train.
Trains can be boarded multiple times. For example, it is permitted to board the n = 1 train, next board the n = 2 train, and finally board the n = 1 train again.

I don't think you need dynamic programming at all for this problem. It can basically be expressed by binary calculations.
If you convert the number of a station to binary it tells you right away how to get there from station 0, e.g.,
station 6 = 110
tells you that you need to take the n=3 train and the n=2 train each for one station. So the popcount of the binary representation tells you how many steps you need.
The next step is to figure out how to get from one station to another.
I'll show this again by example. Say you want to get from station 7 to station 23.
station 7 = 00111
station 23 = 10111
The first thing you want to do is to get to an intermediate stop. This stop is specified by
(highest bits that are equal in start and end station) + (first different bit) + (filled up with zeros)
In our example the intermediate stop is 16 (10000). The steps you need to make can be calculated by the difference of that number and the start station (7 = 00111). In our example this yields
10000 - 00111 = 1001
Now you know that you need 2 stops (the n=1 train and the n=4 train) to get from 7 to 16.
The remaining task is to get from 16 to 23, again this can be solved by the corresponding difference
10111 - 10000 = 00111
So, you need another 3 stops to go from 16 to 23 (n= 3, n= 2, n= 1). This gives you 5 stops in total, just using two binary differences and the popcount. The resulting path can be extracted from the bit representations 7 -> 8 -> 16 -> 20 -> 22 -> 23
Edit:
For further clarification of the intermediate stop let's assume we want to go from
station 5 = 101 to
station 7 = 111
the intermediate stop in this case will be 110, because
highest bits that are equal in start and end station = 1
first different bit = 1
filled up with zeros = 0
we need one step to go there (110 - 101 = 001) and one more to go from there to the end station (111 - 110 = 001).
About the intermediate stop
The concept of the intermediate stop is a bit clunky but I could not find a more elegant way in order to get the bit operations to work. The intermediate stop is the stop in between start and end where the highest level bit switches (that's why it is constructed the way it is). In this respect it is the stop at which the fastest train (between start and end) operates (actually all trains that you are able to catch stop there).
By subtracting the intermediate stop (bit representation) from the end station (bit representation) you reduce the problem to the simple case starting from station 0 (cf. first example of my answer).
By subtracting the start station from the intermediate stop you also reduce the problem to the simple case, but assume that you go from the intermediate stop to the start station which is equivalent to the other way round.
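The intermediate-stop calculation can be sketched in a few lines of Python. This is only an illustration under the assumptions of this answer (travel is forward only, and a "stop" is one ride from a station to the next station of the same train); the function name `min_stops` and the variable names are my own.

```python
def min_stops(start, end):
    """Number of rides from start to end (start <= end), using two popcounts."""
    if start == end:
        return 0
    # Position of the highest bit in which the two station numbers differ.
    top = (start ^ end).bit_length() - 1
    # Common prefix + first different bit + filled up with zeros.
    intermediate = (end >> top) << top
    to_intermediate = bin(intermediate - start).count("1")
    from_intermediate = bin(end - intermediate).count("1")
    return to_intermediate + from_intermediate


print(min_stops(7, 23))  # 5, via 7 -> 8 -> 16 -> 20 -> 22 -> 23
print(min_stops(5, 7))   # 2, via 5 -> 6 -> 7
```

Because start < end, the first differing bit is always 1 in end and 0 in start, so shifting end right and left again by `top` produces exactly the intermediate stop described above.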

First, ask if you can go backward. It sounds like you can't, but as presented here (which may not reflect the question as you received it), the problem never gives an explicit direction for any of these trains. (I see you've now edited your question to say you can't go backward.)
Assuming you can't go backward, the strategy is simple: always take the highest-numbered available train that doesn't overshoot your destination.
Suppose you're at stop s, and the highest-numbered train that stops at your current location and doesn't overshoot is train k. Traveling once on train k will take you to stop s + 2^(k-1). There is no faster way to get to that stop, and no way to skip that stop - no lower-numbered trains skip any of train k's stops, and no higher-numbered trains stop between train k's stops, so you can't get on a higher-numbered train before you get there. Thus, train k is your best immediate move.
With this strategy in mind, most of the remaining optimization is a matter of efficient bit twiddling tricks to compute the number of stops without explicitly figuring out every stop on the route.
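Before reaching for bit tricks, the strategy can be simulated directly. The following Python sketch (the name `greedy_route` is hypothetical) walks the route one ride at a time: at each station it picks the largest interval that both fits in the remaining distance and belongs to a train that actually stops at the current station.

```python
def greedy_route(start, end):
    """List of stations visited when always taking the fastest usable train."""
    route = [start]
    here = start
    while here < end:
        # Largest power of two that fits in the remaining distance.
        step = 1 << ((end - here).bit_length() - 1)
        # A train only stops here if its interval divides the station number;
        # every train stops at station 0, so no extra limit applies there.
        if here:
            step = min(step, here & -here)
        here += step
        route.append(here)
    return route


print(greedy_route(7, 23))  # [7, 8, 16, 20, 22, 23]
print(greedy_route(1, 4))   # [1, 2, 4]
```

The number of rides is `len(route) - 1`; the question's example counts the stations themselves, which gives one more.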

I will attempt to prove my algorithm is optimal.
The algorithm is "take the fastest train that doesn't overshoot your destination".
Working out how many stops this takes is a bit tricky.
Encode both stops as binary numbers. I claim that an identical prefix can be neglected; the problem of going from a to b is the same as the problem of going from a+2^n to b+2^n if 2^n > b, as the stops between 2^n and 2^(n+1) are just the stops between 0 and 2^n shifted over.
From this, we can reduce a trip from a to b to guarantee that the high bit of b is set, and the same "high" bit of a is not set.
To solve going from 5 (101) to 7 (111), we merely have to solve going from 1 (01) to 3 (11), then shift our stop numbers up 4 (100).
To go from x to 2^n + y, where x < 2^n and y < 2^n, we first want to go to 2^n, because any train that skips over 2^n also skips over 2^n + y < 2^(n+1).
So any set of stops between x and 2^n + y must stop at 2^n.
Thus the optimal number of stops from x to 2^n + y is the number of stops from x to 2^n, followed by the number of stops from 2^n to 2^n+y, inclusive (or from 0 to y, which is the same).
The algorithm I propose to get from 0 to y is to start with the high order bit set, and take the train that gets you there, then go on down the list.
Claim: In order to generate a number with k 1s, you must take at least k trains. As proof, if you take a train and it doesn't cause a carry in your stop number, it sets 1 bit. If you take a train and it does cause a carry, the resulting number has at most 1 more set bit than it started with.
To get from x to 2^n is a bit trickier, but can be made simple by tracking the trains you take backwards.
Mapping s_i to s_{2^n-i} and reversing the train steps, any solution for getting from x to 2^n describes a solution for getting from 0 to 2^n-x. And any solution that is optimal for the forward one is optimal for the backward one, and vice versa.
Using the result for getting from 0 to y, we then get that the optimal route from a to b, where the highest bit set in b is 2^n and a does not have that bit set, takes #(b - 2^n) + #(2^n - a) stops, where # means "the number of bits set in the binary representation". And in general, if a and b have a common prefix, simply drop that common prefix.
A local rule that generates the above number of steps is "take the fastest train in your current location that doesn't overshoot your destination".
For the part going from 2^n to 2^n+y we did that explicitly in our proof above. For the part going from x to 2^n this is trickier to see.
First, if the low order bit of x is set, obviously we have to take the first and only train we can take.
Second, imagine x has some collection of unset low-order bits, say m of them. If we played the train game going from x/2^m to 2^(n-m), then scaled the stop numbers by multiplying by 2^m we'd get a solution to going from x to 2^n.
And #((2^n - x)/2^m) = #(2^n - x), since dividing by 2^m only removes trailing zeros. So this "scaled" solution is optimal.
From this, we are always taking the train corresponding to our low-order set bit in this optimal solution. This is the longest range train available, and it doesn't overshoot 2^n.
QED
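The count derived in the proof can be cross-checked against an exhaustive search for small station numbers. The Python sketch below is only a sanity check, not part of the proof; `bfs_stops` and `formula_stops` are names invented here. The first does a breadth-first search over every legal ride, the second evaluates #(b - 2^n) + #(2^n - a) after dropping the common prefix.

```python
from collections import deque


def bfs_stops(start, end):
    """Fewest rides from start to end, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        if here == end:
            return dist[here]
        step = 1
        # The train with interval `step` stops here only if step divides here.
        while here % step == 0 and here + step <= end:
            if here + step not in dist:
                dist[here + step] = dist[here] + 1
                queue.append(here + step)
            step *= 2
    raise ValueError("end must not be smaller than start")


def formula_stops(a, b):
    """#(b - 2^n) + #(2^n - a) after dropping the common binary prefix."""
    if a == b:
        return 0
    n = (a ^ b).bit_length() - 1          # highest bit where a and b differ
    prefix = (b >> (n + 1)) << (n + 1)    # bits shared by a and b above bit n
    a, b = a - prefix, b - prefix
    return bin(b - 2**n).count("1") + bin(2**n - a).count("1")


print(all(bfs_stops(a, b) == formula_stops(a, b)
          for a in range(64) for b in range(a, 64)))
```

Agreement on a finite range does not prove anything by itself, but it is a cheap way to catch an off-by-one in the bit arithmetic.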

This problem doesn't require dynamic programming.
Here is a simple implementation of a solution using GCC:
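#include <stdint.h>

/* Assumes start <= end. */
uint32_t min_stops(uint32_t start, uint32_t end)
{
    uint32_t stops = 0;
    if(start != 0) {
        /* Ride the fastest train that stops here (lowest set bit of start)
           for as long as that does not overshoot end. */
        while(start <= end - (1U << __builtin_ctz(start))) {
            start += 1U << __builtin_ctz(start);
            ++stops;
        }
    }
    /* The remaining stops are the bits of end that are not yet set in start. */
    stops += __builtin_popcount(end ^ start);
    return stops;
}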
The train schema is a map of powers-of-two. If you visualize the train lines as a bit representation, you can see that the lowest bit set represents the train line with the longest distance between stops that you can take. You can also take the lines with shorter distances.
To minimize the distance, you want to take the line with the longest distance possible, until that would make the end station unreachable. That's what adding by the lowest-set bit in the code does. Once you do this, some number of the upper bits will agree with the upper bits of the end station, while the lower bits will be zero.
At that point, it's simply a matter of taking a train for the highest bit in the end station that is not set in the current station. This is optimized as __builtin_popcount in the code.
An example going from 5 to 39:
000101 5 // Start
000110 5+1=6
001000 6+2=8
010000 8+8=16
100000 16+16=32 // 32+32 > 39, so start reversing the process
100100 32+4=36 // Optimized with __builtin_popcount in code
100110 36+2=38 // Optimized with __builtin_popcount in code
100111 38+1=39 // Optimized with __builtin_popcount in code

As some have pointed out, since stops are all multiples of powers of 2, trains that stop more frequently also stop at the same stops of the more-express trains. Any stop is on the first train's route, which stops at every station. Any stop is at most 1 unit away from the second train's route, stopping every second station. Any stop is at most 3 units from the third train that stops every fourth station, and so on.
So start at the end and trace your route back in time: hop on the fastest train that stops there and keep switching to the highest multiple-of-power-of-2 train you can as soon as possible, as long as its interval wouldn't carry you past the starting point after one stop. To find the fastest train at a station, check the position of the least significant set bit. Why? A multiple of 2^k can be divided by two (that is, bit-shifted right) k times without leaving a remainder, which is exactly the number of trailing zeros in its bit representation. When the interval would carry you past the starting point, perform the reverse switch: hop on the next lower multiple-of-power-of-2 train and stay on it until its interval, too, would pass the starting point after one stop, and so on.

We can figure this out doing nothing but a little counting and array manipulation. Like all the previous answers, we need to start by converting both numbers to binary and padding them to the same length. So 12 and 38 become 001100 and 100110.
Look at station 12 and its least significant set bit (2^2). All trains with intervals larger than 2^2 won't stop at station 12, and all trains with intervals less than or equal to 2^2 will stop there, but the slower ones will require multiple stops to get to the same destination as the interval 4 train. So in every situation, up until we reach the largest set bit in the end value, we need to take the train with the interval of the least significant set bit of the current station.
If we are at station 0010110100, our sequence will be:
0010110100 2^2
0010111000 2^3
0011000000 2^6
0100000000 2^8
1000000000
Here we can eliminate all bits smaller than the least significant set bit and get the same count.
00101101 2^0
00101110 2^1
00110000 2^4
01000000 2^6
10000000
Trimming the ends at each stage, we get this:
00101101 2^0
0010111 2^0
0011 2^0
01 2^0
1
This could equally be described as the process of flipping all the 0 bits. Which brings us to the first half of the algorithm: Count the unset bits in the zero padded start number greater than the least significant set bit, or 1 if the start station is 0.
This will get us to the only intermediate station reachable by the train with the largest interval smaller than the end station, so all trains after this must be smaller than the previous train.
Now we need to get from the intermediate station to the destination (from 1000000000 to 1010110100 in the example below). This part is easier and more obvious: take the train with an interval equal to the most significant bit that is set in the destination and not set in the current station number.
1000000000 2^7
1010000000 2^5
1010100000 2^4
1010110000 2^2
1010110100
Similar to the first method, we can trim the most significant bit, which will always be set, and then count the remaining 1's in the answer. So the second part of the algorithm is: count all the set bits smaller than the most significant bit.
Then add the results from parts 1 and 2.
Adjusting the algorithm slightly to get all the train intervals, here is an example written in javascript so it can be run here.
function calculateStops(start, end) {
    var result = {
        start: start,
        end: end,
        count: 0,
        trains: [],
        reverse: false
    };
    // If equal there are 0 stops
    if (start === end) return result;
    // If start is greater than end, reverse the values and
    // add note to reverse the results
    if (start > end) {
        start = result.end;
        end = result.start;
        result.reverse = true;
    }
    // Convert start and end values to array of binary bits
    // with the exponent matched to the index of the array
    start = (start >>> 0).toString(2).split('').reverse();
    end = (end >>> 0).toString(2).split('').reverse();
    // We can trim off any matching significant digits
    // The stop pattern for 10 to 13 is the same as
    // the stop pattern for 2 to 5 offset by 8
    while (start[end.length-1] === end[end.length-1]) {
        start.pop();
        end.pop();
    }
    // Trim off the most significant bit of the end,
    // we don't need it
    end.pop();
    // Front fill zeros on the starting value
    // to make the counting easier
    while (start.length < end.length) {
        start.push('0');
    }
    // We can break the algorithm in half
    // getting from the start value to the form
    // 10...0 with only 1 bit set and then getting
    // from that point to the end.
    var index;
    var trains = [];
    var expected = '1';
    // Now we loop through the digits on the end
    // any 1 we find can be added to a temporary array
    for (index in end) {
        if (end[index] === expected) {
            result.count++;
            trains.push(Math.pow(2, index));
        }
    }
    // if the start value is 0, we can get to the
    // intermediate step in one trip, so we can
    // just set this to 1, checking both start and
    // end because they can be reversed
    if (result.start == 0 || result.end == 0) {
        index++;
        result.count++;
        result.trains.push(Math.pow(2, index));
    // We need to find the first '1' digit, then all
    // subsequent 0 digits, as these are the ones we
    // need to flip
    } else {
        for (index in start) {
            if (start[index] === expected) {
                result.count++;
                result.trains.push(Math.pow(2, index));
                expected = '0';
            }
        }
    }
    // add the second set to the first set, reversing
    // it to get them in the right order.
    result.trains = result.trains.concat(trains.reverse());
    // Reverse the stop list if the trip is reversed
    if (result.reverse) result.trains = result.trains.reverse();
    return result;
}
$(document).ready(function () {
$("#submit").click(function () {
var trains = calculateStops(
parseInt($("#start").val()),
parseInt($("#end").val())
);
$("#out").html(trains.count);
var current = trains.start;
var stopDetails = 'Starting at station ' + current + '<br/>';
for (index in trains.trains) {
current = trains.reverse ? current - trains.trains[index] : current + trains.trains[index];
stopDetails = stopDetails + 'Take train with interval ' + trains.trains[index] + ' to station ' + current + '<br/>';
}
$("#stops").html(stopDetails);
});
});
label {
display: inline-block;
width: 50px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<label>Start</label> <input id="start" type="number" /> <br>
<label>End</label> <input id="end" type="number" /> <br>
<button id="submit">Submit</button>
<p>Shortest route contains <span id="out">0</span> stops</p>
<p id="stops"></p>

Simple Java solution
public static int minimumNumberOfStops(int start, final int end) {
    // I would initialize it with 0 but the example given in the question states:
    // the minimum number of stops between start = 1 and end = 4 is 3 because we can get from 1 to 2 to 4
    int stops = 1;
    while (start < end) {
        int step = findClosestPowerOfTwoLessOrEqualThan(end - start);
        // A train with interval 2^(n-1) only stops at multiples of that interval,
        // so the step must also divide the current station (every train stops at 0).
        if (start != 0) {
            step = Math.min(step, Integer.lowestOneBit(start));
        }
        start += step;
        stops++;
    }
    return stops;
}
private static int findClosestPowerOfTwoLessOrEqualThan(final int i) {
    if (i > 1) {
        return 2 << (30 - Integer.numberOfLeadingZeros(i));
    }
    return 1;
}

NOTICE: The comments currently under my answer exist because my first version of this algorithm was completely wrong and user2357112 made me aware of my mistakes. So I removed that algorithm and wrote a new one based on what user2357112 answered to this question. I also added some comments to clarify what happens in each line.
This algorithm starts at procedure main(Origin, Dest) and simulates our movements toward the destination with updateOrigin(Origin, Dest).
procedure main(Origin, Dest){
    //at the end we have the minimum number of steps in this variable
    counter = 0;
    while(Origin != Dest){
        //we simulate our movement toward the destination with this
        Origin = updateOrigin(Origin, Dest);
        counter = counter + 1;
    }
    return counter;
}
procedure updateOrigin(Origin, Dest){
    //start with the fastest train that does not overshoot the destination:
    //its interval 2 ^ (n - 1) is the largest power of 2 that fits in Dest - Origin
    n = floor(Log2(Dest - Origin)) + 1;
    //that train may not pass through our origin, so try slower trains until
    //Origin is a multiple of the interval; this always ends, because
    //train n = 1 stops at every station (and all trains stop at 0)
    while (Origin mod 2 ^ (n - 1) != 0){
        //let's check another train
        n = n - 1;
    }
    //we have found a suitable train, return where we have moved to
    return Origin + 2 ^ (n - 1);
}

Calculate the winning strategy of a subtraction game

Problem:
Given 100 stones, two players alternate taking stones out. One can take any number from 1 to 15; however, one cannot take any number that was already taken. If at the end of the game there are k stones left, but 1 through k have all been previously taken, one can take k stones. The one who takes the last stone wins. How can the first player always win?
My Idea
Use recursion (or dynamic programming). Base case 1, where player 1 has a winning strategy.
Reducing: for n stones left, if player 1 takes m1 stones, he has to ensure that for all options player 2 has (m2), he has a winning strategy. Thus the problem is reduced to (n - m1 - m2).
Follow Up Question:
If one uses DP, the potential number of tables to be filled is large (2^15), since the available options left depend on the history, which has 2^15 possibilities.
How can you optimize?

Assuming that the set of numbers remaining can be represented as R, the highest number remaining after your selection can be represented by RH and the lowest number remaining can be RL, the trick is to use your second-to-last move to raise the number to <100-RH, but >100-RH-RL. That forces your opponent to take a number that will put you in winning range.
The final range of winning, with the total number that you create with your second-to-last move, is:
N < 100-RH
N > 100-RH-RL
By observation I noted that RH can be as high as 15 and as low as 8. RL can be as low as 1 and as high as 13. From this range I evaluated the equations.
N < 100-[8:15] => N < [92:85]
N > 100-[8:15]-[1:13] => N > [92:85] - [1:13] => N > [91:72]
Other considerations can narrow this gap. RL, for instance, is only 13 in an edge circumstance that always results in a loss for Player A, so the true range is between 72 and 91. There is a similar issue with RH and the low end of it, so the final ranges and calculations are:
N < 100-[9:15] => N < [91:85]
N > 100-[9:15]-[1:12] => N > [91:85] - [1:12] => N > [90:73]
[90:73] < N < [91:85]
Before this, however, the possibilities explode. Remember, this is AFTER you choose your second-to-last number, not before. At this point they are forced to choose a number that will allow you to win.
Note that 90 is not a valid choice to win with, even though it might exist. Thus, the maximum it can be is 89. The real range of N is:
[88:73] < N < [90:85]
It is, however, possible to calculate the range of the number that you're using to put your opponent in a no-win situation. In the situation you find yourself in, the lowest number or the highest number might be the one you chose, so if RHc is the highest number you can pick and RLc is the lowest number you can pick, then
RHc = [9:15]
RLc = [1:12]
With this information, I can begin constructing a relative algorithm starting from the end of the game.
N*p* - RHp - RLp < Np < N*p* - RHp, where p = iteration and *p* = iteration + 1
RHp = [8+p:15]
RLp = [1:13-p]
p = -1 is your winning move
p = 0 is your opponent's helpless move
p = 1 is your set-up move
Np is the sum of that round.
Thus, solving the algorithm for your set-up move, p=1, you get:
N*p* - [9:15] - [1:12] < Np < N*p* - [9:15]
100 <= N*p* <= 114
I'm still working out the math for this, so expect adjustments. If you see an error, please let me know and I'll adjust appropriately.

Here is some simple, brute-force Python code:
# stoneCount: number of stones to start the game with
# possibleMoves: which numbers of stones may be removed? (*sorted* list of integers)
# return value: signals if winning can be forced by first player;
# if True, the winning move is attached
def isWinningPosition(stoneCount, possibleMoves):
    if stoneCount == 0:
        return False
    if len(possibleMoves) == 0:
        raise ValueError("no moves left")
    if stoneCount in possibleMoves or stoneCount < possibleMoves[0]:
        return True,stoneCount
    for move in possibleMoves:
        if move > stoneCount:
            break
        remainingMoves = [m for m in possibleMoves if m != move]
        winning = isWinningPosition(stoneCount - move, remainingMoves)
        if winning == False:
            return True,move
    return False
For the given problem size this function returns in less than 20 seconds on an Intel i7:
>>> isWinningPosition(100, range(1,16))
False
(So the first player cannot force a win in this situation. Whatever move he makes, it will result in a winning position for the second player.)
Of course, there is a lot of room for run time optimization. In the above implementation many situations are reached and recomputed again and again (e.g. when the first player takes one stone and the second player takes two stones, this will put the first player into the same situation as when the numbers of stones taken by each player are reversed). So the first (major) improvement is to memoize already computed situations. Then one could go for more efficient data structures (e.g. encoding the list of possible moves as a bit pattern).
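A possible shape for those two improvements, as a hedged Python sketch: the remaining moves are packed into an integer bitmask and results are cached with `functools.lru_cache`. The names `first_player_wins`, `wins` and `max_move` are invented for this illustration, and the rule for a player who has no legal move at all is an assumption marked in the code. Since the number of stones left is fixed by which numbers have been taken, there are at most 2^15 distinct positions to evaluate.

```python
from functools import lru_cache


def first_player_wins(stone_count, max_move=15):
    """True if the player to move can force a win from the initial position."""
    all_moves = (1 << max_move) - 1  # bit (m - 1) set means "m stones" is still allowed

    @lru_cache(maxsize=None)
    def wins(stones, available):
        if stones == 0:
            return False  # the previous player took the last stone
        # Smallest move still allowed (max_move + 1 if every number is used up).
        lowest = (available & -available).bit_length() if available else max_move + 1
        if stones < lowest:
            return True  # 1..stones were all taken before, so the rest may be taken
        if stones <= max_move and (available >> (stones - 1)) & 1:
            return True  # take everything that is left
        for move in range(lowest, min(stones, max_move) + 1):
            bit = 1 << (move - 1)
            if available & bit and not wins(stones - move, available & ~bit):
                return True  # this move leaves the opponent in a losing position
        # Assumption: a player with no legal move loses. This cannot happen for
        # 100 stones and moves 1..15, because 1 + 2 + ... + 15 = 120 > 100.
        return False

    return wins(stone_count, all_moves)


print(first_player_wins(100))
```

If the rules are encoded the same way as in the brute-force version, the printed result should agree with it, only much faster. Small cases are easy to check by hand: with moves 1 and 2 only, 3 stones is a loss for the player to move and 4 stones is a win.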

How to account for position's history in transposition tables

I'm currently developing a solver for a trick-based card game called Skat in a perfect information situation. Although most of the people may not know the game, please bear with me; my problem is of a general nature.
Short introduction to Skat:
Basically, each player plays one card alternatingly, and every three cards form a trick. Every card has a specific value. The score that a player has achieved is the result of adding up the value of every card contained in the tricks that the respective player has won. I left out certain things that are unimportant for my problem, e.g. who plays against whom or when do I win a trick.
What we should keep in mind is that there is a running score, and who played what before when investigating a certain position (-> its history) is relevant to that score.
I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table. I read that when searching the tree of a Skat game, you will encounter a lot of positions that have already been investigated.
And that's where my problem comes into play: If I find a position that has already been investigated before, the moves leading to this position have been different. Therewith, in general, the score (and alpha or beta) will be different, too.
This leads to my question: How can I determine the value of a position, if I know the value of the same position, but with a different history?
In other words: How can I decouple a subtree from its path to the root, so that it can be applied to a new path?
My first impulse was it's just not possible, because alpha or beta could have been influenced by other paths, which might not be applicable to the current position, but...
There already seems to be a solution
...that I don't seem to understand. In Sebastian Kupferschmid's master's thesis about a Skat solver, I found this piece of code (maybe Python-ish pseudo code?):
def ab_tt(p, alpha, beta):
    if p isa Leaf:
        return 0
    if hash.lookup(p, val, flag):
        if flag == VALID:
            return val
        elif flag == LBOUND:
            alpha = max(alpha, val)
        elif flag == UBOUND:
            beta = min(beta, val)
        if alpha >= beta:
            return val
    if p isa MAX_Node:
        res = alpha
    else:
        res = beta
    for q in succ(p):
        if p isa MAX_Node:
            succVal = t(q) + ab_tt(q, res - t(q), beta - t(q))
            res = max(res, succVal)
            if res >= beta:
                hash.add(p, res, LBOUND)
                return res
        elif p isa MIN_Node:
            succVal = t(q) + ab_tt(q, alpha - t(q), res - t(q))
            res = min(res, succVal)
            if res <= alpha:
                hash.add(p, res, UBOUND)
                return res
    hash.add(p, res, VALID)
    return res
It should be pretty self-explanatory. succ(p) is a function that returns every possible move at the current position. t(q) is what I believe to be the running score of the respective position (the points achieved so far by the declarer).
Since I don't like copying stuff without understanding it, this should just be an aid for anyone who would like to help me out. Of course, I have given this code some thought, but I can't wrap my head around one thing: By subtracting the current score from alpha/beta before calling the function again [e.g. ab_tt(q, res - t(q), beta - t(q))], there seems to be some kind of decoupling going on. But what exactly is the benefit if we store the position's value in the transposition table without doing the same subtraction right here, too? If we found a previously investigated position, how come we can just return its value (in case it's VALID) or use the bound value for alpha or beta? The way I see it, both storing and retrieving values from the transposition table won't account for the specific histories of these positions. Or will it?
Literature:
There are almost no English sources out there that deal with AI in Skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid, Helmert. Unfortunately, the whole paper, and especially the elaboration on transposition tables, is rather compact.
Edit:
So that everyone can better imagine how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The actual score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in this trick to their score.
The card values are:
Rank    J    A   10    K    Q    9    8    7
Value   2   11   10    4    3    0    0    0

I solved the problem. Instead of doing weird subtractions upon each recursive call, as suggested by the reference in my question, I subtract the running score from the resulting alpha beta value only when storing a position in the transposition table:
For exact values (the position hasn't been pruned):
transpo.put(hash, new int[] { TT_VALID, bestVal - node.getScore()});
If the node caused a beta-cutoff:
transpo.put(hash, new int[] { TT_LBOUND, bestVal - node.getScore()});
If the node caused an alpha-cutoff:
transpo.put(hash, new int[] { TT_UBOUND, bestVal - node.getScore()});
Where:
transpo is a HashMap<Long, int[]>
hash is the long value representing that position
bestVal is either the exact value or the value that caused a cutoff
TT_VALID, TT_LBOUND and TT_UBOUND are simple constants, describing the type of transposition table entry
However, this didn't work by itself. After posting the same question on gamedev.net, a user named Álvaro gave me the decisive hint:
When storing exact scores (TT_VALID), I should only store positions that improved alpha.
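To make the idea of a history-independent table entry concrete, here is a minimal Python sketch. It is not the code from the thesis or from my solver: the toy game (players alternately take a card from either end of a row, and only the declarer's picks score), the names solve, search and minimax, and the flag names are all assumptions made for illustration. The table stores only the points still to come from a position, and the alpha-beta window is shifted by the points gained on each move, so the same entry can be reused no matter which running score led to the position.

```python
import math

EXACT, LOWER, UPPER = "exact", "lower", "upper"

def minimax(cards, lo, hi):
    """Reference search without pruning: declarer's future points from cards[lo:hi]."""
    if lo == hi:
        return 0
    declarer_to_move = (len(cards) - (hi - lo)) % 2 == 0
    left = minimax(cards, lo + 1, hi)
    right = minimax(cards, lo, hi - 1)
    if declarer_to_move:
        return max(cards[lo] + left, cards[hi - 1] + right)
    return min(left, right)  # the defender's picks give the declarer nothing

def solve(cards):
    """Alpha-beta with a transposition table keyed by position only."""
    cards = tuple(cards)
    table = {}  # (lo, hi) -> (flag, points still to come from this position)

    def search(lo, hi, alpha, beta):
        if lo == hi:
            return 0
        entry = table.get((lo, hi))
        if entry is not None:
            flag, val = entry
            if flag == EXACT:
                return val
            if flag == LOWER:
                alpha = max(alpha, val)
            else:
                beta = min(beta, val)
            if alpha >= beta:
                return val
        declarer_to_move = (len(cards) - (hi - lo)) % 2 == 0
        a, b = alpha, beta  # working window; alpha/beta stay fixed for the flag test
        best = -math.inf if declarer_to_move else math.inf
        for card, nlo, nhi in ((cards[lo], lo + 1, hi), (cards[hi - 1], lo, hi - 1)):
            gain = card if declarer_to_move else 0
            # Shift the window by the points just gained, so the child is
            # searched (and stored) in terms of its own future points only.
            val = gain + search(nlo, nhi, a - gain, b - gain)
            if declarer_to_move:
                best = max(best, val)
                a = max(a, best)
            else:
                best = min(best, val)
                b = min(b, best)
            if a >= b:
                break  # cutoff
        if best <= alpha:
            table[(lo, hi)] = (UPPER, best)
        elif best >= beta:
            table[(lo, hi)] = (LOWER, best)
        else:
            table[(lo, hi)] = (EXACT, best)
        return best

    return search(0, len(cards), -math.inf, math.inf)

print(solve([10, 2, 11, 4]))  # declarer's best total for this row of cards
```

In this toy game the position (1, 3) is reached both by "left, then right" and by "right, then left", with different running scores, yet a single table entry serves both paths. Note that an entry is flagged EXACT only when the result lies strictly inside the window, which is in the spirit of Álvaro's hint.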

Toilet Seat Algorithm

Let's take a regular household with a man, who has to go to the toilet every n minutes and requires the seat to be up, and a woman, who has to go every m minutes and requires the seat to be down. Is it possible to create an O(1) algorithm that outputs the exact number of toilet seat movements for a given period of X minutes? There are two scenarios to consider:
1. The man always leaves the seat up after a visit.
2. The man always puts the seat down after a visit.
Conclusion: in real life (where n is much larger than m, with X->infinity), it has been proven that there is no difference in the number of seat movements.
But if the man goes more often than the woman, leaving the seat up will prolong the seat's life, although in this case one of them (or both) should probably see a doctor.
Now I know what is best for the seat itself, but which person makes more movements is another question (which should not be asked anyway).

Yes, there is a basic O(1) algorithm.
I start with the assumption both people start "ticking" at t=0.
I believe the solution should generalize to different starting times, but it isn't hard to extend from one "free end" of the timeline to two ends.
Assume n <= m.
Then our timeline looks like this (an 'x' marks a 'move', not a visit)
  0     m     2m    ..            t-t%m  t
  +-----+-----+-----+-----+-----+-----+--o
W x     x     x     x     x     x     x
M x  x     x     x     x     x     x    x?
So, the woman goes floor(t/m) times, and between each of her visits -- in the half-open interval (a*m, a*m+m] -- the man goes at least once, thus flipping the seat once. For each time that she flips the seat in an interval, he also flips it once.
However, he may go once more after her last trip, depending on their relative timings, which you can calculate based on t modulo their respective periods.
total_moves = floor(t/m) * 2 + (t%m < t%n ? 1 : 0)
Now for the case n > m.
The roles of the woman and man are reversed: the half-open interval [a*n, a*n+n) will always involve two moves. The remainder of the line is [t-t%n, t), in which the man goes once at the beginning (which is +1 move, but we counted +2 for both people's moves at t=0, which we should probably discard), and the woman goes if she has equal or less time left than he does.
total_moves = floor(t/n) * 2 - 1 + (t%m >= t%n ? 1 : 0)

For 2, the answer is 2*floor(X/n). The man will always go to the bathroom with the toilet seat down and leave it down. The woman will never put it down, since it's only up when the man goes to the bathroom.
1 is a little more tricky.
EDIT: Duh. For 1, the answer is 2*floor(X/m). The toilet seat only transitions when the woman goes to the bathroom.
EDIT2: Plus or minus the initial state of the toilet.
EDIT3: My answer to 1 is only correct if m>=n. I'll figure out the rest later.
EDIT4: If n>=2m, then it's 2*floor(X/n), since the seat will only transition when the man goes pee. If n>m, I believe the answer is also 2*floor(X/n), but I need to work out the math.
EDIT5: So, for 2m>n>m, the seat transitions when the man goes pee after the woman and vice versa. The sequence of man/woman visits repeats every least_common_multiple(m, n) minutes, so we only need to concern ourselves with what happens in that time period. The only time the seat would not transition when the man uses it would be if he managed to visit it twice in a row. Given that the woman is visiting more often than the man, between every man visit there is at least one woman visit. (Twice at the beginning or end.)
Answer 1 then becomes: (n>m ? 2*floor(X/n) : 2*floor(X/m)) + (remainder(X/n) > remainder(X/m) ? 1 : 0). Or something like that.

Yes there is, at least when the implementation can assume that the cycle for a man and a woman is known in advance and that it doesn't change:
Start with the least common multiple of the man/woman cycle times (lcm). Precalculate the movements for this time period (lcm_movements). Now you only have to deal with your input time modulo lcm. For this you could simply set up a fixed length table containing the number of movements for every minute.
Given that time and lcm are integers in Java/C/C++/C# the actual calculation might be this:
return ( time / lcm ) * lcm_movements + movements[ time % lcm ];
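A minimal Python sketch of this table approach follows. It rests on assumptions that are not in the answer itself: both people visit at t=0 and then at every multiple of their period, the seat starts down, the woman goes first when both arrive in the same minute, and the man leaves the seat up. The names cumulative_moves and make_counter are made up for this example. Under these assumptions the seat is up after every multiple of the lcm, so each later period repeats the same pattern.

```python
from math import gcd

def cumulative_moves(n, m, total):
    """cum[t] = number of seat movements during minutes 0..t (brute force)."""
    cum, moves, seat_up = [], 0, False
    for t in range(total + 1):
        if t % m == 0 and seat_up:       # woman needs the seat down
            moves += 1
            seat_up = False
        if t % n == 0 and not seat_up:   # man needs the seat up and leaves it up
            moves += 1
            seat_up = True
        cum.append(moves)
    return cum

def make_counter(n, m):
    """Precompute one period once; every later query is a constant-time lookup."""
    lcm = n * m // gcd(n, m)
    movements = cumulative_moves(n, m, lcm)
    lcm_movements = movements[lcm] - movements[0]  # moves in the interval (0, lcm]

    def count(time):
        return (time // lcm) * lcm_movements + movements[time % lcm]

    return count

count = make_counter(3, 2)  # man every 3 minutes, woman every 2 minutes
print(count(1000) == cumulative_moves(3, 2, 1000)[-1])  # lookup agrees with brute force
```

The precomputation costs O(lcm) time and memory, so this only pays off when many queries share the same n and m.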

Assumptions:
we start at t=0 with the toilet seat down
if man and woman arrive at the same time, then ladies first.
Let lastLadyTime := floor(X/m)*m and lastManTime := floor(X/n)*n. They represent the last time of toilet usage. The expression (lastLadyTime > lastManTime) is the same as (X%m < X%n) because by definition X%m = X - lastLadyTime and X%n = X - lastManTime.
Case: man leaves seat down
The lady never has to move the seat but he always needs to lift it up. Hence floor(X/n).
Case: man leaves seat up, n == m
He will always need to lift it up and she will always need to push it down except at the very first toilet usage when she doesn't have to do anything. Hence 2*floor(X/n) - (X < n ? 0 : 1)
Case: man leaves seat up, n > m
Every time he uses it, he needs to lift it up. She only needs to push it down once after he uses it. This happens all the time except at the end if time runs out before she gets to use the toilet after him. Therefore we must subtract 1 if lastManTime >= lastLadyTime (remember, ladies first). Hence 2*floor(X/n) - (lastManTime >= lastLadyTime ? 1 : 0) = 2*floor(X/n) - (X%n <= X%m ? 1 : 0). This holds for X >= m; for X < m nobody has used the toilet yet, so the count is 0.
Case: man leaves seat up, n < m
Similar to n > m. Every time she uses it, she needs to push it down. He only needs to lift it up once after she uses it. This happens all the time except at the end if time runs out before he has to use the toilet after her. Therefore we must subtract 1 if lastManTime < lastLadyTime. One difference is that he also needs to lift the seat the first time around. Hence 2*floor(X/m) - (lastManTime < lastLadyTime ? 1 : 0) + (X < n ? 0 : 1) = 2*floor(X/m) - (X%n > X%m ? 1 : 0) + (X < n ? 0 : 1)
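The three "man leaves seat up" cases can be checked against a brute-force simulation. The Python sketch below is only an illustration: the function names are made up, visits are assumed to happen at n, 2n, ... and m, 2m, ... (nobody goes at t=0), and the max(0, ...) in the n > m branch is an addition of this sketch so that the count cannot become negative when X < m.

```python
def seat_moves_formula(n, m, x):
    """Closed-form count for the case where the man leaves the seat up."""
    if n == m:
        return 2 * (x // n) - (0 if x < n else 1)
    if n > m:
        # max(0, ...) is this sketch's guard for x < m (no visits yet)
        return max(0, 2 * (x // n) - (1 if x % n <= x % m else 0))
    return 2 * (x // m) - (1 if x % n > x % m else 0) + (0 if x < n else 1)

def seat_moves_simulated(n, m, x):
    """Minute-by-minute simulation: seat starts down, ladies first on ties."""
    moves, seat_up = 0, False
    for t in range(1, x + 1):
        if t % m == 0 and seat_up:       # she pushes the seat down
            moves += 1
            seat_up = False
        if t % n == 0 and not seat_up:   # he lifts the seat and leaves it up
            moves += 1
            seat_up = True
    return moves

# Compare both versions on one small example (n=3, m=2, X=6).
print(seat_moves_formula(3, 2, 6), seat_moves_simulated(3, 2, 6))
```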

If all minute variables are integers then you could do it like this:
int toilet_seat_movements = 0;
bool seat_up = false;
for (int i = 0; i <= total_minutes; i++)
{
    if (seat_up)
    {
        if (i % woman_minutes == 0)
        {
            toilet_seat_movements++;
            seat_up = false;  // she puts the seat down
        }
    }
    else
    {
        if (i % man_minutes == 0)
        {
            toilet_seat_movements++;
            seat_up = true;   // he lifts the seat and leaves it up
        }
    }
}
return toilet_seat_movements;
