Ask for an algorithm

I want to know what algorithm the following problem calls for. I guess it could be represented as a CSP, but the actions are randomized.
Assume I am playing Monopoly. I can choose either 1, 2, or 3 dice for movement in each round. My goal is to skip other players' buildings and also land on a specific range of squares. What is a good algorithm to
select the number of dice in each round
subject to
1. minimize the number of rounds
2. skip some squares
3. land on some squares

3 dice is a pretty small number, so brute force should work well. Assign a value to each square (maybe -$50 for a property with $50 rent if an opponent owns it), then compute the expected value of each choice of roll (for 1 die, 1/6 for each of the next six squares; for 2 dice, 1/36 for two squares ahead + 1/18 for three squares ahead + ... + 1/36 for 12 squares ahead; and so on).
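For illustration, a minimal brute-force sketch of this in Python; the circular board and the square_values assignments are assumptions you would fill in with your own penalties and target squares:

from itertools import product

def roll_distribution(n_dice, sides=6):
    """Exact probability of each total when rolling n_dice fair dice."""
    counts = {}
    for rolls in product(range(1, sides + 1), repeat=n_dice):
        total = sum(rolls)
        counts[total] = counts.get(total, 0) + 1
    n_outcomes = sides ** n_dice
    return {total: c / n_outcomes for total, c in counts.items()}

def expected_value(position, n_dice, square_values):
    """Expected value of the landing square when choosing n_dice dice
    from `position` on a circular board."""
    size = len(square_values)
    return sum(p * square_values[(position + steps) % size]
               for steps, p in roll_distribution(n_dice).items())

def choose_dice(position, square_values):
    """Pick the dice count (1, 2, or 3) with the highest expected value."""
    return max((1, 2, 3), key=lambda n: expected_value(position, n, square_values))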

Sounds like you need some combination of A* or Dijkstra and minimax (at least if you want a better result than what can be derived from the current state alone).

You need to assign weightings (risks or priorities) to each of your constraints, and these depend on how risk-averse you are. You would assign a positive value to a benefit and a negative value to a penalty. So, for example, constraint 1 alone would always suggest using 3 dice, but you need to assign a relative benefit to each of the 3 options. Then, for each choice of dice, you would calculate the probability of landing on a beneficial square and likewise on a penalty square, multiply these by your weightings, add them all together, and choose the most positive outcome. But note that different weightings will give different answers. If you give constraint 1 an absurdly high weighting it will always choose 3 dice; otherwise any outcome is possible.

Related

Programming a probability to let an AI decide whether or not to discard a card in 5-card poker

I am writing an AI to play 5 card poker, where you are allowed to discard a card from your hand and swap it for another randomly dealt one if you wish. My AI can value every possible poker hand as shown in the answer to my previous question. In short, it assigns a unique value to each possible hand where a higher value correlates to a better/winning hand.
My task is to now write a function, int getDiscardProbability(int cardNumber), that gives my AI a number from 0-100 indicating whether or not it should discard this card (0 = definitely do not discard, 100 = definitely discard).
The approach I thought of was to compute every possible hand by swapping this card for every other card in the deck (assume there are still 47 left, for now), then compare each of their values with the current hand's, count how many are better, and so (count / 47) * 100 is my probability.
However, this solution simply looks for any better hand, without distinguishing how much better one hand is. For example, if my AI had the hand 23457, it could discard the 7 for a K, producing a very slightly better hand (a better high card), or it could exchange the 7 for an A or a 6, completing the straight - a much better hand (much higher value) than king high.
So, when my AI is calculating this probability, it would be increased by the same amount when it sees that the hand could be improved by getting the K as when it sees that the hand could be improved by getting the A or the 6. Because of this, I somehow need to factor in the difference in value between my hand and each of the possible hands when calculating this probability. What would be a good approach to achieve this?
Games in general have a chicken-egg problem: you want to design an AI that can beat a good player, but you need a good AI to train your AI against. I'll assume you're making an AI for a 2-player version of poker that has antes but no betting.
First, I'd note that if I had a table of win-rate probabilities for each possible poker hand (of which there are surprisingly few really different ones), one could write a function that tells you the expected value of discarding a set of cards from your hand: simply enumerate all possible replacement cards and average the probability of winning over the resulting hands. There are not that many cards to evaluate -- even if you don't ignore suits, and you're replacing the maximum of 3 cards, you have only 47 * 46 * 45 / 6 = 16215 possibilities. In practice, there are many fewer interesting possibilities -- for example, if the cards you don't discard aren't all of the same suit, you can ignore suits completely, and if they are of the same suit, you only need to distinguish "same suit" replacements from "different suit" replacements. This is slightly trickier than I describe it, since you've got to be careful to count possibilities correctly.
Then your AI can work by enumerating all the possible sets of cards to discard, of which there are (5 choose 0) + (5 choose 1) + (5 choose 2) + (5 choose 3) = 1 + 5 + 10 + 10 = 26, and picking the one with the highest expectation, as computed above.
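As a sketch of that enumeration, assuming you already have such a win-rate table wrapped as a hypothetical win_rate(hand) lookup from a 5-card hand to a win probability:

from itertools import combinations

def best_discard(hand, deck, win_rate):
    """Enumerate every discard set of size 0-3 and return the one with the
    highest expected win rate, averaging over all replacement draws.
    `win_rate` is the assumed lookup: 5-card hand -> win probability."""
    remaining = [c for c in deck if c not in hand]
    best_set, best_ev = (), win_rate(tuple(hand))  # size-0 discard: stand pat
    for k in (1, 2, 3):
        for discard in combinations(hand, k):
            kept = [c for c in hand if c not in discard]
            draws = list(combinations(remaining, k))
            ev = sum(win_rate(tuple(kept) + d) for d in draws) / len(draws)
            if ev > best_ev:
                best_set, best_ev = discard, ev
    return best_set, best_ev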
The chicken-egg problem is that you don't have a table of win-rate probabilities per hand. I describe an approach for a different poker-related game here, but the idea is the same: http://paulhankin.github.io/ChinesePoker/ . This approach is not my idea, and essentially the same idea is used for example in game-theory-optimal solvers for real poker variants like piosolver.
Here's the method.
Start with a table of probabilities made up somehow. Perhaps you just start assuming the highest rank hand (AKQJTs) wins 100% of the time and the worst hand (75432) wins 0% of the time, and that probabilities are linear in between. It won't matter much.
Now, simulate tens of thousands of hands with your AI and count how often each hand rank is played. You can use this to construct a new table of win-rate probabilities. This new table of win-rate probabilities is (ignoring some minor theoretical issues) an optimal counter-strategy to your AI in that an AI that uses this table knows how likely your original AI is to end up with each hand, and plays optimally against that.
The natural idea is now to repeat the process again, and hope this yields better and better AIs. However, the process will probably oscillate and not settle down. For example, if at one stage of your training your AI tends to draw to big hands, the counter AI will tend to play very conservatively, beating your AI when it misses its draw. And against a very conservative AI, a slightly less conservative AI will do better. So you'll tend to get a sequence of less and less conservative AIs, and then a tipping point where your AI is beaten again by an ultra-conservative one.
But the fix for this is relatively simple -- just blend the old table and the new table in some way (one standard way is, at step i, to replace the table with a weighted average of 1/i of the new table and (i-1)/i of the old table). This has the effect of not over-adjusting to the most recent iteration. And ignoring some minor details that occur because of the assumptions (for example, ignoring replacement effects from the original cards in your hand), this approach will give you a game-theoretically optimal AI, as described in "An iterative method of solving a game", Julia Robinson (1950).
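A minimal sketch of that blending step, assuming the tables are dictionaries mapping hand ranks to win probabilities (names hypothetical):

def blend_tables(old, new, i):
    """Step-i update: replace the table with a weighted average of
    (i-1)/i of the old table and 1/i of the new counter-strategy table."""
    return {rank: ((i - 1) * old[rank] + new[rank]) / i for rank in old}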
A simple (but not that simple) way would be to use some kind of database of hand-combination win probabilities (maybe the University of Alberta Computer Poker Research Group database).
The idea is to find out what winning percentage each combination has, then form each possible combination and compare those percentages.
For instance, you have 5 cards, AAAKJ, and it's time to discard (or not).
AAAKJ has some winning percentage (which I don't know; let's say 75).
AAAK (discarding the J) has a percentage of 78 (let's say).
AAAJ (discarding the K) has x.
AAA (discarding the K and J) has y.
AA (discarding an A, the K, and the J) has z.
KJ (discarding the three As) has 11 (?)...
etc.
And the AI would keep the combination with the highest rate of success.
Instead of counting how many hands are better, you might compute a sum of probabilities Pi that the new hand (with the swapped card) will win, i = 1, ..., 47.
This might be a tough call because of the other players: you don't know their cards, and thus their current chances to win. To make it easier, an approximation of some sort can be applied.
For example, Pi = N_lose / N, where N_lose is the number of hands that would lose to the new hand containing the ith card, and N is the total number of possible hands excluding the 5 that the AI is holding. Finally, you use the sum of the Pi instead of the count.
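For example, a weighted version of getDiscardProbability might look like this sketch, assuming a hypothetical win_prob(hand) helper that estimates the chance a 5-card hand wins (for instance via the approximation above):

def discard_probability(hand, card, deck, win_prob):
    """Score `card` from 0 to 100 by averaging the estimated win
    probability of the hand over all 47 possible replacement cards."""
    kept = [c for c in hand if c != card]
    replacements = [c for c in deck if c not in hand]  # the unseen cards
    avg = sum(win_prob(kept + [r]) for r in replacements) / len(replacements)
    return round(100 * avg)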

A good randomizer for puzzle-15

I have implemented a 15-puzzle for people to compete at online. My current randomizer works by starting from the solved configuration and moving tiles around for 100 moves (an arbitrary number).
Everything is fine; however, every once in a while the tiles are shuffled so little that it takes only a few moves to solve the puzzle, and the game is then unfair: some people reach better scores at a much higher speed.
What would be a good way to randomize the initial configuration so it is not "too easy"?
You can generate a completely random configuration (one that is solvable) and then use some solver to determine the optimal sequence of moves. If the sequence is long enough for you, good; otherwise generate a new configuration and repeat.
Update & details
There is an article on Wikipedia about the 15-puzzle and when it is (and isn't) solvable. In short, if the empty square is in the lower-right corner, then the puzzle is solvable if and only if the number of inversions with respect to the goal permutation is even (an inversion is a pair of tiles, not necessarily adjacent, that appear in the opposite order to the goal).
You can then easily generate a solvable start state by applying an even number of transpositions (pairwise tile swaps) to the goal state, which may lead to a not-so-easy-to-solve state far quicker than doing regular moves, and the state is guaranteed to remain solvable.
In fact, you don't even need a search algorithm as I mentioned above; an admissible heuristic is enough. Such a heuristic never overestimates the number of moves needed to solve the puzzle, i.e. you are guaranteed that the solution will not take fewer moves than the heuristic tells you.
A good heuristic is the sum of the Manhattan distances of each tile from its goal position.
Summary
In short, a possible (very simple) algorithm for generating starting positions might look like this:
1: current_state <- goal_state
2: swap two arbitrary (randomly selected) tiles
3: swap two arbitrary (randomly selected) tiles again (to keep the state solvable)
4: h <- heuristic(current_state)
5: if h > desired_threshold
6:     return current_state
7: else
8:     go to 2
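A minimal Python sketch of this generator under the stated assumptions (4x4 board, blank fixed in the lower-right corner, threshold a matter of tuning):

import random

def manhattan(state):
    """Sum of Manhattan distances of each tile from its goal cell.
    `state` has 16 entries in row-major order; 0 is the blank."""
    total = 0
    for idx, tile in enumerate(state):
        if tile:  # skip the blank
            goal = tile - 1
            total += abs(idx // 4 - goal // 4) + abs(idx % 4 - goal % 4)
    return total

def random_start(threshold=20):
    """Apply pairs of random tile swaps (an even number of transpositions,
    with the blank fixed in the corner, keeps the state solvable) until
    the heuristic exceeds the threshold."""
    state = list(range(1, 16)) + [0]  # solved board, blank in lower-right
    while manhattan(state) <= threshold:
        for _ in range(2):
            i, j = random.sample(range(15), 2)  # never move the blank (index 15)
            state[i], state[j] = state[j], state[i]
    return tuple(state)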
To be absolutely certain about how difficult a state is, you need to find the optimal solution using some solver. Heuristics will give you only an estimate.
I would do this:
start from the solution (just like you did)
make a valid move in a random direction
So you must keep track of where the gap is, generate a random direction (N, E, S, W), and do the move. I think you have done this part too.
compute the randomness of your placement
So compute some coefficient that depends on the order of the array, such that ordered (solved) boards have low values and random ones have high values. The equation for the coefficient, however, is a matter of trial and error. Here are some ideas for what to use:
correlation coefficient
sum of the average differences between each value and its neighbors, e.g. for the board
1 2 4
3 6 5
9 8 7
coeff(6) = (|6-3| + |6-5| + |6-2| + |6-8|) / 4
coeff = coeff(1) + coeff(2) + ... + coeff(15), summing over all tiles
abs distance from the ordered array
You can combine several approaches. You can also divide the board into separate rows and columns and then combine the sub-coefficients.
loop step 2 until the coefficient from step 3 is high enough (threshold)
The threshold can also be used to change the difficulty.
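A minimal sketch of the neighbor-difference coefficient from the example above (the board is given as a list of rows of tile numbers; a square board is assumed):

def neighbor_diff_coeff(board):
    """Sum of per-tile average absolute differences to the 4-neighbours;
    low for an ordered board, higher for a scrambled one."""
    n = len(board)
    total = 0.0
    for r in range(n):
        for c in range(n):
            diffs = [abs(board[r][c] - board[r + dr][c + dc])
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < n and 0 <= c + dc < n]
            total += sum(diffs) / len(diffs)
    return total

# The example board: coeff(6) alone is (|6-2|+|6-8|+|6-3|+|6-5|)/4 = 2.5
print(neighbor_diff_coeff([[1, 2, 4], [3, 6, 5], [9, 8, 7]]))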

Optimal match-up

Let's assume you're a baseball manager and you have N pitchers in your bullpen (N<=14) who have to face M batters (M<=100). You also know the strength of each of the pitchers and each of the batters. For those who are not familiar with baseball: once you bring in a relief pitcher, he can pitch to some number k of consecutive batters, but once he's taken out of the game he cannot come back.
For each pitcher, the probability that he's going to lose his match-ups is given by (sum of the strengths of all the batters he will face)/(his strength). Try to minimize these probabilities, i.e. try to maximize your chances of winning the game.
For example, we have 3 pitchers and they have to face 3 batters. The batters' strengths are:
10 40 30
while the strengths of your pitchers are:
40 30 3
The optimal solution is to bring in the strongest pitcher to face the first 2 batters and the second-strongest to face the third batter. Then the probability of each pitcher losing his match-up is:
50/40 = 1.25 and 30/30 = 1
So the probability of losing the game would be 1.25 (note that this number can be greater than 1).
How can you find the optimal assignment? I was thinking of taking a greedy approach, but I'm not sure it always holds. Also, the fact that a pitcher can face an unlimited number of batters (I mean, it's only limited by M) poses the major problem for me.
Probabilities must be in the range [0.0, 1.0] so what you call a probability can't be a probability. I'm just going to call it a score and minimize it.
I'm going to assume for now that you somehow know the order in which the pitchers should play.
Given the order, what is left to decide is how long each pitcher plays. I think you can find this out using dynamic programming. Consider the batters to be faced in order. Build an MxN table best[batter, pitcher] where best[i, j] is the best score you can make considering just the first i batters using the first j pitchers, or HUGE if it does not make sense.
best[1,1] is just the score for the first pitcher against the first batter, and best[1,j] doesn't make sense for any other value of j.
For larger values of i you work out best[i,j] by considering when the last change of pitcher could be, considering all possibilities (so after batter 1, 2, 3, ..., i-1). If the last change of pitcher was after batter t, then look up best[t, j-1] to get the score up to just before that change, and then calculate the ratio (sum of batter strengths from t+1 to i)/(strength of pitcher j) for the final stint. Since the overall score is the maximum over the pitchers, the score for this split is the larger of the two values; when you have considered all possible t, take the smallest such score and use it as the value for best[i, j]. Note down enough info (such as the last pitcher change that turned out to be best) so that once you have calculated best[M, N], you can trace back to find the best schedule.
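A sketch of this DP under the stated assumptions (pitchers already listed in playing order, each used pitcher facing at least one batter; example data from the question):

def best_schedule(batters, pitchers):
    """best[i][j] = smallest achievable worst ratio covering the first i
    batters with the first j pitchers (pitchers already in playing order)."""
    M, N = len(batters), len(pitchers)
    prefix = [0] * (M + 1)  # prefix sums of batter strengths
    for i, b in enumerate(batters):
        prefix[i + 1] = prefix[i] + b
    HUGE = float("inf")
    best = [[HUGE] * (N + 1) for _ in range(M + 1)]
    best[0][0] = 0.0
    for j in range(1, N + 1):
        for i in range(j, M + 1):      # need at least j batters for j pitchers
            for t in range(j - 1, i):  # pitcher j faces batters t+1 .. i
                ratio = (prefix[i] - prefix[t]) / pitchers[j - 1]
                best[i][j] = min(best[i][j], max(best[t][j - 1], ratio))
    return min(best[M][j] for j in range(1, N + 1))

print(best_schedule([10, 40, 30], [40, 30, 3]))  # 1.25, as in the question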
You don't actually know the order, and because the final score is the maximum of the a/b values over the pitchers, the order does matter. However, given a division of the batters into consecutive groups, the best way to assign pitchers to groups is to assign the strongest pitcher to the group with the highest total batter strength, the next-strongest pitcher to the group with the next-highest total, and so on. So you could alternate between dividing batters into groups, as described above, and then reassigning pitchers to groups to work out the order the pitchers really should be in - keep doing this until the answer stops changing, and hope the result is a global optimum. Unfortunately there is no guarantee of this.
I'm not convinced that your score is a good model for baseball, especially since it started out as a probability but can't be one. Perhaps you should work out a few examples (maybe even solving small instances by brute force) and see if the results look reasonable.
Another way to approach this problem is via http://en.wikipedia.org/wiki/Branch_and_bound.
With branch and bound you need some way to describe partial answers, and you need some way to work out a value V for a given partial answer, such that no way of extending that partial answer can possibly produce a better answer than V. Then you run a tree search, extending partial answers in every possible way, but discarding partial answers which can't possibly be any better than the best answer found so far. It is good if you can start off with at least a guess at the best answer, because then you can discard poor partial answers from the start. My other answer might provide a way of getting this.
Here a partial answer is a selection of pitchers, in the order they should play, together with the number of batters they should pitch to. The first partial answer would have 0 pitchers, and you could extend this by choosing each possible pitcher, pitching to each possible number of batters, giving a list of partial answers each mentioning just one pitcher, most of which you could hopefully discard.
Given a partial answer, you can compute the (total batter strength)/(pitcher strength) value for each pitcher in its selection. The maximum found here is one possible way of working out V. There is another calculation you can do: sum the strengths of all the batters left and divide by the sum of the strengths of all the pitchers left. This is the best possible result you could get for the remaining pitchers, because it is the result you would get if you somehow managed to spread the batters across the pitchers perfectly evenly. If this value is greater than the V you have calculated so far, use it instead of V, to get a less optimistic (but more accurate) measure of how good any descendant of that partial answer could possibly be.
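A minimal sketch of that bound computation (names hypothetical; partial_ratios holds the per-pitcher values committed so far):

def bound(partial_ratios, batters_left, pitchers_left):
    """Optimistic value V for a partial answer: the worst ratio committed
    so far, or the perfectly even split of the remaining batters across
    the remaining pitchers, whichever is larger."""
    v = max(partial_ratios, default=0.0)
    if pitchers_left:
        v = max(v, sum(batters_left) / sum(pitchers_left))
    return v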

Genetic Algorithm Implementation for weight optimization

I am a data mining student and I have a problem that I was hoping that you guys could give me some advice on:
I need a genetic algorithm that optimizes the weights between three inputs. The weights need to be positive values AND they need to sum to 100%.
The difficulty is in creating an encoding that satisfies the sum-to-100% requirement.
As a first pass, I thought that I could simply create a chromosome with a series of numbers (e.g., 4, 7, 9). Each weight would simply be its number divided by the sum of all of the chromosome's numbers (e.g., 4/20 = 20%).
The problem with this encoding method is that any change to the chromosome changes the sum of the chromosome's numbers, and therefore all of the chromosome's weights. This would seem to significantly limit the GA's ability to evolve a solution.
Could you give any advice on how to approach this problem?
I have read about real-valued encoding and I do have an implementation of a GA, but it gives me weights that may not necessarily add up to 100%.
It is mathematically impossible to change one value without changing at least one other if you need the sum to remain constant.
One way to make changes is exactly what you suggest: weight = value/sum. In this case, when you change one value, the difference to be made up is distributed across all the other values.
The other extreme is to change only pairs. Start with a set of values that add to 100, and whenever one value changes, change another by the opposite amount to maintain the sum. The other value could be picked randomly or by a rule. I'd expect this to take longer to converge than the first method.
If your chromosome is only 3 values long, then mathematically, these are your only two options.
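A minimal sketch of both schemes (a normalizing decoder and a sum-preserving pair mutation; names are illustrative):

import random

def normalize(chromosome):
    """Decode a chromosome of positive reals into weights summing to 1."""
    total = sum(chromosome)
    return [v / total for v in chromosome]

def mutate_pair(weights, step=0.05):
    """Mutate two genes by opposite amounts so the sum stays constant,
    clamping the delta so that both weights remain non-negative."""
    i, j = random.sample(range(len(weights)), 2)
    delta = random.uniform(-step, step)
    delta = max(-weights[i], min(delta, weights[j]))  # keep both >= 0
    w = list(weights)
    w[i] += delta
    w[j] -= delta
    return w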

What algorithm should I use for "genetic AI improvement"

First of all: This is not a question about how to make a program play Five in a Row. Been there, done that.
Introductory explanation
I have made a five-in-a-row game as a framework to experiment with genetically improving AI (ouch, that sounds awfully pretentious). As with most turn-based games, the best move is decided by assigning a score to every possible move and then playing the move with the highest score. The function for assigning a score to a move (a square) goes something like this:
If the square already has a token, the score is 0 since it would be illegal to place a new token in the square.
Each square can be a part of up to 20 different winning rows (5 horizontal, 5 vertical, 10 diagonal). The score of the square is the sum of the score of each of these rows.
The score of a row depends on the number of friendly and enemy tokens already in the row. Examples:
A row with four friendly tokens should have infinite score, because if you place a token there you win the game.
The score for a row with four enemy tokens should be very high, since if you don't put a token there, the opponent will win on his next turn.
A row with both friendly and enemy tokens will score 0, since this row can never be part of a winning row.
Given this algorithm, I have declared a type called TBrain:
type
  TBrain = array[cFriendly..cEnemy, 0..4] of Integer;
The values in the array indicate the score of a row with either N friendly tokens and 0 enemy tokens, or 0 friendly tokens and N enemy tokens. If there are 5 tokens in a row there's no score, since the row is full.
It's actually quite easy to decide which values should be in the array. vBrain[0,4] (four friendly tokens) should be "infinite"; let's call that 1,000,000. vBrain[1,4] should be very high, but not so high that the brain would prefer blocking several enemy wins over winning itself.
Consider the following (improbable) board:
0123456789
+----------
0|1...1...12
1|.1..1..1.2
2|..1.1.1..2
3|...111...2
4|1111.1111.
5|...111....
6|..1.1.1...
7|.1..1..1..
8|1...1...1.
Player 2 should place his token at (9,4), winning the game, not at (4,4), even though that would block 8 potential winning rows for player 1. Ergo, vBrain[1,4] should be (vBrain[0,4]/8)-1. Working like this we can find optimal values for the "brain", but again, this is not what I'm interested in. I want an algorithm to find the best values.
I have implemented this framework so that it's totally deterministic. There are no random values added to the scores, and if several squares have the same score, the top-left one is chosen.
Actual problem
That's it for the introduction; now to the interesting part (for me, at least).
I have two "brains", vBrain1 and vBrain2. How should I iteratively make them better? I imagine something like this:
Initialize vBrain1 and vBrain2 with random values.
Simulate a game between them.
Assign the values from the winner to the loser, then randomly change one of them slightly.
This doesn't seem to work; the brains don't get any smarter. Why?
Should the score method add some small random values to the result, so that two games between the same two brains would play out differently? How much should the values change in each iteration? How should the "brains" be initialized - with constant values, or with random values?
Also, does this have anything to do with AI or genetic algorithms at all?
PS: The question has nothing to do with Five in a Row. That's just something I chose because I can declare a very simple "Brain" to experiment on.
If you want to approach this problem as a genetic algorithm, you will need an entire population of "brains". Then evaluate them against each other, either every combination or tournament-style. Then select the top X% of the population and use those as the parents of the next generation, where offspring are created via mutation (which you have) or genetic crossover (e.g., swap rows or columns between two "brains").
Also, if you do not see any evolutionary progress, you may need more than just win/loss: come up with some kind of point system so that you can rank the entire population more effectively, which makes selection easier.
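A minimal population-loop sketch along those lines, assuming each brain is flattened to a list of the 10 integers from TBrain and that `fitness` is a round-robin point score as suggested above (all names hypothetical):

import random

def crossover(a, b):
    """Single-point crossover on two flattened score tables."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(brain, rate=0.1, scale=100):
    """Perturb each gene with probability `rate` by a random integer."""
    return [g + random.randint(-scale, scale) if random.random() < rate else g
            for g in brain]

def next_generation(population, fitness, elite_frac=0.2):
    """Keep the top elite_frac by fitness, refill with mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:max(2, int(elite_frac * len(ranked)))]
    children = [mutate(crossover(*random.sample(elite, 2)))
                for _ in range(len(population) - len(elite))]
    return elite + children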
Generally speaking, yes, you can make a brain smarter by using genetic algorithm techniques.
Randomness, or mutation, plays a significant part in genetic programming.
I like this tutorial, Genetic Algorithms: Cool Name & Damn Simple.
(It uses Python for the examples but it's not difficult to understand them)
Take a look at NeuroEvolution of Augmenting Topologies (NEAT). A fancy acronym which basically means the evolution of neural nets - both their structure (topology) and connection weights. I wrote a .Net implementation called SharpNEAT that you may wish to look at. SharpNEAT V1 also has a Tic-Tac-Toe experiment.
http://sharpneat.sourceforge.net/
