What algorithm should I use for "genetic AI improvement"?

First of all: This is not a question about how to make a program play Five in a Row. Been there, done that.
Introductory explanation
I have made a five-in-a-row-game as a framework to experiment with genetically improving AI (ouch, that sounds awfully pretentious). As with most turn-based games the best move is decided by assigning a score to every possible move, and then playing the move with the highest score. The function for assigning a score to a move (a square) goes something like this:
If the square already has a token, the score is 0 since it would be illegal to place a new token in the square.
Each square can be a part of up to 20 different winning rows (5 horizontal, 5 vertical, 10 diagonal). The score of the square is the sum of the score of each of these rows.
The score of a row depends on the number of friendly and enemy tokens already in the row. Examples:
A row with four friendly tokens should have infinite score, because if you place a token there you win the game.
The score for a row with four enemy tokens should be very high, since if you don't put a token there, the opponent will win on his next turn.
A row with both friendly and enemy tokens will score 0, since it can never become a winning row.
Given this algorithm, I have declared a type called TBrain:
type
  TBrain = array[cFriendly..cEnemy, 0..4] of integer;
The values in the array indicate the score of a row with either N friendly tokens and 0 enemy tokens, or 0 friendly tokens and N enemy tokens. If there are 5 tokens in a row there's no score, since the row is full.
It's actually quite easy to decide which values should be in the array. vBrain[0,4] (four friendly tokens) should be "infinite"; let's call that 1.000.000. vBrain[1,4] should be very high, but not so high that the brain would prefer blocking several enemy wins rather than winning itself.
Consider the following (improbable) board:
0123456789
+----------
0|1...1...12
1|.1..1..1.2
2|..1.1.1..2
3|...111...2
4|1111.1111.
5|...111....
6|..1.1.1...
7|.1..1..1..
8|1...1...1.
Player 2 should place his token in (9,4), winning the game, not in (4,4) even though he would then block 8 potential winning rows for player 1. Ergo, vBrain[1,4] should be (vBrain[0,4]/8)-1. Working like this we can find optimal values for the "brain", but again, this is not what I'm interested in. I want an algorithm to find the best values.
I have implemented this framework so that it's totally deterministic. There's no random values added to the scores, and if several squares have the same score the top-left will be chosen.
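To make the scoring scheme concrete, here is a minimal Python sketch of the square-scoring function described above. The helper rows_through(square) (yielding every 5-square line through a square) and the board mapping are hypothetical stand-ins, not part of the actual Delphi framework, and brain is a 2 x 5 nested list indexed [friendly-or-enemy][token count], mirroring TBrain:

    def square_score(board, square, brain, me, opponent):
        if board[square] is not None:
            return 0                          # occupied square: illegal move
        total = 0
        for line in rows_through(square):     # up to 20 candidate winning rows
            friendly = sum(1 for s in line if board[s] == me)
            enemy = sum(1 for s in line if board[s] == opponent)
            if friendly > 0 and enemy > 0:
                continue                      # mixed row can never win: scores 0
            if enemy > 0:
                total += brain[1][enemy]      # 0 friendly, N enemy tokens
            else:
                total += brain[0][friendly]   # N friendly, 0 enemy tokens
        return total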
Actual problem
That's it for the introduction, now to the interesting part (for me, at least)
I have two "brains", vBrain1 and vBrain2. How should I iteratively make these better? I imagine something like this:
Initialize vBrain1 and vBrain2 with random values.
Simulate a game between them.
Assign the values from the winner to the loser, then randomly change one of them slightly.
This doesn't seem to work. The brains don't get any smarter. Why?
Should the score-method add some small random values to the result, so that two games between the same two brains would be different? How much should the values change for each iteration? How should the "brains" be initialized? With constant values? With random values?
Also, does this have anything to do with AI or genetic algorithms at all?
PS: The question has nothing to do with Five in a Row. That's just something I chose because I can declare a very simple "Brain" to experiment on.

If you want to approach this problem like a genetic algorithm, you will need an entire population of "brains". Then evaluate them against each other, either every pairing or tournament-style. Then select the top X% of the population and use those as the parents of the next generation, where offspring are created via mutation (which you have) or genetic crossover (e.g., swap rows or columns between two "brains").
Also, if you do not see any evolutionary progress, you may need more than just win/loss; come up with some kind of point system so that you can rank the entire population more finely, which makes selection easier.
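A minimal Python sketch of that loop, assuming the "brain" is flattened to a list of genes and that a score(brain) function (e.g., total points from a round-robin tournament within the population) is supplied; all names and parameters here are illustrative, not part of the asker's framework:

    import random

    POP_SIZE, ELITE_FRAC, GENES = 50, 0.2, 10   # 10 genes ~ the 2 x 5 TBrain array

    def random_brain():
        return [random.randint(0, 1_000_000) for _ in range(GENES)]

    def crossover(a, b):
        # one-point crossover: prefix from one parent, the rest from the other
        cut = random.randrange(1, GENES)
        return a[:cut] + b[cut:]

    def mutate(brain, rate=0.1):
        return [random.randint(0, 1_000_000) if random.random() < rate else g
                for g in brain]

    def evolve(population, score):
        # rank by fitness, keep the elite, breed the rest from elite parents
        ranked = sorted(population, key=score, reverse=True)
        elite = ranked[:int(POP_SIZE * ELITE_FRAC)]
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(POP_SIZE - len(elite))]
        return elite + children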

Generally speaking, yes, you can make a brain smarter by using genetic algorithm techniques.
Randomness, or mutation, plays a significant part in genetic programming.
I like this tutorial, Genetic Algorithms: Cool Name & Damn Simple.
(It uses Python for the examples but it's not difficult to understand them)

Take a look at NeuroEvolution of Augmenting Topologies (NEAT), a fancy acronym which basically means the evolution of neural nets - both their structure (topology) and connection weights. I wrote a .Net implementation called SharpNEAT that you may wish to look at. SharpNEAT V1 also has a Tic-Tac-Toe experiment.
http://sharpneat.sourceforge.net/

Related

Programming a probability to allow an AI decide when to discard a card or not in 5 card poker

I am writing an AI to play 5 card poker, where you are allowed to discard a card from your hand and swap it for another randomly dealt one if you wish. My AI can value every possible poker hand as shown in the answer to my previous question. In short, it assigns a unique value to each possible hand where a higher value correlates to a better/winning hand.
My task is to now write a function, int getDiscardProbability(int cardNumber), that gives my AI a number from 0-100 relating to whether or not it should discard this card (0 = definitely do not discard, 100 = definitely discard).
The approach I thought up was to compute every possible hand by swapping this card for every other card in the deck (assume there are still 47 left, for now), then compare each of their values with the current hand, count how many are better, and so (count / 47) * 100 is my probability.
However, this solution simply looks for any better hand, without distinguishing how much better one hand is. For example, if my AI had the hand 23457, it could discard the 7 for a K, producing a very slightly better hand (better high card), or it could exchange the 7 for an A or a 6, completing the straight - a much better hand (much higher value) than a king-high hand.
So, when my AI calculates this probability, it is increased by the same amount when it sees that the hand could be improved by drawing the K as when it sees that the hand could be improved by drawing an A or a 6. Because of this, I somehow need to factor in the difference in value between my hand and each of the possible hands when calculating this probability. What would be a good approach to achieve this?
Games in general have a chicken-egg problem: you want to design an AI that can beat a good player, but you need a good AI to train your AI against. I'll assume you're making an AI for a 2-player version of poker that has antes but no betting.
First, I'd note that if I had a table of win-rate probabilities for each possible poker hand (of which there are surprisingly few really different ones), one could write a function that tells you the expected value of discarding a set of cards from your hand: simply enumerate all possible replacement cards and average the probability of winning over the resulting hands. There aren't that many cards to evaluate -- even if you don't ignore suits, and you're replacing the maximum 3 cards, you have only 47 * 46 * 45 / 6 = 16215 possibilities. In practice, there are many fewer interesting possibilities -- for example, if the cards you don't discard aren't all of the same suit, you can ignore suits completely, and if they are of the same suit, you only need to distinguish "same suit" replacements from "different suit" replacements. This is slightly trickier than I describe it, since you've got to be careful to count possibilities right.
Then your AI can work by enumerating all the possible sets of cards to discard, of which there are (5 choose 0) + (5 choose 1) + (5 choose 2) + (5 choose 3) = 1 + 5 + 10 + 10 = 26, and picking the one with the highest expectation, as computed above.
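A sketch of that enumeration, assuming integer-encoded cards and a supplied win_prob function (the win-rate table the rest of this answer is about), keyed by sorted 5-card hands:

    from itertools import combinations

    def best_discard(hand, deck, win_prob):
        # baseline: discard nothing
        best_ev, best_set = win_prob(tuple(sorted(hand))), ()
        for k in (1, 2, 3):
            for discard in combinations(hand, k):
                keep = [c for c in hand if c not in discard]
                # average win probability over all possible replacements
                total = count = 0
                for repl in combinations(deck, k):
                    total += win_prob(tuple(sorted(keep + list(repl))))
                    count += 1
                if total / count > best_ev:
                    best_ev, best_set = total / count, discard
        return best_set, best_ev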
The chicken-egg problem is that you don't have a table of win-rate probabilities per hand. I describe an approach for a different poker-related game here, but the idea is the same: http://paulhankin.github.io/ChinesePoker/ . This approach is not my idea, and essentially the same idea is used for example in game-theory-optimal solvers for real poker variants like piosolver.
Here's the method.
Start with a table of probabilities made up somehow. Perhaps you just start assuming the highest rank hand (AKQJTs) wins 100% of the time and the worst hand (75432) wins 0% of the time, and that probabilities are linear in between. It won't matter much.
Now, simulate tens of thousands of hands with your AI and count how often each hand rank is played. You can use this to construct a new table of win-rate probabilities. This new table of win-rate probabilities is (ignoring some minor theoretical issues) an optimal counter-strategy to your AI in that an AI that uses this table knows how likely your original AI is to end up with each hand, and plays optimally against that.
The natural idea is now to repeat the process again, and hope this yields better and better AIs. However, the process will probably oscillate and not settle down. For example, if at one stage of your training your AI tends to draw to big hands, the counter AI will tend to play very conservatively, beating your AI when it misses its draw. And against a very conservative AI, a slightly less conservative AI will do better. So you'll tend to get a sequence of less and less conservative AIs, and then a tipping point where your AI is beaten again by an ultra-conservative one.
But the fix for this is relatively simple -- just blend the old table and the new table in some way (one standard way is, at step i, to replace the table with a weighted average of 1/i of the new table and (i-1)/i of the old table). This has the effect of not over-adjusting to the most recent iteration. And ignoring some minor details that occur because of assumptions (for example, ignoring replacement effects from the original cards in your hand), this approach will give you a game-theoretically optimal AI, as described in "An Iterative Method of Solving a Game", Julia Robinson (1950).
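In sketch form, with simulate_hands (play many hands against an AI using the current table and measure the win rate per hand rank) as an assumed black box:

    def refine_win_table(table, simulate_hands, iterations):
        # Robinson-style averaging: at step i the new measurement gets
        # weight 1/i and the accumulated table (i-1)/i, so later steps
        # adjust the table less and less and it stops oscillating.
        for i in range(1, iterations + 1):
            new_table = simulate_hands(table)
            for rank in table:
                table[rank] = new_table[rank] / i + table[rank] * (i - 1) / i
        return table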
A simple (but not so simple) way would be to use some kind of database with the hand-combination probabilities (maybe the University of Alberta Computer Poker Research Group's database).
The idea is to know, for each combination, what percentage of the time it wins, and then to compare that percentage across each possible hand.
For instance, you have 5 cards, AAAKJ, and it's time to discard (or not).
AAAKJ has a winning percentage (which I don't know offhand; let's say 75).
AAAK (discarding J) has a percentage of 78 (let's say).
AAAJ (discarding K) has x.
AAA (discarding KJ) has y.
AA (discarding AKJ) has z.
KJ (discarding AAA) has 11 (?)..
etc..
And the AI would keep the combination with the highest rate of success.
Instead of counting how many hands are better, you might compute a sum of probabilities P_i that the new hand (with the swapped card) will win, for i = 1, ..., 47.
This might be a tough call because of the other players: you don't know their cards, and thus their current chances to win. To make it easier, an approximation of some sort can be applied.
For example, P_i = N_lose / N, where N_lose is the number of hands that would lose to the new hand containing the ith card, and N is the total number of possible hands not using the 5 cards the AI is holding. Finally, you use the sum of the P_i instead of the count.
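A sketch of that approximation, with beats(a, b) (does 5-card hand a beat hand b?) assumed given, and deck being the 47 cards outside the AI's hand:

    from itertools import combinations

    def swap_win_prob(keep4, card_i, deck, beats):
        # P_i = N_lose / N: the fraction of possible opposing hands that
        # lose to the new hand formed by swapping in card i
        new_hand = tuple(keep4) + (card_i,)
        remaining = [c for c in deck if c != card_i]
        n_lose = n_total = 0
        for opp in combinations(remaining, 5):
            n_total += 1
            if beats(new_hand, opp):
                n_lose += 1
        return n_lose / n_total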

AI algorithm for item pickup race

I would like to build an AI for the following game:
there are two players on a M x N board
each player can move up/down or left/right
there are different items on the board
the player who wins the most categories wins the game (having more items of a category than the other player makes you the winner of that category)
in one turn you can either pick up an item you are standing on or move
player moves are made at the same time
if two players stand on the same field and both try to pick up the item, each has a 0.5 chance of getting it
The game ends if one of the following conditions is met:
all the items have been picked up
there is already a clear winner because one player has more than half the items in more than half of the categories
I have no idea of AI, but I have taken a machine learning class some time ago.
How do I get started on such a problem?
Is there a generalization of this problem?
The canonical choice for adversarial search in games like the one you propose (called two-player zero-sum games) is Minimax search. From Wikipedia, the goal of Minimax is to
Minimize the possible loss for a worst case (maximum loss) scenario. Alternatively, it can be thought of as maximizing the minimum gain.
Hence, it is called minimax, or maximin. Essentially you build a tree of Max and Min levels, where each node has a branching factor equal to the number of possible actions at each turn, 4 in your case. Each level corresponds to one player's turn, and the tree extends until the end of the game, allowing you to search for the optimal choice at each turn, assuming the opponent plays optimally as well. If your opponent does not play optimally, you will only score better. Essentially, at each node you simulate every possible game and choose the best action for the current turn.
If it seems like generating all possible games would take a long time, you are correct, it's an exponential complexity algorithm. From here you would want to investigate alpha-beta pruning, which essentially allows you to eliminate some of the possible games you are enumerating based on the values you have found so far, and is a fairly simple modification of minimax. This solution will still be optimal. I defer to the wikipedia article for further explanation.
From there, you would want to experiment with different heuristics for eliminating nodes, which could prune a significant number of nodes from the tree. Do note, however, that eliminating nodes via heuristics can produce a sub-optimal (but still good) solution, depending on your heuristic. One common tactic is to limit the depth of the search tree: you search maybe 5 moves ahead to determine the best current move, using an estimate of each player's score at that depth. Once again, this depth is a parameter you can tweak. Something as simple as calculating the score of the game as if it ended on that turn might suffice, and is definitely a good starting point.
Finally, for the nodes where probability is concerned, there is a slight modification of Minimax called Expectiminimax that essentially takes care of probability by adding a "third" player that makes the random choice for you. The nodes for this third player take the expected value of the random event as their value.
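As a starting point, here is a minimal depth-limited minimax with alpha-beta pruning in Python; children(state), is_terminal(state) and evaluate(state) (the depth-cutoff score estimate discussed above) are assumed to be supplied by your game:

    def alphabeta(state, depth, alpha, beta, maximizing):
        if depth == 0 or is_terminal(state):
            return evaluate(state)           # heuristic value at the cutoff
        if maximizing:
            value = float('-inf')
            for child in children(state):
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break                    # beta cutoff: Min avoids this branch
            return value
        else:
            value = float('inf')
            for child in children(state):
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:
                    break                    # alpha cutoff: Max avoids this branch
            return value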
The usual approach to any such problem is to play the game with a live opponent long enough to find some heuristic solutions (short-term goals) that lead you to victory, and then implement those heuristics in your solution. Start with really small boards (1x3) and a small number of categories (1), play them and see what happens, then advance to more complicated cases.
Without playing the game I can only imagine that categories with fewer items are more valuable, as are categories with items currently closer to you, and categories with items that are farthest from you but still closer to you than to the opponent.
Every category has a cost, which is the number of moves required to gain control of it, but the cost for you is different from the cost for the opponent, and it changes with every move. A category has greater value to you if your cost is near the opponent's cost but still less than it.
Every time you make a move categories change their values, so you have to recalculate the board and go from there in deciding your next move. The goal is to maximize your values and minimize opponents values, assuming that opponent uses the same algorithm as you.
The search for the best move gets more complicated if you explore more than one turn in advance, but it is also more effective. In this case you have to simulate the opponent's moves using the same algorithm, and then choose the move to which the opponent has the weakest counter-move. This strategy is called minimax.
All this is not really AI, but it is a road map for an algorithm. The neural networks mentioned in the other answer are more AI-like, but I don't know anything about them.
The goal of the AI is to always seek to maintain the win conditions.
If it is practical (depending on how item locations are stored), the distance to all remaining items should be known to the AI at the start of each turn. Ideally, this would be calculated once when the game starts and then simply "adjusted" as the AI moves, instead of being recalculated every turn. It would also be worth tracking the same distances for the player, if the AI is going to consider more than just its own situation.
From there it is a matter of determining which item should be picked up, as an optimization over the following considerations:
What items and item categories does the AI currently have?
What items and item categories does the player currently have?
What items and item categories are near the AI?
What items and item categories are near the Player?
Exactly how you do this largely depends on how difficult to beat you want the AI to be.
A simple way would be to use a greedy approach and simply go after the "current" best choice. This could be done by finding the closest item that is not in a category the player is already winning by some margin (say 1-3 items). This produces an AI that tries to win but doesn't think ahead, making it rather easy to predict.
Allowing the greedy algorithm to look multiple turns ahead will improve it; considering what the player will do will improve it further.
Heuristics will lead to a more realistic and harder-to-beat AI, possibly even one that is practically impossible to beat.
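A sketch of the greedy choice described above, with hypothetical names: items are (position, category) pairs, the counts are per-category dictionaries, and distance is Manhattan distance on the grid:

    def greedy_target(ai_pos, items, ai_counts, player_counts, margin=2):
        def dist(a, b):
            return abs(a[0] - b[0]) + abs(a[1] - b[1])
        # skip categories the player is already winning by `margin` or more
        candidates = [(pos, cat) for pos, cat in items
                      if player_counts.get(cat, 0) - ai_counts.get(cat, 0) < margin]
        if not candidates:
            candidates = items               # nothing contestable: take nearest
        return min(candidates, key=lambda item: dist(ai_pos, item[0]))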

Solve crossword puzzle with genetic algorithm, fitness, mutation

I'm trying hard to do a lab for school. I'm trying to solve a crossword puzzle using genetic algorithms.
The problem is that it's not very good (it is still too random).
I will try to give a brief explanation of how my program is implemented now:
If I have the puzzle (# is a block, 0 is an empty space)
#000
00#0
#000
and a collection of words that are candidates for this puzzle's solution.
My DNA is simply the matrix as a 1D array.
My first set of individuals has randomly generated DNAs from the pool of letters that my words contain.
I do selection using roulette-wheel selection.
There are some parameters for the chance of combination and mutation, but if a mutation happens then I always change 25% of the DNA.
I change it with random letters from my pool of letters. (This can have negative effects, as the mutations can destroy already-formed words.)
Now the fitness function:
I traverse the matrix both horizontally and vertically:
If I find a word then FITNESS += word.length + 1
If I find a string that is part of some word then FITNESS += word.length / (puzzle_size * 4). This should give a value between 0 and 1.
So it can find "to" in "tool" and add X to FITNESS, then right after it finds "too" in "tool" and adds another Y to FITNESS.
My generations are not actually improving over time. They appear random.
So even after 400 generations with a pool of 1000-2000 (these numbers don't really matter) I get a solution with 1-2 words (of 2 or 3 letters) when the solution should have 6 words.
I think your fitness function might be ill-defined. I would set it up so each row has a binary fitness level: either a row is fit or it is not (e.g. a row is a word or it is not). Then the overall fitness of the solution would be the number of fit rows divided by the total number of rows (both horizontal and vertical). Also, you might be changing too much of the DNA; I would make that a variable and experiment with it.
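In sketch form, assuming the grid's rows and columns have already been read out as letter strings (blocks split off) and dictionary is the word list:

    def fitness(rows, cols, dictionary):
        # binary per line: a line counts only if it is a complete word
        lines = list(rows) + list(cols)
        return sum(1 for line in lines if line in dictionary) / len(lines)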
Your fitness function looks OK to me, although without more detail it's hard to get a really good picture of what you're doing.
You don't specify the mutation probability, but when you do mutate, 25% is a very high mutation. Also, roulette wheel selection applies a lot of selection pressure. What you often see is that the algorithm pretty early on finds a solution that is quite a bit better than all the others, and roulette wheel selection causes the algorithm to select it with such high probability that you quickly end up with a population full of copies of that. At that point, search halts except for the occasional blindly lucky mutation, and since your mutations are so large, it's very unlikely that you'll find an improving move without wrecking the rest of the chromosome.
I'd try binary tournament selection, and a more sensible mutation operator. The usual heuristic people use for mutation is to (on average) flip one "bit" of each chromosome. You don't want a deterministic one letter change each time though. Something like this:
for(i = 0; i < chromosome.length(); ++i) {
    // random() generates a double in the range [0, 1)
    if(random() < 1.0 / chromosome.length()) {
        chromosome[i] = pick_random_letter();
    }
}
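For the selection side, a sketch of the binary tournament mentioned above (Python, with fitness as a supplied callable):

    import random

    def binary_tournament(population, fitness):
        # sample two individuals uniformly and keep the fitter one;
        # much gentler selection pressure than roulette wheel
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b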

Grouping individuals into families

We have a simulation program where we take a very large population of individual people and group them into families. Each family is then run through the simulation.
I am in charge of grouping the individuals into families, and I think it is a really cool problem.
Right now, my technique is pretty naive/simple. Each individual record has some characteristics, including married/single, age, gender, and income level. For married people I select an individual and loop through the population and look for a match based on a match function. For people/couples with children I essentially do the same thing, looking for a random number of children (selected according to an empirical distribution) and then loop through all of the children and pick them out and add them to the family based on a match function. After this, not everybody is matched, so I relax the restrictions in my match function and loop through again. I keep doing this, but I stop before my match function gets too ridiculous (marries 85-year-olds to 20-year-olds for example). Anyone who is leftover is written out as a single person.
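In outline, that relaxation loop might look like the following sketch, where is_match(a, b, level) is a stand-in for the match function, loosening as level grows:

    def match_couples(singles, is_match, max_relaxations=5):
        couples, pool = [], list(singles)
        for level in range(max_relaxations):
            unmatched = []
            while pool:
                person = pool.pop()
                partner = next((p for p in pool if is_match(person, p, level)), None)
                if partner is not None:
                    pool.remove(partner)
                    couples.append((person, partner))
                else:
                    unmatched.append(person)
            pool = unmatched                  # retry leftovers at a looser level
        return couples, pool                  # leftovers stay single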
This works well enough for our current purposes, and I'll probably never get time or permission to rework it, but I at least want to plan for the occasion or learn some cool stuff - even if I never use it. Also, I'm afraid the algorithm will not work very well for smaller sample sizes. Does anybody know what type of algorithms I can study that might relate to this problem or how I might go about formalizing it?
For reference, I'm comfortable with chapters 1-26 of CLRS, but I haven't really touched NP-Completeness or Approximation Algorithms. Not that you shouldn't bring up those topics, but if you do, maybe go easy on me because I probably won't understand everything you are talking about right away. :) I also don't really know anything about evolutionary algorithms.
Edit: I am specifically looking to improve the following:
Fewer ridiculous marriages.
Fewer single people at the end.
Perhaps what you are looking for is cluster analysis?
Let's try to think of your problem like this (starting with the spouse matching):
If you were to have a matrix where each row is a male and each column is a female, and every cell in that matrix is the match function's returned value, what you are looking for is a selection of cells such that no row or column contains more than one selected cell, and the total sum of the selected cells is maximal. This is very similar to the N-Queens Problem, with the modification that each allocation of a "queen" has a reward (which we should maximize).
You could solve this problem by using a graph where:
You have a root,
each of the first row's cells' values is an edge weight leading to a first-depth vertex,
each of the second row's cells' values is an edge weight leading to a second-depth vertex,
etc.
(Notice that once you find a match for the first female, you shouldn't consider her anymore, and likewise for every other female you find a match for.)
Then finding the maximum allocation can be done by BFS, or better still by A* (notice A* typically looks for minimum cost, so you'll have to modify it a bit).
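As an aside, selecting one cell per row and column to maximize the total is the classic assignment problem, so instead of the graph search described above you could hand the match matrix to an off-the-shelf Hungarian-method solver. A sketch with SciPy, where the random matrix is just a placeholder for the real match scores:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    score = np.random.rand(50, 50)            # score[i, j] = match(male i, female j)
    rows, cols = linear_sum_assignment(score, maximize=True)
    pairs = list(zip(rows, cols))             # male i is matched with female cols[i]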
For matching between couples (or singles, more on that later...) and children, I think KNN with some modifications is your best bet, but you'll need to tune it to your needs. But now on to your edit...
How do you measure your algorithm's efficiency?
You need a function that receives the expected distribution of all states (single, married with one child, single with two children, etc.) and the distribution of all states in your solution, and grades the solution accordingly. How do you calculate the expected distribution? That's quite a bit of statistics work...
First you need to know the distribution of all states (single, married.. as mentioned above) in the population,
then you need to know the distribution of ages and genders in the population,
and last thing you need to know - the distribution of ages and genders in your population.
Only then, according to those three, can you calculate how many people you expect to be in each state.. And then you can measure the distance between what you expected and what you got... That is a lot of typing.. Sorry for the general parts...

Logic / Probability Question: Picking from a bag

I'm coding a board game where there is a bag of possible pieces. Each turn, players remove randomly selected pieces from the bag according to certain rules.
For my implementation, it may be easier to divide up the bag initially into pools for one or more players. These pools would be randomly selected, but now different players would be picking from different bags. Is this any different?
If one player's bag ran out, more would be randomly shuffled into it from the general stockpile.
So long as:
the partition into "pool" bags is random
the assignment of players to a given pool bag is random
the game is such that items drawn by the players are effectively removed from the bag (never returned to the bag, or any other bag, for the duration of the current game)
the players are not cognizant of the content of any of the bags
The two approaches ("original" with one big common bag, "modified" with one pool bag per player) are equivalent with regard to probabilities.
It only gets a bit tricky towards the end of the game, when some players' bags are empty. The fairest approach is to let them pick from 100% of the items still in play; hence, such a player should both choose which bag to pick from and [blindly, of course] pick one item from said bag.
This problem illustrates an interesting characteristic of probabilities, which is that probabilities are relative to the amount of knowledge one has about the situation. For example, the game host may well know that the "pool" bag assigned to, say, player X does not include any letter "A" (thinking about Scrabble), but so long as none of the players knows this (and so long as the partition into pool bags was fully random), the game remains fair, and player X still has to assume that his/her probability of drawing an "A" the next time a letter is drawn is the same as if all remaining letters were available to him/her.
Edit:
Notwithstanding the mathematical validity of the assertion that both procedures are fully equivalent, perception is an important factor in games that include a chance component (in particular if the game also includes a pecuniary component). To avoid the ire of players who do not understand this equivalence, you may want to stick to the original procedure...
Depending on the game rules, @mjv is right: the initial random division doesn't affect the probabilities. This is analogous to a game where n players draw cards in turn from a face-down deck: the initial shuffle of the deck is the random division into the "bags" of cards for each player.
But if you replace the items after each draw, it does matter if there is one bag or many. With one bag any particular item will eventually be drawn by any player with the same probability. With many bags, that item can only be drawn by the player whose bag it was initially placed in.
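A quick Monte Carlo sketch of that caveat, with made-up parameters; with one shared bag the target item is always drawable, while with per-player bags it is only drawable if it landed in that player's bag:

    import random

    def p_player0_draws_item0(n_items=20, n_players=4, draws=10,
                              one_bag=True, trials=50_000):
        hits = 0
        for _ in range(trials):
            items = list(range(n_items))
            random.shuffle(items)
            # player 0's bag: everything, or every n_players-th item
            bag = items if one_bag else items[0::n_players]
            # draws WITH replacement, per the caveat above
            if any(random.choice(bag) == 0 for _ in range(draws)):
                hits += 1
        return hits / trials

Running this with one_bag=True versus one_bag=False gives noticeably different probabilities, confirming that the equivalence only holds when drawn items are not replaced.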
Popping up to the software level, if the game calls for a single bag, I'd recommend just programming it that way: it should be no more difficult than n bags, and you don't have to prove the new game equivalent to the old.
My intuition tells me that dividing a random collection of things into smaller random subsets remains equally random... it doesn't matter whether a player picks from the big pool or from a smaller one (which, in turn, is refilled from the big one).
For a game, that is random enough, IMHO!
Depending on how crucial security is, it might be okay (if money is involved (yours or theirs), DO NOT DO THAT). I'm not entirely sure it would be less random from the perspective of an ignorant player.
a) Don't count on the players being ignorant; your program could be cracked, and then they would know what pieces are coming up.
b) It would be very tricky to fill the bags in such a way that you don't introduce vulnerabilities. For instance, let's take the naive algorithm of picking one piece randomly and putting it in the first bucket, then doing the same for the second bucket, and so on. You have just ensured that if there are N pieces, the first player has a probability of 1/N of picking a given piece, the second player 1/(N-1), the third 1/(N-2), and so on. Players can then analyze the pieces already played in order to figure out the probabilities that other players are holding certain pieces.
I THINK the following algorithm might work better, but almost all people get probability wrong the first time they come up with a new algorithm. DON'T USE THIS, just understand that it might cover the security vulnerability I talked about:
Create a list of N ordered items and instantiate P players.
Mark 1/P of the items randomly (with replacement) for each player.
Do this repeatedly until all N items are marked and an equal number of items is marked for each player. (NOTE: this may take much longer than you may live, depending on N and P.)
Place the appropriate items in each player's bucket and randomly rearrange (do NOT use a place-swapping algorithm).
Even then after all this, you might still have a vulnerability to someone figuring out what's in their bucket from an exploit. Stick with a combined pool, it's still tricky to pick really randomly, but it will make your life easier.
Edit: I know the tone sounds kind of jerky. I mostly included all that bold for people who might read this out of context and try some of these algorithms. I really wish you well :-)
Edit 2: On further consideration, I think that the problem with picking in order might reduce to having players taking turns in the first place. If that's in the rules already it might not matter.
