Selecting a surviving population in a "voter" Genetic Algorithm - algorithm

I've been working on a genetic algorithm where there is a population consisting of individuals with a color, and a preference. Preference and color are from a small number of finite states, probably around 4 or 5. (example: 1|1, 5|2, 3|3 etc)
Every individual casts a "vote" for their preference, which assists those individuals with that vote as their color.
My current idea is to cycle through every individual, and calculate the chance that they should survive, based on number of votes, etc. and then roll a die to see if they live.
I'm currently doing it so that if v[x] represents the percent of votes for color x, individual k with color c has v[c] chance of surviving. However, this means that if there are equal numbers of all 5 types of (a|a) individuals, 4/5 of them perish, and that's not good.
Does anyone have any idea of a method of randomness I could use to determine the chance an individual has to survive? For instance, an algorithm that for v votes for c, v individuals with color c survive (on statistical average).

Assign a fitness (the likelihood of survival, in your case) to each individual as is, then sort them by descending fitness and use binary tournament selection or something similar to sample a new population of your chosen size.
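A minimal sketch of binary tournament selection for this setup, assuming each individual is a (color, preference) tuple and that fitness is simply the share of votes cast for the individual's color. The function names and the sample population are made up for illustration.

```python
import random
from collections import Counter

def vote_fitness(population):
    """Fitness of an individual = share of all votes cast for its color."""
    votes = Counter(pref for _color, pref in population)
    total = len(population)
    return [votes[color] / total for color, _pref in population]

def binary_tournament(population, fitness, size, rng=random):
    """Sample `size` survivors; each one is the fitter of two random picks."""
    survivors = []
    for _ in range(size):
        a = rng.randrange(len(population))
        b = rng.randrange(len(population))
        winner = a if fitness[a] >= fitness[b] else b
        survivors.append(population[winner])
    return survivors

# Hypothetical population of (color, preference) pairs.
population = [(1, 1), (5, 2), (3, 3), (2, 1), (1, 3)]
fitness = vote_fitness(population)
survivors = binary_tournament(population, fitness, size=len(population))
```

Because the survivor count is chosen up front, the population cannot collapse the way it does when every individual rolls its own die.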

Well, you can weight the probabilities according to the value returned by passing each
member of the population to the cost function.
That seems to me the most straightforward way, consistent with the genetic
meta-heuristic.
More common, though, is to divide the current population into segments, based on
the value returned from passing them to the cost function.
So for instance,
if each generation consists of 100 members, then the top N members (N is just a
user-defined parameter, often something like 5-10% of the total), i.e. those with the
lowest cost function result, are carried forward to the next generation just as they
are (elitism).
Perhaps this is what you mean by 'survive.' If so, then again, these 'survivors'
are determined by ranking the members of the population according to the cost function
value and selecting those members above your defined elitism fraction constant.
The rest (the majority) of the next generation are created either by
mutation or cross-over.
mutation:
# one member of the current population:
[4, 5, 1, 7, 4, 2, 8, 9]
# small random change in one member of prior generation, to create mutant that is
# a member of the next generation
[4, 9, 1, 7, 4, 2, 8, 9]
crossover:
# two of the 'top' members of the current generation
[4, 5, 1, 7, 4, 2, 8, 9]
[2, 3, 6, 9, 2, 1, 6, 4]
# offspring is a member of the next generation
[4, 5, 1, 7, 2, 1, 6, 4]
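The elitism, mutation and crossover steps above could be combined roughly as follows. This is only a sketch: the function name, the gene range 1..9, the 50/50 split between mutation and crossover, and drawing parents from the better half of the population are all assumptions made for the example.

```python
import random

def next_generation(population, cost, elite_fraction=0.1,
                    mutation_rate=0.5, rng=random):
    """Build the next generation from elites, mutants and crossover children."""
    ranked = sorted(population, key=cost)            # lowest cost first
    n_elite = max(1, int(len(ranked) * elite_fraction))
    new_pop = [list(m) for m in ranked[:n_elite]]    # elitism: carried over unchanged
    top = ranked[:max(2, len(ranked) // 2)]          # parents: better half (assumption)
    while len(new_pop) < len(population):
        if rng.random() < mutation_rate:
            # mutation: small random change in one gene of one parent
            child = list(rng.choice(top))
            child[rng.randrange(len(child))] = rng.randint(1, 9)
        else:
            # single-point crossover of two distinct parents
            p1, p2 = rng.sample(top, 2)
            cut = rng.randrange(1, len(p1))
            child = list(p1[:cut]) + list(p2[cut:])
        new_pop.append(child)
    return new_pop
```

Calling `next_generation(population, cost=sum)` on a list of equal-length integer lists returns a new population of the same size whose first member is the lowest-cost member of the old one.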

Related

Get kth group of unsorted result list with arbitrary number of results per group

Okay so I have a huge array of unsorted elements of an unknown data type (all elements are of the same type, obviously; I just can't make assumptions, as they could be numbers, strings, or any type of object that overloads the < and > operators). The only assumption I can make about those objects is that no two of them are the same, and comparing them (A < B) should tell me which one would show up first if the array were sorted. The "smallest" should be first.
I receive this unsorted array (type std::vector, but honestly it's more of an algorithm question so no language in particular is expected), a number of objects per "group" (groupSize), and the group number that the sender wants (groupNumber).
I'm supposed to return an array containing groupSize elements, or fewer if the requested group is the last one. (Examples: 17 results with a groupSize of 5 would return only two of them if you ask for the fourth group. Also, the fourth group is group number 3 because groups are zero-indexed.)
Example:
Received Array: {1, 5, 8, 2, 19, -1, 6, 6.5, -14, 20}
Received groupSize: 3
Received groupNumber: 2
If the array was sorted, it would be: {-14, -1, 1, 2, 5, 6, 6.5, 8, 19, 20}
If it was split in groups of size 3: {{-14, -1, 1}, {2, 5, 6}, {6.5, 8, 19}, {20}}
I have to return the third group (groupNumber 2 in a 0-indexed array): {6.5, 8, 19}
The biggest problem is the fact that it needs to be lightning fast. I can't sort the array because it has to be faster than O(n log n).
I've tried several methods, but can never get under O(n log n).
I'm aware that I should be looking for a solution that doesn't fill up all the other groups, and skips a pretty big part of the steps shown in the example above, to create only the requested group before returning it, but I can't figure out a way to do that.
You can find the value of the smallest element s in the group in linear time using the standard C++ std::nth_element function (because you know its index in the sorted array). You can find the largest element S in the group in the same way. After that, you need a linear pass to find all elements x such that s <= x <= S and return them. The total time complexity is O(n).
Note: this answer is not C++ specific. You just need an implementation of the k-th order statistics in linear time.
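As a language-neutral illustration, here is a Python sketch of the same idea. Python has no std::nth_element, so a randomized quickselect (expected linear time) stands in for it; `quickselect` and `kth_group` are names invented for this example, and the final `sorted` call only orders the small returned group, which costs O(g log g) for group size g.

```python
import random

def quickselect(items, k, rng=random):
    """Element that would sit at index k if items were sorted (expected O(n))."""
    items = list(items)
    while True:
        pivot = rng.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            items = lows
        elif k < len(items) - len(highs):
            return pivot                      # k falls on the pivot itself
        else:
            k -= len(items) - len(highs)      # skip lows and the pivot
            items = highs

def kth_group(items, group_size, group_number):
    """Return the requested group without sorting the whole array."""
    start = group_size * group_number
    if start >= len(items):
        return []
    end = min(start + group_size, len(items)) - 1
    smallest = quickselect(items, start)      # s in the answer above
    largest = quickselect(items, end)         # S in the answer above
    group = [x for x in items if not (x < smallest) and not (x > largest)]
    return sorted(group)

print(kth_group([1, 5, 8, 2, 19, -1, 6, 6.5, -14, 20], 3, 2))  # [6.5, 8, 19]
```

Only the < and > operators are used on the elements, matching the constraint in the question.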

Data structure for quick match in a card game

When playing trading card games, I frequently wonder what would be the most efficient data structure to deal with the following problem.
In such games, I face an opponent with a deck that contains N cards (N ~ 30..60..100), each of them is chosen out of possible M card types (M ~ typically 1000..10000s). Cards are generally not required to be unique, i.e. there can be repeated card types. The contents of opponent's deck are unknown before the game.
As the game starts and progresses, I slowly learn card-by-card, which cards an opponent uses. There is a dataset that includes full contents of K (K ~ typically 100000..100000s) of the decks seen previously. I want to query this dataset using progressively increasing sample I've obtained in a certain game to make a ranked list of possible decks an opponent uses.
What would be the most efficient data structure to do such querying, given mentioned limits on reasonably modern hardware (i.e. several gigabytes of RAM available)?
A very small example
possible card types = [1..10]
known K decks:
d1 = [1, 4, 6, 3, 4]
d2 = [5, 3, 3, 9, 5]
d3 = [5, 10, 4, 10, 1]
d4 = [3, 7, 1, 8, 5]
on turn 1, I reveal that an opponent uses card #5; thus, my list of candidates is reduced to:
d2 = [5, 3, 3, 9, 5] - score 2
d3 = [5, 10, 4, 10, 1] - score 1
d4 = [3, 7, 1, 8, 5] - score 1
d2 is ranked higher than the rest in the results, because there are double 5s in that deck, so it's probably more likely that it is
on turn 2, I reveal that an opponent uses card #1; list of candidates is reduced to:
d3 = [5, 10, 4, 10, 1]
d4 = [3, 7, 1, 8, 5]
My ideas on solution
The trivial solution is, of course, to store the K decks as arrays of N integers. Getting the match score of one deck for a given query of p revealed cards thus takes O(N*p) checks. Each time we see a match, we just increase the score by 1. Thereby, checking all K known decks against a query of p cards would take O(KNp), that is roughly 100000 * 100 * 100 operations in the worst case => 1e9, which is a lot of work.
We can set up an index that will hold a list of pointers to decks that card is encountered in for every known card type — however, it doesn't solve the problem of intersecting all these lists (and they are going to be huge, there might be cards that are found in 90..95% of known decks). For a given p card lookup, it boils down to intersecting p lists of K decks pointers, calculating intersection scores in process. Roughly, that is O(Kp), but with a fairly large constant. It's still 1e7 operations in late stages.
However, if we use the fact that every next turn restricts our dataset further, we can reapply the filtering to whatever came up in the previous query. This way, it would be O(K) every turn => 1e5 operations.
Is there a way to perform better, ideally, not depending on value of K?
There are two things you can do to speed this up. First, create an inverted index that tells you which decks contain each card. So in your example decks above:
d1 = [1, 4, 6, 3, 4]
d2 = [5, 3, 3, 9, 5]
d3 = [5, 10, 4, 10, 1]
d4 = [3, 7, 1, 8, 5]
Your index is:
1: d1, d3, d4
3: d1, d2, d4
4: d1(2), d3
5: d2(2), d3, d4
6: d1
7: d4
8: d4
9: d2
10: d3(2)
It should be clear that this takes about the same amount of memory as the decks themselves. That is, rather than having K decks of N cards, you have up to M cards, each of which has up to K deck references.
When the user turns over the first card, 5, you quickly look up 5 in your index and you get the candidate list [d2,d3,d4].
Here's the second optimization: you keep that list of candidates around. You're no longer interested in the rest of the decks; they have been eliminated from the list of candidates. When the next card, 1, is revealed, you look up 1 in your index and you get [d1,d3,d4]. You intersect that with the first list of candidates to produce [d3,d4].
In the worst possible case, you'd end up doing N intersections (one per card) of K items each (if the decks are all very similar). But in most cases the number of decks that a card is in will be much smaller than K, so your candidate list length will likely shrink very quickly.
Finally, if you store the deck references as hash maps then the intersection goes very quickly because you only have to look for items from the (usually small) existing candidate list in the large list of items for the next card turned over. Those lookups are O(1).
This is the basic idea of how a search engine works. You have a list of words, each of which contains references to the documents the word appears in. You can very quickly narrow a list of documents from hundreds of millions to just a handful in short order.
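A small sketch of the inverted index and the shrinking candidate list, using the four example decks. The function names are invented for this example, and the scoring rule (add the number of copies of each revealed card) is an assumption; it also ignores the case where the same card is revealed twice.

```python
from collections import Counter, defaultdict

def build_index(decks):
    """Map card -> {deck name: number of copies of that card in the deck}."""
    index = defaultdict(dict)
    for name, cards in decks.items():
        for card, copies in Counter(cards).items():
            index[card][name] = copies
    return index

def narrow(candidates, index, card):
    """Keep only candidates containing `card`, adding its copy count to the score."""
    postings = index.get(card, {})
    if candidates is None:            # first revealed card: start from its postings
        return dict(postings)
    # Iterate over the (usually small) candidate set, with O(1) hash lookups.
    return {name: score + postings[name]
            for name, score in candidates.items() if name in postings}

decks = {
    "d1": [1, 4, 6, 3, 4],
    "d2": [5, 3, 3, 9, 5],
    "d3": [5, 10, 4, 10, 1],
    "d4": [3, 7, 1, 8, 5],
}
index = build_index(decks)
candidates = narrow(None, index, 5)        # {'d2': 2, 'd3': 1, 'd4': 1}
candidates = narrow(candidates, index, 1)  # {'d3': 2, 'd4': 2}
```

Each turn touches only the surviving candidates, so the work per revealed card is bounded by the current candidate count rather than by K.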
Your idea with intersecting p lists of deck pointers is good, but you're missing some optimizations.
Sort the decks by some criterion (e.g. deck index) and use binary search to advance through the lists (take the smallest deck id using a heap and advance it to match or exceed the current largest deck id). This way you get through them faster, especially if you don't have a lot of decks in the intersection.
Also store the previous intersection so that for the next move you only need to intersect 2 lists (previous result and the new card).
Finally you can simply ignore cards that are too popular and just check for them in the final result.
I would suggest you implement a solution like this and run some benchmarks. It will be faster than O(K).
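For the two-list case (previous intersection plus the new card's list), the binary-search skipping could look roughly like this. It is a simplified sketch that assumes both lists hold deck ids in ascending order and leaves out the heap needed for intersecting many lists at once; `intersect_sorted` is a name chosen for the example.

```python
from bisect import bisect_left

def intersect_sorted(short, long):
    """Intersect two ascending lists of deck ids.

    For every id in `short`, binary-search forward in `long`, so the cost is
    about len(short) * log(len(long)) instead of len(short) + len(long).
    """
    result = []
    pos = 0
    for deck_id in short:
        pos = bisect_left(long, deck_id, pos)   # never search behind `pos` again
        if pos == len(long):
            break                               # `long` is exhausted
        if long[pos] == deck_id:
            result.append(deck_id)
    return result

# Deck ids containing card 5 and card 1 in the example: d2,d3,d4 and d1,d3,d4.
print(intersect_sorted([2, 3, 4], [1, 3, 4]))   # [3, 4]
```

Passing the shorter list first keeps the number of binary searches small, which is where the advantage over a plain linear merge comes from.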

Importance of order of the operation in backtracking algorithms

How important is the order of operations in each recursive step of a backtracking algorithm for the efficiency of that algorithm?
For example, take the Knight’s Tour problem:
The knight is placed on the first block of an empty board and, moving
according to the rules of chess, must visit each square exactly once.
In each step there are (in general) 8 possible ways to move.
int xMove[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };
int yMove[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
If I change this order to:
int xmove[8] = { -2, -2, 2, 2, -1, -1, 1, 1};
int ymove[8] = { -1, 1,-1, 1, -2, 2, -2, 2};
Now, for an n*n board with n up to 6, the two orders show no visible difference in execution time.
But for n >= 7, the first movement order runs much faster than the second one.
In such cases, it is not feasible to generate all O(m!) operation orders and test the algorithm on each of them. So how do I determine the performance of such an algorithm for a specific movement order, or rather, how can I arrive at one (or a set) of operation orders for which the algorithm is most efficient in terms of execution time?
This is an interesting problem from a Math/CS perspective. There definitely exists a permutation (or set of permutations) that would be most efficient for a given n . I don't know if there is a permutation that is most efficient among all n. I would guess not. There could be a permutation that is better 'on average' (however you define that) across all n.
If I was tasked to find an efficient permutation I might try doing the following: I would generate a fixed number x of randomly generated move orders. Measure their efficiency. For every one of the randomly generated movesets, randomly create a fixed number of permutations that are near the original. Compute their efficiencies. Now you have many more permutations than you started with. Take top x performing ones and repeat. This will provide some locally maxed algorithms, but I don't know if it leads up to the globally maxed algorithm(s).
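The procedure just described could be sketched as below. Everything here is an assumption made for illustration: efficiency is measured as the number of recursive calls plain backtracking needs from the corner square (capped by a budget so that bad orders stay cheap to evaluate), and a "near" permutation is one that swaps two moves.

```python
import random

MOVES = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]

def count_calls(order, n, limit):
    """Recursive calls made before a tour from (0, 0) is found, capped at `limit`."""
    board = [[False] * n for _ in range(n)]
    calls = 0

    def solve(x, y, visited):
        nonlocal calls
        calls += 1
        if visited == n * n or calls >= limit:
            return True              # tour complete, or budget used up: stop
        for i in order:
            nx, ny = x + MOVES[i][0], y + MOVES[i][1]
            if 0 <= nx < n and 0 <= ny < n and not board[nx][ny]:
                board[nx][ny] = True
                if solve(nx, ny, visited + 1):
                    return True
                board[nx][ny] = False
        return False

    board[0][0] = True
    solve(0, 0, 1)
    return calls

def neighbours(order, count, rng):
    """`count` orders near `order`, each obtained by swapping two positions."""
    result = []
    for _ in range(count):
        i, j = rng.sample(range(len(order)), 2)
        near = list(order)
        near[i], near[j] = near[j], near[i]
        result.append(near)
    return result

def search_orders(n, pool_size=4, per_order=3, rounds=3, limit=5000, seed=0):
    """Keep the best `pool_size` move orders, expand them, and repeat."""
    rng = random.Random(seed)
    pool = [rng.sample(range(8), 8) for _ in range(pool_size)]
    for _ in range(rounds):
        candidates = pool + [m for o in pool for m in neighbours(o, per_order, rng)]
        candidates.sort(key=lambda o: count_calls(o, n, limit))
        pool = candidates[:pool_size]
    return pool[0], count_calls(pool[0], n, limit)
```

`search_orders(5)` returns the best order found (as indices into `MOVES`) together with its call count. As noted above, this only finds locally good orders, and the result depends on the board size, the start square and the budget.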

Sum of cells with several possibilities

I'm programming a Killer Sudoku Solver in Ruby and I try to take human strategies and put them into code. I have implemented about 10 strategies but I have a problem on this one.
In killer sudoku, we have "zones" of cells and we know the sum of these cells and we know possibilities for each cell.
Example :
Cell 1 can be 1, 3, 4 or 9
Cell 2 can be 2, 4 or 5
Cell 3 can be 3, 4 or 9
The sum of all cells must be 12
I want my program to try all possibilities to eliminate possibilities. For instance, here, cell 1 can't be 9 because you can't make 3 by adding two numbers possible in cells 2 and 3.
So I want that for any number of cells, it removes the ones that are impossible by trying them and seeing it doesn't work.
How can I get this working ?
There are multiple ways to approach the general problem of game solving, and emulating human strategies is not always the best one. That said, here's how you can solve your problem:
1st way, brute-forcy
Basically, we want to try all possibilities of the combinations of the cells, and pick the ones that have the correct sum.
cell_1 = [1,3,4,9]
cell_2 = [2,4,5]
cell_3 = [3,4,9]
all_valid_combinations = cell_1.product(cell_2,cell_3).select {|combo| combo.sum == 12}
# => [[1, 2, 9], [3, 5, 4], [4, 4, 4], [4, 5, 3]]
# Array#sum is built in since Ruby 2.4; on older versions use combo.inject(:+) instead
to pare this down to individual cells, you could do:
cell_1 = all_valid_combinations.map {|combo| combo[0]}.uniq
# => [1, 3, 4]
cell_2 = all_valid_combinations.map {|combo| combo[1]}.uniq
# => [2, 5, 4]
. . .
If you don't have a huge set of cells, this way is easier to code. It can get a bit inefficient, though. For small problems, this is the way I'd use.
2nd way, backtracking search
Another well known technique takes the problem from the other approach. Basically, for each cell, ask "Can this cell be this number, given the other cells?"
So, starting with cell 1, can the number be 1? To check, we see if cells 2 and 3 can sum to 11 (12-1).
* Can cell 2 have the value 2? To check, can cell 3 sum to 9 (11-2)?
And so on. In very large cases, where you could have a great many valid combinations, this will be slightly faster, as you can return 'true' the first time you find a valid number for a cell. Some people find recursive algorithms a bit harder to grok, though, so your mileage may vary.
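A sketch of this second approach, written in Python rather than Ruby. The function names are invented for the example, and like the brute-force version it ignores the killer sudoku rule that digits may not repeat within a zone.

```python
def can_sum(cells, target):
    """True if one value can be picked from each cell so that they add up to target."""
    if not cells:
        return target == 0
    first, rest = cells[0], cells[1:]
    # any() stops at the first value that works, which is the early exit
    # described above. Values larger than the target are skipped because
    # all digits are positive.
    return any(can_sum(rest, target - value) for value in first if value <= target)

def prune(cells, target):
    """For each cell, keep only the values that appear in some valid combination."""
    pruned = []
    for i, cell in enumerate(cells):
        others = cells[:i] + cells[i + 1:]
        pruned.append([v for v in cell if can_sum(others, target - v)])
    return pruned

print(prune([[1, 3, 4, 9], [2, 4, 5], [3, 4, 9]], 12))
# [[1, 3, 4], [2, 4, 5], [3, 4, 9]]
```

Here 9 is removed from cell 1 because cells 2 and 3 cannot sum to 3, which matches the example in the question.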

How to determine best combinations from 2 lists

I'm looking for a way to make the best possible combination of people in groups. Let me sketch the situation.
Say we have persons A, B, C and D. Furthermore we have groups 1, 2, 3, 4 and 5. Both are examples and there can be fewer or more. Each person gives a rating to each other person. So for example A rates B a 3, C a 2, and so on. Each person also rates each group. (Say ratings are 0-5.) Now I need some sort of algorithm to distribute these people evenly over the groups while keeping them as happy as possible (as in: they should be in a highly rated group, with highly rated people). Now I know it's not possible for everyone to be in their best group (the one they rated a 5), but I need the best possible solution for the group as a whole.
I think this is a difficult question, and I would be happy if someone could direct me to some more information about this type of problem, or help me with the algorithm I'm looking for.
Thanks!
EDIT:
I see a lot of great answers, but this problem is too big for me to solve correctly. However, the answers posted so far give me a great starting point to look further into the subject. Thanks a lot already!
Having established that this is an NP-hard problem, I would suggest a heuristic solution using artificial intelligence tools.
A possible approach is steepest ascent hill climbing [SAHC].
First, we define our utility function (call it u). It can be the sum of total happiness in all groups.
Next, we define our 'world': S is the set of all possible partitions.
For each legal partition s in S, we define:
next(s) = {all partitions obtained by moving one person to a different group}
All we have to do now is run SAHC with random restarts:
1. best <- -INFINITY
2. while there is more time:
3.     choose a random partition as starting point, denote it as s.
4.     NEXT <- next(s)
5.     if max{ u(NEXT) } <= u(s): // s is the top of the hill
5.1.       if u(s) > best: best <- u(s) // if s is better than the previous result, store it.
5.2.       go to 2. // restart the hill climbing from a different random point.
6.     else:
6.1.       s <- the partition in NEXT with the highest u
6.2.       go to 4.
7. return best // when out of time, return the best solution found so far.
It is an anytime algorithm, meaning it gets a better result the more time you give it to run, and eventually [at time infinity] it will find the optimal result.
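The steps above might translate into Python roughly as follows. This sketch makes several assumptions: a partition is a list mapping person index to group index, a fixed number of restarts replaces the time limit, the "even distribution" constraint is not enforced, and the utility function is supplied by the caller. It needs at least two groups.

```python
import random

def neighbours(assignment, n_groups):
    """All assignments reachable by moving one person to a different group."""
    for person, current in enumerate(assignment):
        for group in range(n_groups):
            if group != current:
                moved = list(assignment)
                moved[person] = group
                yield moved

def sahc(n_persons, n_groups, utility, restarts=20, seed=0):
    """Steepest ascent hill climbing with random restarts."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        s = [rng.randrange(n_groups) for _ in range(n_persons)]  # random start
        while True:
            top = max(neighbours(s, n_groups), key=utility)      # steepest step
            if utility(top) <= utility(s):                       # top of the hill
                break
            s = top
        if utility(s) > best_score:
            best, best_score = s, utility(s)
    return best, best_score
```

With a toy utility such as `lambda a: sum(1 for p, g in enumerate(a) if g == p % 2)` and `sahc(4, 2, ...)`, the climb ends at `[0, 1, 0, 1]` with score 4. A real utility would sum the group and person ratings, and legal moves would also have to respect the group size limits.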
The problem is NP-hard: you can reduce from Maximum Triangle Packing, that is, finding at least k vertex-disjoint triangles in a graph, to the version where there are k groups of size 3, no one cares about which group he is in, and likes everyone for 0 or for 1. So even this very special case is hard.
To solve it, I would try using an ILP: have binary variables g_ik indicating that person i is in group k, with constraints to ensure a person is only in one group and a group has an appropriate size. Further, binary variables t_ijk that indicate that persons i and j are together in group k (ensured by t_ijk <= 0.5 g_ik + 0.5 g_jk) and binary variables t_ij that indicate that i and j are together in any group (ensured by t_ij <= sum_k t_ijk). You can then maximize the happiness function under these constraints.
This ILP has very many variables, but modern solvers are pretty good and this approach is very easy to implement.
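Written out, the formulation could look like this. The symbols q_{ik} (person i's rating of group k), r_{ij} (person i's rating of person j) and the group size bounds L and U are names introduced here for illustration; the constraints on t_{ijk} and t_{ij} are the ones given above.

```latex
\begin{aligned}
\max \quad & \sum_{i}\sum_{k} q_{ik}\, g_{ik} \;+\; \sum_{i<j} (r_{ij} + r_{ji})\, t_{ij} \\
\text{s.t.} \quad
& \sum_{k} g_{ik} = 1 && \forall i \\
& L \le \sum_{i} g_{ik} \le U && \forall k \\
& t_{ijk} \le 0.5\, g_{ik} + 0.5\, g_{jk} && \forall i<j,\ \forall k \\
& t_{ij} \le \sum_{k} t_{ijk} && \forall i<j \\
& g_{ik},\ t_{ijk},\ t_{ij} \in \{0, 1\}
\end{aligned}
```

Since all ratings are non-negative, maximization pushes each t_{ij} to 1 whenever the constraints allow it, so t_{ij} = 1 exactly when i and j share a group.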
This is an example of an optimization problem. It is a very well
studied type of problem, with very good methods for solving it. Read
Programming Collective Intelligence, which explains it much better
than I can.
Basically, there are three parts to any kind of optimization problem:
The input to the problem solving function.
The solution output by the problem solving function.
A scoring function that evaluates how optimal the solution is by
scoring it.
Now the problem can be stated as finding the solution that produces
the highest score. To do that, you first need to come up with a format
to represent a possible solution that the scoring function can then
score. Assuming 6 persons (0-5) and 3 groups (0-2), this Python data structure
would work and would be a possible solution:
output = [
[0, 1],
[2, 3],
[4, 5]
]
Persons 0 and 1 are put in group 0, persons 2 and 3 in group 1 and so
on. To score this solution, we need to know the input and the rules for
calculating the score. The input could be represented by this data
structure:
input = [
[0, 4, 1, 3, 4, 1, 3, 1, 3],
[5, 0, 1, 2, 1, 5, 5, 2, 4],
[4, 1, 0, 1, 3, 2, 1, 1, 1],
[2, 4, 1, 0, 5, 4, 2, 3, 4],
[5, 5, 5, 5, 0, 5, 5, 5, 5],
[1, 2, 1, 4, 3, 0, 4, 5, 1]
]
Each list in the list represents the rating the person gave. For
example, in the first row, the person 0 gave rating 0 to person 0 (you
can't rate yourself), 4 to person 1, 1 to person 2, 3 to 3, 4 to 4 and
1 to person 5. Then he or she rated the groups 0-2 3, 1 and 3
respectively.
So above is an example of a valid solution to the given input. How do
we score it? That's not specified in the question, only that the
"best" combination is desired, so I'll arbitrarily decide that
the score for a solution is the sum of each person's happiness. Each
person's happiness is determined by adding his or her rating of the
group to the average of his or her ratings for the other persons in
the group, excluding the person itself.
Here is the scoring function:
N_GROUPS = 3
N_PERSONS = 6
def score_solution(input, output):
    tot_score = 0
    for person, ratings in enumerate(input):
        # Check what group the person is a member of.
        for group, members in enumerate(output):
            if person in members:
                # Check what rating person gave the group.
                group_rating = ratings[N_PERSONS + group]
                # Check what rating the person gave the others.
                others = list(members)
                others.remove(person)
                if not others:
                    # protect against zero division
                    person_rating = 0
                else:
                    person_ratings = [ratings[o] for o in others]
                    person_rating = sum(person_ratings) / float(len(person_ratings))
                tot_score += group_rating + person_rating
    return tot_score
It should return a score of 37.0 for the given solution. Now what
we'll do is to generate valid outputs while keeping track of which one
is best until we are satisfied:
from random import choice
def gen_solution():
    # Put every person into a randomly chosen group.
    groups = [[] for x in range(N_GROUPS)]
    for person in range(N_PERSONS):
        choice(groups).append(person)
    return groups
# Generate 10000 solutions
solutions = [gen_solution() for x in range(10000)]
# Score them
solutions = [(score_solution(input, sol), sol) for sol in solutions]
# Sort by score, take the best.
best_score, best_solution = sorted(solutions)[-1]
print('The best solution is %s with score %.2f' % (best_solution, best_score))
Running this on my computer produces:
The best solution is [[0, 1], [3, 5], [2, 4]] with score 47.00
Obviously, you may think it is a really stupid idea to randomly just
generate solutions to throw at the problem, and it is. There are much
more sophisticated methods to generate solutions such as simulated
annealing or genetic optimization. But they all build upon the same
framework as given above.
