Best algorithm for optimizing the decisions in a simulation - algorithm

I'm looking for the best algorithm to optimise the decisions made in a simulation, so that I find a fast result in a reasonable amount of time. The simulation runs a number of "ticks" and occasionally needs to make a decision. Eventually a goal state is reached. (It is possible to never reach a goal state if you make very bad decisions.)
There are many, many goal states. I want to find the goal state that is reached in the fewest ticks (a tick equates roughly to a second in real life). Basically, I want to decide which decisions to make to get to the goal in as few seconds as possible.
Some points about the problem domain:
Straight off the bat I can generate a series of choices that will lead to a solution. It won't be optimal.
I have a reasonable heuristic function to determine what would be a good decision
I have a reasonable function to determine the minimum possible time cost from a node to a goal.
Algorithms:
I need to process this problem for about 10 seconds and then give the best answer I can.
I believe A* would find me the optimal solution. The problem is that the decision tree will be so large that I won't be able to compute it quickly enough.
IDA* would give me a good first few choices in 10 seconds but I need a path all the way to a goal.
At the moment I am thinking that I will start off with the known non-optimal path to a goal and then perhaps use Simulated Annealing to try to improve it over the 10 seconds.
What would be a good algorithm to research to try to solve this sort of problem?

Have a look at limited discrepancy search (repeated with increasingly loose limits on the maximum discrepancy) or at beam search.
If you have a good heuristic you should be able to use it to compare individual choices (for limited discrepancy search) and to compare partial solutions (for beam search).
See if you can place an upper bound on how good any extension of a partial solution is. Then you can prune out partial solutions that can't possibly be extended to beat the result from the heuristic, or the best result found so far in a series of iterative searches with increasing depth.
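As a rough illustration of beam search combined with the bound-based pruning described above, here is a minimal Python sketch. The toy problem (reach `TARGET` from 1 using `+1` or `*2`, one tick each) and every name in it (`beam_search`, `successors`, `doublings_needed`, `best_known`) are assumptions made for the example, not part of the original problem. It relies on the estimate of remaining ticks never being too high.

```python
import heapq

TARGET = 10  # hypothetical goal for the toy problem


def successors(n):
    """Toy decisions: add one or double, each costing one tick."""
    return [(m, 1) for m in (n + 1, n * 2) if m <= TARGET]


def doublings_needed(n):
    """Optimistic (never too high) estimate of the ticks still needed."""
    ticks = 0
    while n < TARGET:
        n *= 2
        ticks += 1
    return ticks


def beam_search(start, is_goal, successors, lower_bound, beam_width,
                best_known=float("inf")):
    """Return (ticks, path) of the best goal found, or (best_known, None)."""
    best_cost, best_path = best_known, None
    beam = [(0, [start])]  # (ticks spent so far, path of states)
    while beam:
        candidates = []
        for cost, path in beam:
            for state, step_cost in successors(path[-1]):
                new_cost = cost + step_cost
                estimate = new_cost + lower_bound(state)
                if estimate >= best_cost:
                    continue  # cannot beat the best complete solution so far
                if is_goal(state):
                    best_cost, best_path = new_cost, path + [state]
                else:
                    candidates.append((estimate, new_cost, path + [state]))
        # Keep only the beam_width most promising partial solutions.
        kept = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
        beam = [(cost, path) for _, cost, path in kept]
    return best_cost, best_path


ticks, path = beam_search(1, lambda n: n == TARGET, successors,
                          doublings_needed, beam_width=3)
print(ticks, path)
```

Passing the cost of an already known, non-optimal solution as `best_known` lets the pruning start working from the first level. The loop only terminates here because the toy state space is finite; a real simulation would probably also need a time or depth limit.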

Let's get a few facts out.
1) The only way to know for sure which decision is the best is to test every possible decision and evaluate the outcome based on some criteria.
2) We are highly unlikely to have the time to go through every possible decision, so we have to limit how far into the future we evaluate each decision.
3) We are highly unlikely to make the best move ~ever~. Not just often, but ever. Unless you have only a couple of decisions, chances are every time you make a decision, there was a better one you didn't get to.
4) We can use how our previous decisions worked out to our advantage.
Put all this together... Let's say that when we have a decision to make, we evaluate what happens 30 ticks into the future. After 30 ticks we can check whether what actually happened matches what we simulated 30 ticks earlier. If it did, we know that decision leads to predictable outcomes and we should use that decision less. If it didn't, or if it turned out better than we hoped, we should use that decision more.
Ideally, you would use your logic in a ... simulation of your simulation ... for purposes of evaluating it. Then when you get to the 'real' simulation, you have a better chance at picking your better decisions earlier. Of course, give a higher weight to the results of your actual simulation results as opposed to your simulated simulation results.

Related

Predicting remaining runtime for minimax algorithm with alpha-beta-pruning

Problem
I am trying to solve a perfect-information zero-sum game (like tic-tac-toe or chess) using a negamax algorithm with alpha-beta pruning. The goal is to prove whether one player can force a win or a draw. This means that there is no depth limit: the algorithm always evaluates the game tree until there is a win or draw.
I spent multiple weeks optimizing my code for my specific game and got it down to a runtime of several days, I would say. But therein lies the problem:
Because of the alpha-beta pruning, the runtime of the minimax algorithm is highly unpredictable. I can't tell whether it will be done in the next 5 minutes or run for 5 more weeks until I have actually simulated it. I would love to be able to predict the remaining runtime and not be off by several orders of magnitude.
What I tried so far
I am recording the results of all sub- and subsub-branches up to 5*sub-branches and the time it took my machine to simulate them. Then I just assume that positions on the same level take the same time to evaluate and call it a day. These predictions are sometimes off by a factor of 10 or more.
I also looked at the recorded data to see whether my assumption holds. The time needed to evaluate a 5*sub-branch varied from 0.01s to as much as 180s. That's why my predictions were off. Who would have guessed.
My Question
As I imagine this would apply to all implementations of minimax:
Are there more sophisticated algorithms out there to accurately predict the remaining runtime of a minimax algorithm with alpha-beta pruning? Or is minimax just unpredictable by design?
If so how do they work?
I have spent a lot of time with Negamax algorithms, which I highly suggest you switch over to. Negamax gives the same results as Minimax, but is much easier to debug and optimize further since it is just half the code.
I have no clue about the game you are trying to solve, but if it is even the slightest bit complicated I assume it won't be possible without a supercomputer. To answer your questions though:
Minimax with alpha-beta pruning relies heavily on the order in which you try your moves (to use board-game terms). You want to try the best moves first; in chess this is done by ordering the output of the possible-moves function, e.g. with capture moves ranked higher than castling.
You can also optimize the algorithm much much more with different techniques depending on what you are trying to solve. For example transposition tables if the same position can occur in another branch.
We need to know more about the game you are trying to solve to know what algorithm can work best.
Final words: If you want to get an idea of how long it will take to solve and how far you have gotten after some time, I suggest you use iterative deepening. This will also speed up your search, since you can try the best guesses from the previous iterations first and hence get faster beta cutoffs in the next iteration:
    import itertools

    for depth in itertools.count(1):  # depth = 1, 2, 3, ... until you stop it
        score = minimax(alpha, beta, depth....)
        time = elapsed_time()  # record how long this depth took
Now you can print the elapsed time for each depth and see how far it gets in a certain period of time. This is also a good way to measure whether your optimizations are giving any results. Since the Minimax tree grows exponentially with each depth, you can get an idea of how much time the next depth will take.
So if you know around how many moves it will take for a win/draw/loss you can pretty easily estimate whether it will be possible or not through this technique.
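To make the timing idea concrete, here is a small self-contained sketch. It assumes a toy game that is not from the question (a pile of stones, each player takes 1 to 3, whoever takes the last stone wins), and the names `negamax` and `iterative_deepening` are only illustrative. It records node counts and time per depth and uses the observed growth factor as a crude guess for the next depth.

```python
import time


def negamax(pile, depth, alpha, beta, counter):
    """Depth-limited negamax with alpha-beta for a toy take-1-to-3 game."""
    counter[0] += 1  # count visited nodes
    if pile == 0:
        return -1  # the previous player took the last stone and won
    if depth == 0:
        return 0  # outcome unknown within this depth limit
    best = -2  # below every reachable score
    for take in (1, 2, 3):
        if take > pile:
            break
        score = -negamax(pile - take, depth - 1, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


def iterative_deepening(pile, max_depth):
    """Run depth 1..max_depth and record (depth, score, nodes, seconds)."""
    report = []
    for depth in range(1, max_depth + 1):
        counter = [0]
        start = time.perf_counter()
        score = negamax(pile, depth, -1, 1, counter)
        report.append((depth, score, counter[0], time.perf_counter() - start))
    return report


report = iterative_deepening(pile=12, max_depth=12)
for (_, _, prev_nodes, _), (depth, score, nodes, seconds) in zip(report, report[1:]):
    growth = nodes / prev_nodes  # observed growth factor between depths
    print(f"depth {depth}: score {score}, {nodes} nodes, {seconds:.6f}s, "
          f"next depth roughly {seconds * growth:.6f}s")
```

Because of the pruning, the growth factor can jump around from one depth to the next, so the extrapolation is only an order-of-magnitude hint. This sketch does not reuse move ordering from earlier iterations, which a real implementation would probably want.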
Hope I made myself clear, English is not my native language :) Feel free to ask in the comments if something is not clear.

Elite\Elitist model in a Genetic Algorithm

When is the right time to use the Elite\Elitist model in a Genetic Algorithm? I have no idea when to use it. What kind of problems can be solved using this?
All I know is that an elitist model is one where you choose the elite (the solutions with the highest fitness function), they get a reserved slot in the next generation, and they are the ones up for crossover.
You pretty much always use some form of elitism. What varies is the percentage (p) of best performers that you allow to survive to the next generation. So no elitism is basically saying p=0.
The higher p is, the more your algorithm will tend to find local peaks of fitness, i.e. once it finds a chromosome with a good fitness, it'll tend to focus more on optimizing it than on trying to find completely different new solutions. Conversely, if p is smaller, your GA will look for possible solutions all over the place and won't zero in as fast once it finds something close to the optimum solution.
So setting p correctly is going to have a direct impact on your algorithm's performance. But it depends on what you're after and your problem space. Play around with it a bit to adjust properly. I typically use 20% for the problems I work with, to give enough room for innovation. It works ok for me.
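For illustration, this is one way the elite fraction p could enter a generation step. It is only a sketch: the bit-string individuals, the `sum` fitness, the truncation-style parent selection and the helper names `crossover`, `mutate` and `next_generation` are all assumptions chosen to keep the example short.

```python
import random


def crossover(a, b, rng):
    """One-point crossover of two equal-length tuples."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in individual)


def next_generation(population, fitness, p, rng):
    """Keep the top fraction p unchanged; fill the rest with offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = round(p * len(ranked))
    elites = ranked[:n_elite]  # survive without crossover or mutation
    parents = ranked[:max(2, len(ranked) // 2)]  # assumed truncation selection
    children = []
    while len(elites) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return elites + children


rng = random.Random(0)
population = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(10)]
best_before = max(sum(ind) for ind in population)
population = next_generation(population, sum, p=0.2, rng=rng)
print(best_before, max(sum(ind) for ind in population))
```

With p > 0 the best fitness in the population can never drop from one generation to the next, which is the main practical effect of elitism; with p = 0 every individual is replaced by offspring.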

Simulation Performance Metrics

This is a semi-broad question, but it's one that I feel on some level is answerable or at least approachable.
I've spent the last month or so making a fairly extensive simulation. In order to protect the interests of my employer, I won't state specifically what it does... but an analogy of what it does may be explained by... a high school dance.
A girl or boy enters the dance floor, and based on the selection of free dance partners, an optimal choice is made. After a period of time, two dancers finish dancing and are now free for a new partnership.
I've been making partner selection algorithms designed to maximize average match outcome while not sacrificing wait time for a partner too much.
I want a way to gauge / compare versions of my algorithms in order to make a selection of the optimal algorithm for any situation. This is difficult however since the inputs of my simulation are extremely large matrices of input parameters (2-5 per dancer), and the simulation takes several minutes to run (a fact that makes it difficult to test a large number of simulation inputs). I have a few output metrics, but linking them to the large number of inputs is extremely hard. I'm also interested in finding which algorithms completely fail under certain input conditions...
Any pro tips / online resources which might help me in defining input constraints / output variables which might give clarity on an optimal algorithm?
I might not understand what you exactly want. But here is my suggestion. Let me know if my solution is inaccurate/irrelevant and I will edit/delete accordingly.
Assume you have a certain metric (say compatibility of the pairs or waiting time). If you just have the average or total of this metric over all the users, it is kind of useless. Instead you might want to find the distribution of this metric over all users. If nothing else, you should always keep track of the variance. Once you have the distribution, you can calculate the probability that a particular algorithm A is better than B for a certain metric.
If you do not have the distribution of the metric within an experiment, you can always run multiple experiments, and the number of experiments you need to run depends on the variance of the metric and difference between two algorithms.
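A hedged sketch of both ideas follows. `prob_a_beats_b` estimates how often a score drawn from A exceeds one drawn from B, and `runs_needed` is a rule-of-thumb sample size based on the standard error of a difference of two means (it assumes roughly normal, equal-variance metrics and ignores statistical power). The function names and the sample numbers are made up for the example.

```python
import math
import statistics


def prob_a_beats_b(a_scores, b_scores):
    """Fraction of (a, b) pairs where A scores higher; ties count as half."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in a_scores for b in b_scores)
    return wins / (len(a_scores) * len(b_scores))


def runs_needed(std_dev, min_difference, z=1.96):
    """Rough runs per algorithm so that z standard errors of the difference
    of two means fit inside min_difference (z=1.96 is about 95%)."""
    return math.ceil(2 * (z * std_dev / min_difference) ** 2)


# Hypothetical match scores from two versions of the algorithm.
scores_a = [3, 4, 5]
scores_b = [1, 2, 3]
print(prob_a_beats_b(scores_a, scores_b))
print(statistics.variance(scores_a), runs_needed(std_dev=10, min_difference=5))
```

The sample-size formula shows the dependence mentioned above: the number of runs grows with the variance and shrinks with the square of the difference you want to detect.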

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there -- eg, how many dimensions in the search space? Are they continuous or discrete - eg, real numbers, or integers, or just a few possible values?
Approaches that I've seen used for these kinds of problems have a similar overall structure: take a large number of sample points, and somehow adjust them all towards regions that have "good" answers. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points, and probabilistically move some of them to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space, and probabilistically move a particle at random; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.
The wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
Note that all of these -- and really any approach to this kind of high-dimensional search problem -- are heuristics, which means they have parameters that have to be tuned to your particular problem. That can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit; since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive is it, compared to the time you have at hand? If you need a good answer in a few minutes this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.
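As a loose sketch of the multi-walker idea with an annealed step size (written for minimisation), something like the following could serve as a starting point. The function name `annealed_walkers`, the default parameters, the clamping of proposals to the bounds and the test function `bumpy` are all assumptions of this example, not taken from the linked answer. Note that `n_walkers * n_steps` is the evaluation budget, so with a one-minute evaluation the defaults here would be far too generous.

```python
import math
import random


def annealed_walkers(f, bounds, rng, n_walkers=20, n_steps=300,
                     step0=0.1, step_decay=0.99, temperature=1.0):
    """Minimise f with several Metropolis walkers and shrinking step size."""
    walkers = [[rng.uniform(lo, hi) for lo, hi in bounds]
               for _ in range(n_walkers)]  # random initial distribution
    values = [f(w) for w in walkers]
    best_val = min(values)
    best_x = list(walkers[values.index(best_val)])
    step = step0  # proposal width as a fraction of each parameter's range
    for _ in range(n_steps):
        for i, walker in enumerate(walkers):
            cand = [min(hi, max(lo, x + rng.gauss(0.0, step * (hi - lo))))
                    for x, (lo, hi) in zip(walker, bounds)]
            val = f(cand)
            # Metropolis rule: always accept improvements, sometimes accept
            # a worse point with probability exp(-increase / temperature).
            if val <= values[i] or rng.random() < math.exp((values[i] - val) / temperature):
                walkers[i], values[i] = cand, val
                if val < best_val:
                    best_val, best_x = val, list(cand)
        step *= step_decay  # anneal the step size
    return best_x, best_val


def bumpy(x):
    """Hypothetical test function with many local minima; minimum 0 at 0."""
    return sum(v * v + 2.0 * math.sin(5.0 * v) ** 2 for v in x)


best_x, best_val = annealed_walkers(bumpy, [(-5.0, 5.0)] * 2, random.Random(1))
print(best_x, best_val)
```

The returned answer is the best individual result recorded during the whole run. Since each walker's proposal is independent within a step, the inner loop is also the natural place to parallelise the expensive evaluations.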
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
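The steps above might be sketched roughly as follows. This is a reconstruction under assumptions, not the poster's code: the random orthogonal directions come from Gram-Schmidt on Gaussian vectors instead of the adapted matrix code, both senses of each direction are tried, an evaluation budget `max_evals` is added, and the names `descend`, `restart_search` and the test function `bowl` are invented for the example.

```python
import math
import random


def random_orthonormal_basis(dim, rng):
    """Gram-Schmidt on Gaussian vectors gives random orthogonal directions."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # skip the rare degenerate draw
            basis.append([x / norm for x in v])
    return basis


def descend(f, x, rng, step=1.0, min_step=1e-6, speedup=2.0, max_evals=5000):
    """Follow improving directions, accelerating on success, halving on failure."""
    fx, evals = f(x), 1
    while step > min_step and evals < max_evals:
        improved = False
        for d in random_orthonormal_basis(len(x), rng):
            for sign in (1.0, -1.0):  # trying both senses is an assumption
                jump = sign * step
                while evals < max_evals:
                    cand = [xi + jump * di for xi, di in zip(x, d)]
                    fc = f(cand)
                    evals += 1
                    if fc >= fx:
                        break  # "step back": the worse point is not accepted
                    x, fx, improved = cand, fc, True
                    jump *= speedup  # speed up after every success
        if not improved:
            step /= 2.0  # no direction helped: halve the jump distance
    return x, fx


def restart_search(f, bounds, rng, restarts=5):
    """Remember the best local minimum over several random starting points."""
    best_x, best_f = None, math.inf
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = descend(f, start, rng)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def bowl(x):
    """Hypothetical smooth test function with its minimum at (1, 1, 1)."""
    return sum((v - 1.0) ** 2 for v in x)


best_x, best_f = restart_search(bowl, [(-5.0, 5.0)] * 3, random.Random(0))
print(best_x, best_f)
```

On a smooth bowl like this the restarts add nothing; they only matter when the function has many local minima, as in the question.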
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.
I can't really help you with finding an algorithm for your specific problem.
However in regards to the random choosing of parameters I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random input, selecting those, which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation for which again the best are selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't generate bad inputs, with "small" being somewhat loosely defined), this would work reasonably well for your problem.
There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.
Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify a result slightly instead of just randomizing it. For example, in TSP a basic operator is lambda, which swaps two nodes and thus creates a new route. Yours could be shifting some number up or down by some value.
then, find yourself some nice algorithm, your starting point can be somewhere here. The book is invaluable resource for anyone who starts with problem-solving.

What is an efficient way to go beyond a greedy algorithm

The domain of this question is scheduling operations on constrained hardware. The resolution of the result is the number of clock cycles the schedule fits within. The search space grows exponentially, because early decisions constrain future decisions and the total number of possible schedules grows very rapidly. Many of the possible schedules are equivalent, because just swapping the order of two instructions usually results in the same timing constraint.
Basically the question is what is a good strategy for exploring the vast search space without spending too much time. I expect to search only a small fraction but would like to explore different parts of the search space while doing so.
The current greedy algorithm sometimes tends to make stupid decisions early on, and the attempt at branch and bound was beyond slow.
Edit:
Want to point out that the result is very binary with perhaps the greedy algorithm ending up using 8 cycles while there exists a solution using only 7 cycles using branch and bound.
Second point is that there are significant restrictions in data routing between instructions and dependencies between instructions that limits the amount of commonality between solutions. Look at it as a knapsack problem with a lot of ordering constraints as well as some solutions completely failing because of routing congestion.
Clarification:
In each cycle there is a limit to how many operations of each type and some operations have two possible types. There are a set of routing constraints which can be varied to be either fairly tight or pretty forgiving and the limit depends on routing congestion.
Integer linear optimization for NP-hard problems
Depending on your side constraints, you may be able to use the critical path method or (as suggested in a previous answer) dynamic programming. But many scheduling problems are NP-hard, just like the classical traveling salesman problem: a precise solution has a worst case of exponential search time, just as you describe in your problem.
It's important to know that while NP-hard problems still have a very bad worst-case solution time, there is an approach that very often produces exact answers with very short computations (the average case is acceptable and you often don't see the worst case).
This approach is to convert your problem to a linear optimization problem with integer variables. There are free-software packages (such as lp-solve) that can solve such problems efficiently.
The advantage of this approach is that it may give you exact answers to NP-hard problems in acceptable time. I used this approach in a few projects.
As your problem statement does not include more details about the side constraints, I cannot go into more detail how to apply the method.
Edit/addition: Sample implementation
Here are some details about how to implement this method in your case (of course, I make some assumptions that may not apply to your actual problem; I only know the details from your question):
Let's assume that you have 50 instructions cmd(i) (i=1..50) to be scheduled in 10 or less cycles cycle(t) (t=1..10). We introduce 500 binary variables v(i,t) (i=1..50; t=1..10) which indicate whether instruction cmd(i) is executed at cycle(t) or not. This basic setup gives the following linear constraints:
    v(i,t) integer variables
    0 <= v(i,t); v(i,t) <= 1;     # 1000 constraints: i=1..50; t=1..10
    sum(v(i,t): t=1..10) == 1     # 50 constraints: i=1..50
Now, we have to specify your side conditions. Let's assume that operations cmd(1)...cmd(5) are multiplication operations and that you have exactly two multipliers --- in any cycle, you may perform at most two of these operations in parallel:
    sum(v(i,t): i=1..5) <= 2      # 10 constraints: t=1..10
For each of your resources, you need to add the corresponding constraints.
Also, let's assume that operation cmd(7) depends on operation cmd(2) and needs to be executed after it. To make the equation a little bit more interesting, lets also require a two cycle gap between them:
    sum(t*v(2,t): t=1..10) + 3 <= sum(t*v(7,t): t=1..10)    # one constraint
Note: sum(t*v(2,t): t=1..10) is the cycle t where v(2,t) is equal to one.
Finally, we want to minimize the number of cycles. This is somewhat tricky because you get quite big numbers in the way that I propose: we assign each v(i,t) a price that grows exponentially with time, so pushing off operations into the future is much more expensive than performing them early:
    sum(6^t * v(i,t): i=1..50; t=1..10) --> minimum.    # one target function
I choose 6 to be bigger than 5 to ensure that adding one cycle to the system makes it more expensive than squeezing everything into fewer cycles. A side effect is that the program will go out of its way to schedule operations as early as possible. You may avoid this by performing a two-step optimization: First, use this target function to find the minimal number of necessary cycles. Then, pose the same problem again with a different target function, limiting the number of available cycles at the outset and imposing a more moderate price penalty for later operations. You have to play with this; I hope you get the idea.
Hopefully, you can express all your requirements as such linear constraints in your binary variables. Of course, there may be many opportunities to exploit your insight into your specific problem to get by with fewer constraints or fewer variables.
Then, hand your problem off to lp-solve or cplex and let them find the best solution!
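To see the formulation at work without installing a solver, here is a tiny Python sketch that checks the same kinds of linear constraints on a binary matrix `v[i][t]` and finds the cheapest assignment by exhaustive enumeration. The enumeration merely stands in for lp-solve or cplex and is only viable at toy sizes; the instance (4 commands, 3 cycles, a distance of 1 cycle between two commands) and all names are assumptions made for the example.

```python
from itertools import product

N_CMDS, N_CYCLES = 4, 3           # toy sizes, far smaller than 50 x 10
MULTIPLIES = (0, 1, 2)            # commands that need a multiplier
MAX_MULTIPLIES_PER_CYCLE = 2
BEFORE, AFTER, MIN_DISTANCE = 0, 3, 1  # cmd 3 at least 1 cycle after cmd 0


def cycle_of(v, i):
    """sum(t * v(i,t)): the cycle in which command i runs (cycles from 1)."""
    return sum((t + 1) * v[i][t] for t in range(N_CYCLES))


def feasible(v):
    """Check the linear constraints on the binary matrix v[i][t]."""
    if any(sum(v[i]) != 1 for i in range(N_CMDS)):
        return False  # every command is scheduled exactly once
    if any(sum(v[i][t] for i in MULTIPLIES) > MAX_MULTIPLIES_PER_CYCLE
           for t in range(N_CYCLES)):
        return False  # resource limit per cycle
    return cycle_of(v, BEFORE) + MIN_DISTANCE <= cycle_of(v, AFTER)


def cost(v):
    """Target function: price 6^t for running a command in cycle t."""
    return sum(6 ** (t + 1) * v[i][t]
               for i in range(N_CMDS) for t in range(N_CYCLES))


def solve_by_enumeration():
    """Try every 0/1 assignment; a real solver would replace this loop."""
    best_cost, best_v = None, None
    for bits in product((0, 1), repeat=N_CMDS * N_CYCLES):
        v = [bits[i * N_CYCLES:(i + 1) * N_CYCLES] for i in range(N_CMDS)]
        if feasible(v) and (best_cost is None or cost(v) < best_cost):
            best_cost, best_v = cost(v), v
    return best_cost, best_v


best_cost, best_v = solve_by_enumeration()
schedule = [cycle_of(best_v, i) for i in range(N_CMDS)]
print(best_cost, schedule)
```

Each check in `feasible` corresponds to one family of constraints from the text, so the same functions can double as a sanity check on whatever schedule a real solver returns.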
At first blush, it sounds like this problem might fit into a dynamic programming solution. Several operations may take the same amount of time so you might end up with overlapping subproblems.
If you can map your problem to the "travelling salesman" (like: Find the optimal sequence to run all operations in minimum time), then you have an NP-complete problem.
A very quick way to solve that is the ant algorithm (or ant colony optimization).
The idea is that you send an ant down every path. The ant spreads a smelly substance on the path, which evaporates over time. Short paths mean that the path will stink more when the next ant comes along. Ants prefer smelly paths over clean ones. Run thousands of ants through the network. The smelliest path is the optimal one (or at least very close).
Try simulated annealing, cfr. http://en.wikipedia.org/wiki/Simulated_annealing .

Resources