Related
I'm working on a project with many unknowns like moving the app from one platform to another.
My original estimations are way off and there is no way I can really know for sure when this will end.
How can i deal with the inability to estimate such a project. It's not that I'm adding a button to a screen or designing a web site, or creating and app or even fixing bugs. These are not methods with bugs, these are assumptions made in the overall code, which are not correct anymore and are found step by step and each analyzed and mitigated with many more unknowns.
I happened to write a master thesis about software-estimation and there are lessons I've learned:
-1st Count, 2nd compute, 3rd judge - this means: first try to identify items in your work which are countable e.g files, classes, LOCs, UIs, etc. Then calculate using this data the effort (in person/days). Use judgement as the last ressort.
-Document your estimation! Show numbers. This minimizes your risk, thus you will present results not as your opinion, but as more or less objective figures. (In general, the more paper the cleaner the backside)
-Estimation is not a commitment. Commitment is one number, estimation is always a range - so give your estimation as a range ( use cone of uncertainty to select the range properly http://www.construx.com/Page.aspx?hid=1648 )
-Devide: Use WBS, devide your work in small pieces and estimate them separately. The granulity depents on the entire length, but at most a working-package soultn't be bigger than 10% of entire effort.
-Estimate effort first, then schedule, then costs.
-Consider estimation as support for planing, reestimate on each project phase (s. cone of uncertainty).
I would suggest the book http://www.stevemcconnell.com/est.htm which deals all these points, in particular how to deal with bosses, who try to pull a commitment from you.
Regards,
Valentin Heinitz
There's no really right answer for coming up with an accurate estimation, because there's no way to know it.
as for estimating the work itself, think about how each step can be divided into separate sub-steps, and break those down even smaller, until you can get a fair picture of as much of the work as you can, with chunks small and discreet enough to give sound estimates for. If you can, come up with both an expected time and a worst-case time, to get a range of where you could land.
Another way to approach this is to ignore the old system. It sounds like a headache. Make an estimate of scraping the old system and implementing a new one from scratch, or integrating a 3rd party, off the shelf solution. If there's a case to be made for this, it is worth at least investigating it.
Sounds like a post for postsecret not SO. :)
I would tell him that it will be done when its done, and if thats not good enough, he can learn to program and help you. Then again, I think that you might get fired, but hey that sounds like it might be better.
Tell him more or less what you told us. The project is too volatile too give an accurate estimate and the best you can do is give an estimate for a given task. As long as the number of tasks is unknown so will be the estimate. If he is at all worth his salary he would rather hear this than some made up number. This is not uncommon when dealing with a large legacy code base.
It's not that I'm adding a button to a screen or designing a web site,
or creating and app or even fixing bugs.
That is a real problem. You can not estimate what you don't have experience in. The only thing you can do is pad your estimate until you think it is a reasonable amount of time. The more unknowns you think there are the more you pad. The less you know about it the more you pad.
I read the below book and it spoke at length about accuracy vs precision. Basically you can be accurate but have a very large range. For instance you can be certain the task will be between 1 day and 1 year to complete. That is not very precise but it is really accurate.
Software Estimation Demystifying...
Some tips for estimating
what are the software metrics that can be used to measuring the performance of pair programming ?
to be clear
is there any metrics used to measure pair programming specifically and does not use to measure the individual programmer ? what are the parameters used for measuring ?
for example:if we want to measure the cost for both individual and pair programming
let's assume that for the individual programming Cost = x so for the pair will be Cost = 2* x
right
and the same for the time for individual Time = t while for pair Time = 2* t
so if I would like to use Lines of code for measuring the product size , is there any different between individual and pair by using this metric?
any idea
Sorry to spoil your party, but lines of code is one of the worst metrics possible, especially if people know their assessment or bonus is in any way tied to the metric. It actively encourages cut and paste programming and other attrocities. It's more effort, but why don't you categorise the workload in terms of expected effort for one person, based on your historical data? Or, get some programmers to agree to do a few projects redundantly, rotating between pair-programming and individual, so you can see how the same programmers go at each. As one good programmer can be more productive than two average programmers (I vaguely remember an old IBM study concluding someone in the top percentile was 27x more productive than median), it's useful to see the same programmers doing it both ways. If objectively discovering the right process through such an experiment is too costly in terms of lost short-term productivity, then you're better off not bothering with the LOC metrics anyway... good programmers knowing their work arrangements are being based on such will probably be highly unimpressed.
Remember that there are also intangibles involved... pair programming - IMHO - forces people to keep focused, and to make design decisions that are more rounded and professional. Just the social contact can help relieve boredom, though it may stress some people too. My suspicion is that - whether or not it's faster to begin with - it makes for better, more maintainable results. It also ensures skill and knowledge transfer. You should factor in such intangible aspects as best you can - maybe doing interviews or anonymous surveys with the trial participants.
I guess what you try to ask, is how to measure efficiency of the team that uses pair programming. If yes, then answer is the measurement of efficiency doesn't depend on method or proccess of work team is using. You should try to evaluate the quality of their product releases, with metrics like number of issues identified post release. Probably the velocity.
and, please, don't use lines of code for efficiency measurement. It doesn't make sense. Lines of code is a measure of product size and not developer efficiency. It's like using height or weight to judge how smart you are. There is no correlation between amount of code and individual efficiency.
if you are interested in more software metrics, take a look at http://www.sdlcmetrics.org
I'm designing a realtime strategy wargame where the AI will be responsible for controlling a large number of units (possibly 1000+) on a large hexagonal map.
A unit has a number of action points which can be expended on movement, attacking enemy units or various special actions (e.g. building new units). For example, a tank with 5 action points could spend 3 on movement then 2 in firing on an enemy within range. Different units have different costs for different actions etc.
Some additional notes:
The output of the AI is a "command" to any given unit
Action points are allocated at the beginning of a time period, but may be spent at any point within the time period (this is to allow for realtime multiplayer games). Hence "do nothing and save action points for later" is a potentially valid tactic (e.g. a gun turret that cannot move waiting for an enemy to come within firing range)
The game is updating in realtime, but the AI can get a consistent snapshot of the game state at any time (thanks to the game state being one of Clojure's persistent data structures)
I'm not expecting "optimal" behaviour, just something that is not obviously stupid and provides reasonable fun/challenge to play against
What can you recommend in terms of specific algorithms/approaches that would allow for the right balance between efficiency and reasonably intelligent behaviour?
If you read Russell and Norvig, you'll find a wealth of algorithms for every purpose, updated to pretty much today's state of the art. That said, I was amazed at how many different problem classes can be successfully approached with Bayesian algorithms.
However, in your case I think it would be a bad idea for each unit to have its own Petri net or inference engine... there's only so much CPU and memory and time available. Hence, a different approach:
While in some ways perhaps a crackpot, Stephen Wolfram has shown that it's possible to program remarkably complex behavior on a basis of very simple rules. He bravely extrapolates from the Game of Life to quantum physics and the entire universe.
Similarly, a lot of research on small robots is focusing on emergent behavior or swarm intelligence. While classic military strategy and practice are strongly based on hierarchies, I think that an army of completely selfless, fearless fighters (as can be found marching in your computer) could be remarkably effective if operating as self-organizing clusters.
This approach would probably fit a little better with Erlang's or Scala's actor-based concurrency model than with Clojure's STM: I think self-organization and actors would go together extremely well. Still, I could envision running through a list of units at each turn, and having each unit evaluating just a small handful of very simple rules to determine its next action. I'd be very interested to hear if you've tried this approach, and how it went!
EDIT
Something else that was on the back of my mind but that slipped out again while I was writing: I think you can get remarkable results from this approach if you combine it with genetic or evolutionary programming; i.e. let your virtual toy soldiers wage war on each other as you sleep, let them encode their strategies and mix, match and mutate their code for those strategies; and let a refereeing program select the more successful warriors.
I've read about some startling successes achieved with these techniques, with units operating in ways we'd never think of. I have heard of AIs working on these principles having had to be intentionally dumbed down in order not to frustrate human opponents.
First you should aim to make your game turn based at some level for the AI (i.e. you can somehow model it turn based even if it may not be entirely turn based, in RTS you may be able to break discrete intervals of time into turns.) Second, you should determine how much information the AI should work with. That is, if the AI is allowed to cheat and know every move of its opponent (thereby making it stronger) or if it should know less or more. Third, you should define a cost function of a state. The idea being that a higher cost means a worse state for the computer to be in. Fourth you need a move generator, generating all valid states the AI can transition to from a given state (this may be homogeneous [state-independent] or heterogeneous [state-dependent].)
The thing is, the cost function will be greatly influenced by what exactly you define the state to be. The more information you encode in the state the better balanced your AI will be but the more difficult it will be for it to perform, as it will have to search exponentially more for every additional state variable you include (in an exhaustive search.)
If you provide a definition of a state and a cost function your problem transforms to a general problem in AI that can be tackled with any algorithm of your choice.
Here is a summary of what I think would work well:
Evolutionary algorithms may work well if you put enough effort into them, but they will add a layer of complexity that will create room for bugs amongst other things that can go wrong. They will also require extreme amounts of tweaking of the fitness function etc. I don't have much experience working with these but if they are anything like neural networks (which I believe they are since both are heuristics inspired by biological models) you will quickly find they are fickle and far from consistent. Most importantly, I doubt they add any benefits over the option I describe in 3.
With the cost function and state defined it would technically be possible for you to apply gradient decent (with the assumption that the state function is differentiable and the domain of the state variables are continuous) however this would probably yield inferior results, since the biggest weakness of gradient descent is getting stuck in local minima. To give an example, this method would be prone to something like attacking the enemy always as soon as possible because there is a non-zero chance of annihilating them. Clearly, this may not be desirable behaviour for a game, however, gradient decent is a greedy method and doesn't know better.
This option would be my most highest recommended one: simulated annealing. Simulated annealing would (IMHO) have all the benefits of 1. without the added complexity while being much more robust than 2. In essence SA is just a random walk amongst the states. So in addition to the cost and states you will have to define a way to randomly transition between states. SA is also not prone to be stuck in local minima, while producing very good results quite consistently. The only tweaking required with SA would be the cooling schedule--which decides how fast SA will converge. The greatest advantage of SA I find is that it is conceptually simple and produces superior results empirically to most other methods I have tried. Information on SA can be found here with a long list of generic implementations at the bottom.
3b. (Edit Added much later) SA and the techniques I listed above are general AI techniques and not really specialized to AI for games. In general, the more specialized the algorithm the more chance it has at performing better. See No Free Lunch Theorem 2. Another extension of 3 is something called parallel tempering which dramatically improves the performance of SA by helping it avoid local optima. Some of the original papers on parallel tempering are quite dated 3, but others have been updated4.
Regardless of what method you choose in the end, its going to be very important to break your problem down into states and a cost function as I said earlier. As a rule of thumb I would start with 20-50 state variables as your state search space is exponential in the number of these variables.
This question is huge in scope. You are basically asking how to write a strategy game.
There are tons of books and online articles for this stuff. I strongly recommend the Game Programming Wisdom series and AI Game Programming Wisdom series. In particular, Section 6 of the first volume of AI Game Programming Wisdom covers general architecture, Section 7 covers decision-making architectures, and Section 8 covers architectures for specific genres (8.2 does the RTS genre).
It's a huge question, and the other answers have pointed out amazing resources to look into.
I've dealt with this problem in the past and found the simple-behavior-manifests-complexly/emergent behavior approach a bit too unwieldy for human design unless approached genetically/evolutionarily.
I ended up instead using abstracted layers of AI, similar to a way armies work in real life. Units would be grouped with nearby units of the same time into squads, which are grouped with nearby squads to create a mini battalion of sorts. More layers could be use here (group battalions in a region, etc.), but ultimately at the top there is the high-level strategic AI.
Each layer can only issue commands to the layers directly below it. The layer below it will then attempt to execute the command with the resources at hand (ie, the layers below that layer).
An example of a command issued to a single unit is "Go here" and "shoot at this target". Higher level commands issued to higher levels would be "secure this location", which that level would process and issue the appropriate commands to the lower levels.
The highest level master AI is responsible for very board strategic decisions, such as "we need more ____ units", or "we should aim to move towards this location".
The army analogy works here; commanders and lieutenants and chain of command.
ok, so i have been working on my chess program for a while and i am beginning to hit a wall. i have done all of the standard optimizations (negascout, iterative deepening, killer moves, history heuristic, quiescent search, pawn position evaluation, some search extensions) and i'm all out of ideas!
i am looking to make it multi-threaded soon, and that should give me a good boost in performance, but aside from that are there any other nifty tricks you guys have come across? i have considered switching to MDF(f), but i have heard it is a hassle and isn't really worth it.
what i would be most interested in is some kind of learning algorithm, but i don't know if anyone has done that effectively with a chess program yet.
also, would switching to a bit board be significant? i currently am using 0x88.
Over the last year of development of my chess engine (www.chessbin.com), much of the time has been spent optimizing my code to allow for better and faster move searching. Over that time I have learned a few tricks that I would like to share with you.
Measuring Performance
Essentially you can improve your performance in two ways:
Evaluate your nodes faster
Search fewer nodes to come up with
the same answer
Your first problem in code optimization will be measurement. How do you know you have really made a difference? In order to help you with this problem you will need to make sure you can record some statistics during your move search. The ones I capture in my chess engine are:
Time it took for the search to
complete.
Number of nodes searched
This will allow you to benchmark and test your changes. The best way to approach testing is to create several save games from the opening position, middle game and the end game. Record the time and number of nodes searched for black and white.
After making any changes I usually perform tests against the above mentioned save games to see if I have made improvements in the above two matrices: number of nodes searched or speed.
To complicate things further, after making a code change you might run your engine 3 times and get 3 different results each time. Let’s say that your chess engine found the best move in 9, 10 and 11 seconds. That is a spread of about 20%. So did you improve your engine by 10%-20% or was it just varied load on your pc. How do you know? To fight this I have added methods that will allow my engine to play against itself, it will make moves for both white and black. This way you can test not just the time variance over one move, but a series of as many as 50 moves over the course of the game. If last time the game took 10 minutes and now it takes 9, you probably improved your engine by 10%. Running the test again should confirm this.
Finding Performance Gains
Now that we know how to measure performance gains lets discuss how to identify potential performance gains.
If you are in a .NET environment then the .NET profiler will be your friend. If you have a Visual Studio for Developers edition it comes built in for free, however there are other third party tools you can use. This tool has saved me hours of work as it will tell you where your engine is spending most of its time and allow you to concentrate on your trouble spots. If you do not have a profiler tool you may have to somehow log the time stamps as your engine goes through different steps. I do not suggest this. In this case a good profiler is worth its weight in gold. Red Gate ANTS Profiler is expensive but the best one I have ever tried. If you can’t afford one, at least use it for their 14 day trial.
Your profiler will surly identify things for you, however here are some small lessons I have learned working with C#:
Make everything private
Whatever you can’t make private, make
it sealed
Make as many methods static as
possible.
Don’t make your methods chatty, one
long method is better than 4 smaller
ones.
Chess board stored as an array [8][8]
is slower then an array of [64]
Replace int with byte where possible.
Return from your methods as early as
possible.
Stacks are better than lists
Arrays are better than stacks and
lists.
If you can define the size of the
list before you populate it.
Casting, boxing, un-boxing is evil.
Further Performance Gains:
I find move generation and ordering is extremely important. However here is the problem as I see it. If you evaluate the score of each move before you sort and run Alpha Beta, you will be able to optimize your move ordering such that you will get extremely quick Alpha Beta cutoffs. This is because you will be able to mostly try the best move first.
However the time you have spent evaluating each move will be wasted. For example you might have evaluated the score on 20 moves, sort your moves try the first 2 and received a cut-off on move number 2. In theory the time you have spent on the other 18 moves was wasted.
On the other hand if you do a lighter and much faster evaluation say just captures, your sort will not be that good and you will have to search more nodes (up to 60% more). On the other hand you would not do a heavy evaluation on every possible move. As a whole this approach is usually faster.
Finding this perfect balance between having enough information for a good sort and not doing extra work on moves you will not use, will allow you to find huge gains in your search algorithm. Furthermore if you choose the poorer sort approach you will want to first to a shallower search say to ply 3, sort your move before you go into the deeper search (this is often called Iterative Deepening). This will significantly improve your sort and allow you to search much fewer moves.
Answering an old question.
Assuming you already have a working transposition table.
Late Move Reduction. That gave my program about 100 elo points and it is very simple to implement.
In my experience, unless your implementation is very inefficient, then the actual board representation (0x88, bitboard, etc.) is not that important.
Although you can criple you chess engine with bad performance, a lightning fast move generator in itself is not going to make a program good.
The search tricks used and the evaluation function are the overwhelming factors determining overall strength.
And the most important parts, by far, of the evaluation are Material, Passed pawns, King Safety and Pawn Structure.
The most important parts of the search are: Null Move Pruning, Check Extension and Late Move reduction.
Your program can come a long, long way, on these simple techniques alone!
Good move ordering!
An old question, but same techniques apply now as for 5 years ago. Aren't we all writing our own chess engines, I have my own called "Norwegian Gambit" that I hope will eventually compete with other Java engines on the CCRL. I as many others use Stockfish for ideas since it is so nicely written and open. Their testing framework Fishtest and it's community also gives a ton of good advice. It is worth comparing your evaluation scores with what Stockfish gets since how to evaluate is probably the biggest unknown in chess-programming still and Stockfish has gone away from many traditional evals which have become urban legends (like the double bishop bonus). The biggest difference however was after I implemented the same techniques as you mention, Negascout, TT, LMR, I started using Stockfish for comparison and I noticed how for the same depth Stockfish had much less moves searched than I got (because of the move ordering).
Move ordering essentials
The one thing that is easily forgotten is good move-ordering. For the Alpha Beta cutoff to be efficient it is essential to get the best moves first. On the other hand it can also be time-consuming so it is essential to do it only as necessary.
Transposition table
Sort promotions and good captures by their gain
Killer moves
Moves that result in check on opponent
History heuristics
Silent moves - sort by PSQT value
The sorting should be done as needed, usually it is enough to sort the captures, and thereafter you could run the more expensive sorting of checks and PSQT only if needed.
About Java/C# vs C/C++/Assembly
Programming techniques are the same for Java as in the excellent answer by Adam Berent who used C#. Additionally to his list I would mention avoiding Object arrays, rather use many arrays of primitives, but contrary to his suggestion of using bytes I find that with 64-bit java there's little to be saved using byte and int instead of 64bit long. I have also gone down the path of rewriting to C/C++/Assembly and I am having no performance gain whatsoever. I used assembly code for bitscan instructions such as LZCNT and POPCNT, but later I found that Java 8 also uses those instead of the methods on the Long object. To my surprise Java is faster, the Java 8 virtual machine seems to do a better job optimizing than a C compiler can do.
I know that one improvement that was talked about at the AI courses in university where having a huge database of finishing moves. So having a precalculated database for games with only a small number of figures left. So that if you hit a near end positioning in your search you stop the search and take a precalculated value that improves your search results like extra deepening that you can do for important/critique moves without much computation time spend. I think it also comes with a change in heuristics in a late game state but I'm not a chess player so I don't know the dynamics of game finishing.
Be warned, getting game search right in a threaded environment can be a royal pain (I've tried it). It can be done, but from some literature searching I did a while back, it's extremely hard to get any speed boost at all out of it.
Its quite an old question, I was just searching questions on chess and found this one unanswered. Well it may not be of any help to you now, but may prove helpful to other users.
I didn't see null move pruning, transposition tables.. are you using them? They would give you a big boost...
One thing that gave me a big boost was minimizing conditional branching... Alot of things can be precomputed. Search for such opportunities.
Most modern PCs have multiple cores so it would be a good idea making it multithreading. You don't necessarily need to go MDF(f) for that.
I wont suggest moving your code to bitboard. Its simply too much work. Even though bitboards could give a boost on 64 bit machines.
Finally and most importantly chess literature dominates any optimizations we may use. optimization is too much work. Look at open source chess engines, particularly crafty and fruit/toga. Fruit used to be open source initially.
Late answer, but this may help someone:
Given all the optimizations you mentioned, 1450 ELO is very low. My guess is that something is very wrong with your code. Did you:
Wrote a perft routine and ran it through a set of positions? All tests should pass, so you know your move generator is free of bugs. If you don't have this, there's no point in talking about ELO.
Wrote a mirrorBoard routine and ran the evaluation code through a set of positions? The result should be the same for the normal and mirrored positions, otherwise you have a bug in your eval.
Do you have a hashtable (aka transposition table)? If not, this is a must. It will help while searching and ordering moves, giving a brutal difference in speed.
How do you implement move ordering? This links back to point 3.
Did you implement the UCI protocol? Is your move parsing function working properly? I had a bug like this in my engine:
/* Parses a uci move string and return a Board object */
Board parseUCIMoves(String moves)// e2e4 c7c5 g1f3 ...{
//...
if (someMove.equals("e1g1") || someMove.equals("e1c1"))
//apply proper castle
//...
}
Sometimes the engine crashed while playing a match, and I thought it was the GUI fault, since all perft tests were ok. It took me one week to find the bug by luck. So, test everything.
For (1) you can search every position to depth 6. I use a file with ~1000 positions. See here https://chessprogramming.wikispaces.com/Perft
For (2) you just need a file with millions of positions (just the FEN string).
Given all the above and a very basic evaluation function (material, piece square tables, passed pawns, king safety) it should play at +-2000 ELO.
As far as tips, I know large gains can be found in optimizing your move generation routines before any eval functions. Making that function as tight as possible can give you 10% or more in nodes/sec improvement.
If you're moving to bitboards, do some digging on rec.games.chess.computer archives for some of Dr. Robert Hyatts old posts about Crafty (pretty sure he doesn't post anymore). Or grab the latest copy from his FTP and start digging. I'm pretty sure it would be a significant shift for you though.
Transposition Table
Opening Book
End Game Table Bases
Improved Static Board Evaluation for Leaf Nodes
Bitboards for Raw Speed
Profile and benchmark. Theoretical optimizations are great, but unless you are measuring the performance impact of every change you make, you won't know whether your work is improving or worsening the speed of the final code.
Try to limit the penalty to yourself for trying different algorithms. Make it easy to test various implementations of algorithms against one another. i.e. Make it easy to build a PVS version of your code as well as a NegaScout version.
Find the hot spots. Refactor. Rewrite in assembly if necessary. Repeat.
Assuming "history heuristic" involves some sort of database of past moves, a learning algorithm isn't going to give you much more unless it plays a lot of games against the same player. You can probably achieve more by classifying a player and tweaking the selection of moves from your historic database.
It's been a long time since I've done any programming on any chess program, but at the time, bit boards did give a real improvement. Other than that I can't give you much advise. Do you only evaluate the position of pawns? Some (slight) bonuses for position or mobility of some key pieces may be in order.
I'm not certain what type of thing you would like it to learn however...
What are some good tips and/or techniques for optimizing and improving the performance of calculation heavy programs. I'm talking about things like complication graphics calculations or mathematical and simulation types of programming where every second saved is useful, as opposed to IO heavy programs where only a certain amount of speedup is helpful.
While changing the algorithm is frequently mentioned as the most effective method here,I'm trying to find out how effective different algorithms are in the first place, so I want to create as much efficiency with each algorithm as is possible. The "problem" I'm solving isn't something thats well known, so there are few if any algorithms on the web, but I'm looking for any good advice on how to proceed and what to look for.
I am exploring the differences in effectiveness between evolutionary algorithms and more straightforward approaches for a particular group of related problems. I have written three evolutionary algorithms for the problem already and now I have written an brute force technique that I am trying to make as fast as possible.
Edit: To specify a bit more. I am using C# and my algorithms all revolve around calculating and solving constraint type problems for expressions (using expression trees). By expressions I mean things like x^2 + 4 or anything else like that which would be parsed into an expression tree. My algorithms all create and manipulate these trees to try to find better approximations. But I wanted to put the question out there in a general way in case it would help anyone else.
I am trying to find out if it is possible to write a useful evolutionary algorithm for finding expressions that are a good approximation for various properties. Both because I want to know what a good approximation would be and to see how the evolutionary stuff compares to traditional methods.
It's pretty much the same process as any other optimization: profile, experiment, benchmark, repeat.
First you have to figure out what sections of your code are taking up the time. Then try different methods to speed them up (trying methods based on merit would be a better idea than trying things at random). Benchmark to find out if you actually did speed them up. If you did, replace the old method with the new one. Profile again.
I would recommend against a brute force approach if it's at all possible to do it some other way. But, here are some guidelines that should help you speed your code up either way.
There are many, many different optimizations you could apply to your code, but before you do anything, you should profile to figure out where the bottleneck is. Here are some profilers that should give you a good idea about where the hot spots are in your code:
GProf
PerfMon2
OProfile
HPCToolkit
These all use sampling to get their data, so the overhead of running them with your code should be minimal. Only GProf requires that you recompile your code. Also, the last three let you do both time and hardware performance counter profiles, so once you do a time (or CPU cycle) profile, you can zoom in on the hotter regions and find out why they might be running slow (cache misses, FP instruction counts, etc.).
Beyond that, it's a matter of thinking about how best to restructure your code, and this depends on what the problem is. It may be that you've just got a loop that the compiler doesn't optimize well, and you can inline or move things in/out of the loop to help the compiler out. Or, if you're running as fast as you can with basic arithmetic ops, you may want to try to exploit vector instructions (SSE, etc.) If your code is parallel, you might have load balance problems, and you may need to restructure your code so that data is better distributed across cores.
These are just a few examples. Performance optimization is complex, and it might not help you nearly enough if you're doing a brute force approach to begin with.
For more information on ways people have optimized things, there were some pretty good examples in the recent Why do you program in assembly? question.
If your optimization problem is (quasi-)convex or can be transformed into such a form, there are far more efficient algorithms than evolutionary search.
If you have large matrices, pay attention to your linear algebra routines. The right algorithm can make shave an order of magnitude off the computation time, especially if your matrices are sparse.
Think about how data is loaded into memory. Even when you think you're spending most of your time on pure arithmetic, you're actually spending a lot of time moving things between levels of cache etc. Do as much as you can with the data while it's in the fastest memory.
Try to avoid unnecessary memory allocation and de-allocation. Here's where it can make sense to back away from a purely OO approach.
This is more of a tip to find holes in the algorithm itself...
To realize maximum performance, simplify everything inside the most inner loop at the expense of everything else.
One example of keeping things simple is the classic bouncing ball animation. You can implement gravity by looking up the definition in your physics book and plugging in the numbers, or you can do it like this and save precious clock cycles:
initialize:
float y = 0; // y coordinate
float yi = 0; // incremental variable
loop:
y += yi;
yi += 0.001;
if (y > 10)
yi = -yi;
But now let's say you're having to do this with nested loops in an N-body simulation where every particle is attracted to every other particle. This can be an enormously processor intensive task when you're dealing with thousands of particles.
You should of course take the same approach as to simplify everything inside the most inner loop. But more than that, at the very simplest level you should also use data types wisely. For example, math operations are faster when working with integers than floating point variables. Also, addition is faster than multiplication, and multiplication is faster than division.
So with all of that in mind, you should be able to simplify the most inner loop using primarily addition and multiplication of integers. And then any scaling down you might need to do can be done afterwards. To take the y and yi example, if yi is an integer that you modify inside the inner loop then you could scale it down after the loop like this:
y += yi * 0.01;
These are very basic low-level performance tips, but they're all things I try to keep in mind whenever I'm working with processor intensive algorithms. Of course, if you then take these ideas and apply them to parallel processing on a GPU then you can take your algorithm to a whole new level. =)
Well how you do this depends the most on which language
you are using. Still, the key in any language
in the profiler. Profile your code. See which
functions/operations are taking the most time and then determine
if you can make these costly operations more efficient.
Standard bottlenecks in numerical algorithms are memory
usage (do you access matrices in the order which the elements
are stored in memory); communication overhead, etc. They
can be little different than other non-numerical programs.
Moreover, many other factors such as preconditioning, etc.
can lead to drastically difference performance behavior
of the SAME algorithm on the same problem. Make sure
you determine optimal parameters for your implementations.
As for comparing different algorithms, I recommend
reading the paper
"Benchmarking optimization software with performance profiles,"
Jorge Moré and Elizabeth D. Dolan, Mathematical Programming 91 (2002), 201-213.
It provides a nice, uniform way to compare different algorithms being
applied to the same problem set. It really should be better known
outside of the optimization community (in my not so humble opinion
at least).
Good luck!