I have a very long running linear program implemented in Matlab that, for small versions of the problem, is solved by linprog. Going to “full size” on my LP consistently exceeds my maximum allowed job runtime on the system I’m running on. I’ve had no luck finding a way to checkpoint linprog. Is there a way?
Yes and no. Simplex-based LP solvers have a fantastic restart facility based on an LP basis. Many LP solvers can read and write MPS basis files. I don't think Matlab's LP solver can do that. I know of users who never run models from scratch but only from an advanced basis (to be clear: not with Matlab).
Linprog comes with three different algorithms (one simplex, two interior-point). As the model is large, you may try the interior-point solver instead of the default simplex solver. Interior-point solvers are often faster than simplex solvers on large problems.
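As a minimal sketch of switching the algorithm (the variables f, A, b, Aeq, beq, lb and ub are placeholders for your actual problem data):

% sketch: select linprog's interior-point algorithm instead of the default
options = optimoptions('linprog', 'Algorithm', 'interior-point', 'Display', 'iter');
[x, fval, exitflag, output] = linprog(f, A, b, Aeq, beq, lb, ub, options);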
Another thing to look at is whether you run out of physical RAM. If a solver starts to use virtual memory, things may become extremely slow.
The problem is 1,484,161 variables with 3,851,432 constraints, but that gets preprocessed which "removed 469,967 inequalities, 361,192 equalities, 377,192 variables, and 3,973,926 non-zero elements."

This is large but not extremely large. I solve larger models than this on relatively modest hardware. Almost always I solve these in way less than an hour.
Related
I have a large LP with more than 10 million decision variables and nearly the same number of constraints. I use CPLEX to solve the LP but it takes ~20 hours to solve, and that's on the best server of our institution.
Are there ways to significantly speedup the solution time (without adding more servers)?
I've read about Quantum Computing and its application to speeding up optimization problems. Does anybody have experience with that, or more generally with any other ways of reducing the solution time?
If your model is an LP, then I suggest reading CPLEX Performance Tuning for Linear Programs.
You could also try setting data check to 2 to see whether you have some numerical issues in your model.
If there is still no improvement, you could try to improve your model formulation.
NB:
If it's a MIP, then you could read this
Do symbolic math calculations (especially for solving nonlinear polynomial systems) cause a huge performance (calculation-speed) disadvantage compared to numeric calculations? Are there any benchmarks/data about this?
Found a related question: Symbolic computation vs. numerical computation
Another one: Computational Efficiency of Forward Mode Automatic vs Numeric vs Symbolic Differentiation
I am the individual who answered the Scicomp question you reference in your question. I am not personally aware of any empirical benchmarks comparing run-time performance for symbolic versus numerical solutions to systems of polynomial equations.
However, it should be fairly intuitive that symbolic solutions will have a bit more overhead for most aspects of solving the problem, due to things such as manipulating terms in the equation symbolically, searching for ways to simplify/rearrange equations to make them easier to solve, searching through known closed-form solutions, etc. One major issue with symbolic solvers is that you may not have a closed-form solution you can find and use, so you would have to solve the problem numerically anyway.
The only way I can see symbolic solvers outperforming numerical solutions in terms of run-time is if the symbolic solver can quickly recognize your problem as one with a known analytical solution, or if it eventually arrives at the solution while the numerical solver never does (i.e. it diverges).
Given you can find a numerical solver that converges, I think the numerical case will generally be much more efficient since there's just much less overhead to make progress in refining your solution. Since you mention solving systems of polynomial equations, I suspect there are also some tailored algorithms for your type of problem that may be superior to typical nonlinear equation solving schemes.
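For a rough feel of the two routes, here is a small MATLAB sketch on a toy system (it assumes the Symbolic Math Toolbox for solve and the Optimization Toolbox for fsolve; the system itself is only illustrative):

syms x y
eqs = [x^2 + y^2 == 4, x*y == 1];
S = solve(eqs, [x, y]);                          % symbolic: all roots in closed form, more overhead

F = @(v) [v(1)^2 + v(2)^2 - 4; v(1)*v(2) - 1];   % the same system as a numeric residual
vNum = fsolve(F, [1.5; 0.5]);                    % numeric: one root near the starting guess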
This is not a direct answer to the question but a suggested course correction.
While it is possible to evaluate math expressions purely numerically or purely symbolically, it is also possible to use a hybrid approach.
This is known as symbolic-numeric computation
Maple is one software package that has this ability.
Note: I have never used Maple so I can't add more.
Searching for packages
I find I get better results when searching for math packages with this ability if I combine the name of the package with "symbolic-numeric computation", e.g.
wolfram symbolic-numeric computation
A specific example related to neural networks
In the world of neural networks one has to be able to calculate the derivative; however, if a derivative can be simplified before calculating it, then the cost of calculating goes down. Since simplifying the derivative is a one-time action while the calculation occurs thousands to millions of times, the simplification is done symbolically and then the calculation is done numerically. Theano is a software package that does this specifically for use with neural networks.
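The same simplify-once, evaluate-many pattern can be sketched in MATLAB (this assumes the Symbolic Math Toolbox; the expression is just a stand-in):

syms x
f  = x^3 * exp(-x^2);                 % stand-in for the expression you care about
df = simplify(diff(f, x));            % one-time symbolic differentiation and simplification
dfFun = matlabFunction(df);           % convert the result into a plain numeric function handle
vals = dfFun(linspace(-3, 3, 1e6));   % cheap, repeated numeric evaluation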
I am writing a paper to test a new application that will demonstrate the benefits of parallelized computation (compared to the traditional serialized version of this application). I want to use the canonical examples for parallel computation in my paper.
My first example is the parallel computation of pi. I would ideally like an example where each iteration is very time consuming (because of the additional overhead associated with parallelizing); my first thought is a Bayesian simulation with MCMC and Gibbs sampling.
What other problems are typically discussed in this context? What are good examples of large embarrassingly parallel problems?
Just a few more:
Multiplying matrices
Inverting matrices
FFT
String matching
Rendering 3d scenes (via scan line conversion or ray tracing)
One example I've used in the past of an embarrassingly parallel problem is visualizing the Mandelbrot set. Each pixel can be computed independently.
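A rough MATLAB sketch of that (parfor comes from the Parallel Computing Toolbox; without it the loop simply runs serially), with each image row handled independently:

nx = 800; ny = 600; maxIter = 200;
xs = linspace(-2.5, 1, nx); ys = linspace(-1.25, 1.25, ny);
img = zeros(ny, nx);
parfor j = 1:ny                        % rows are independent, so distribute them
    row = zeros(1, nx);
    for i = 1:nx
        c = xs(i) + 1i*ys(j); z = 0; k = 0;
        while abs(z) <= 2 && k < maxIter
            z = z^2 + c; k = k + 1;    % escape-time iteration
        end
        row(i) = k;
    end
    img(j, :) = row;                   % sliced output, safe inside parfor
end
imagesc(img)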
Conway's Life is interesting as well, in that each value of the "next" board can be computed independently, but will depend on the relevant bits of the "current" board being done already.
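For what it's worth, a sketch of one Life step in MATLAB, where every cell of the "next" board is derived from the "current" board in a single data-parallel pass (conv2 counts each cell's eight neighbours):

board = rand(200) > 0.7;                                       % random initial board
kernel = [1 1 1; 1 0 1; 1 1 1];                                % neighbour-counting stencil
neighbours = conv2(double(board), kernel, 'same');
nextBoard = (neighbours == 3) | (board & (neighbours == 2));   % standard B3/S23 rule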
I would suggest that canonical examples of parallel computation and embarrassingly parallel problems are, if not completely, then nearly, disjoint sets. To put it another way, people working in parallel computation aren't terribly excited about embarrassingly parallel problems; we call them that because we'd be embarrassed to be working on them.
I'd be looking, if I were you, at these (a not entirely original list):
linear algebra on large dense matrices, both direct and iterative approaches;
linear algebra on huge sparse matrices
branch and bound approaches to linear programming (and related) problems;
sequence matching for bioinformatics (outside my field, I may have mis-expressed this);
continuous optimisation.
I expect there are many more.
EDIT: You may be interested in this list of problems which have been selected for benchmarking the next generation of European (academic) supercomputers. It will give you some idea of where that niche is heading.
Molecular dynamics simulations allow you to change the size of the problem until your computer resources are exhausted (i.e. 256 particles vs. 256,000,000 particles). It's truly a "canonical" example if you run the MD simulations under NVT conditions ;-)
My favorite example is Monte Carlo simulation.
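A minimal MATLAB sketch of that: estimating pi by throwing random points at the unit quarter-circle, with each batch handled independently (parfor is from the Parallel Computing Toolbox):

nBatches = 8; nPerBatch = 1e6;
hits = zeros(nBatches, 1);
parfor b = 1:nBatches
    pts = rand(nPerBatch, 2);                 % each batch draws its own points
    hits(b) = sum(sum(pts.^2, 2) <= 1);       % count points inside the quarter-circle
end
piEstimate = 4 * sum(hits) / (nBatches * nPerBatch);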
Word counting seems to be the canonical example for MapReduce.
http://en.wikipedia.org/wiki/MapReduce#Example
Finding collisions in hash functions using Paul C. van Oorschot and Michael J. Wiener's method (PDF) comes up often in various cryptographic settings.
I used the Mandelbrot set demo to explain to my mom what parallel programming is about: http://www.ateji.com/px/demo.html
All the examples you mention are mostly heavy data-parallel codes. You'll probably also want to mention task-oriented codes, such as servers responding to many requests in parallel, and data-flow or stream-programming examples (MapReduce is a good representative of this class).
I did a little GP (note: very little) work in college and have been playing around with it recently. My question is in regard to the initial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?
You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.
Some personal experience:
Large population sizes seem to work better when you have a noisy fitness function; I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy ones.

For the number of generations it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out; if it is showing no improvement then you probably have an issue elsewhere.

Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about....). I've also found total node counts of ~1000 to be quite useful hard limits.

Percentages for different mutation/recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements, so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
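To make those rules of thumb concrete, here is one way they could look as a settings block in MATLAB (the field names are made up for illustration, not taken from any particular GP library):

gp = struct();
gp.populationSize  = 1000;   % ~100 for clean/deterministic fitness functions, 1000+ for noisy ones
gp.maxGenerations  = 300;    % upper bound; stop earlier once the target fitness is reached
gp.targetFitness   = 0.01;   % stopping criterion, entirely problem dependent
gp.stagnationLimit = 50;     % give up if there is no improvement for this many generations
gp.maxNodesPerTree = 1000;   % hard cap on program size instead of a strict depth limit
gp.opRates = struct('crossover', 0.7, 'mutation', 0.2, 'reproduction', 0.1);  % exact balance matters little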
Why don't you try using a genetic algorithm to optimise these parameters for you? :)
Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection).
- David J. Wheeler
When I started looking into Genetic Algorithms I had the same question.
I wanted to collect data by varying parameters on a very simple problem and link given operators and parameter values (such as mutation rates) to given results as a function of population size, etc.
Once I started getting into GA a bit more I then realized that given the enormous number of variables this is a huge task, and generalization is extremely difficult.
Speaking from my (limited) experience: if you decide to simplify the problem, fix the way you implement crossover and selection, and just play with population size and mutation rate (implemented in a given way) in order to come up with general results, you'll soon realize that too many variables are still in play. At the end of the day, the number of generations after which you will statistically get a decent result (however you want to define decent) still depends primarily on the problem you're solving, and consequently on the genome size (representing the same problem in different ways will obviously lead to different results in terms of the effect of given GA parameters!).
It is certainly possible to draft a set of guidelines - as the (rare but good) literature proves - but you will be able to generalize the results effectively in statistical terms only when the problem at hand can be encoded in exactly the same way and the fitness is evaluated in a somewhat equivalent way (which more often than not means you're dealing with a very similar problem).
Take a look at Koza's voluminous tomes on these matters.
There are very different schools of thought even within the GP community -
Some regard populations in the (low) thousands as sufficient, whereas Koza and others often don't deem it worthwhile to start a GP run with fewer than a million individuals in the GP population ;-)
As mentioned before, it depends on your personal taste and experience, resources, and probably the GP system used!
Cheers,
Jan
What are some good tips and/or techniques for optimizing and improving the performance of calculation-heavy programs? I'm talking about things like complicated graphics calculations or mathematical and simulation types of programming, where every second saved is useful, as opposed to IO-heavy programs where only a certain amount of speedup is helpful.
While changing the algorithm is frequently mentioned as the most effective method here, I'm trying to find out how effective different algorithms are in the first place, so I want to squeeze as much efficiency out of each algorithm as possible. The "problem" I'm solving isn't something that's well known, so there are few if any algorithms on the web, but I'm looking for any good advice on how to proceed and what to look for.
I am exploring the differences in effectiveness between evolutionary algorithms and more straightforward approaches for a particular group of related problems. I have written three evolutionary algorithms for the problem already and now I have written a brute-force technique that I am trying to make as fast as possible.
Edit: To specify a bit more. I am using C# and my algorithms all revolve around calculating and solving constraint type problems for expressions (using expression trees). By expressions I mean things like x^2 + 4 or anything else like that which would be parsed into an expression tree. My algorithms all create and manipulate these trees to try to find better approximations. But I wanted to put the question out there in a general way in case it would help anyone else.
I am trying to find out if it is possible to write a useful evolutionary algorithm for finding expressions that are a good approximation for various properties. Both because I want to know what a good approximation would be and to see how the evolutionary stuff compares to traditional methods.
It's pretty much the same process as any other optimization: profile, experiment, benchmark, repeat.
First you have to figure out what sections of your code are taking up the time. Then try different methods to speed them up (trying methods based on merit would be a better idea than trying things at random). Benchmark to find out if you actually did speed them up. If you did, replace the old method with the new one. Profile again.
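Illustrated with MATLAB's built-in tools (a sketch only; the same loop applies in any language, and myExpensiveComputation, oldMethod and newMethod are placeholder names):

profile on
result = myExpensiveComputation(inputData);   % run the real workload once under the profiler
profile viewer                                % inspect where the time actually goes

tOld = timeit(@() oldMethod(inputData));      % benchmark the current hot spot...
tNew = timeit(@() newMethod(inputData));      % ...against the candidate replacement
fprintf('old: %.3f s   new: %.3f s\n', tOld, tNew);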
I would recommend against a brute force approach if it's at all possible to do it some other way. But, here are some guidelines that should help you speed your code up either way.
There are many, many different optimizations you could apply to your code, but before you do anything, you should profile to figure out where the bottleneck is. Here are some profilers that should give you a good idea about where the hot spots are in your code:
GProf
PerfMon2
OProfile
HPCToolkit
These all use sampling to get their data, so the overhead of running them with your code should be minimal. Only GProf requires that you recompile your code. Also, the last three let you do both time and hardware performance counter profiles, so once you do a time (or CPU cycle) profile, you can zoom in on the hotter regions and find out why they might be running slow (cache misses, FP instruction counts, etc.).
Beyond that, it's a matter of thinking about how best to restructure your code, and this depends on what the problem is. It may be that you've just got a loop that the compiler doesn't optimize well, and you can inline or move things in/out of the loop to help the compiler out. Or, if you're running as fast as you can with basic arithmetic ops, you may want to try to exploit vector instructions (SSE, etc.). If your code is parallel, you might have load balance problems, and you may need to restructure your code so that data is better distributed across cores.
These are just a few examples. Performance optimization is complex, and it might not help you nearly enough if you're doing a brute force approach to begin with.
For more information on ways people have optimized things, there were some pretty good examples in the recent Why do you program in assembly? question.
If your optimization problem is (quasi-)convex or can be transformed into such a form, there are far more efficient algorithms than evolutionary search.
If you have large matrices, pay attention to your linear algebra routines. The right algorithm can shave an order of magnitude off the computation time, especially if your matrices are sparse (see the sketch after these tips).
Think about how data is loaded into memory. Even when you think you're spending most of your time on pure arithmetic, you're actually spending a lot of time moving things between levels of cache etc. Do as much as you can with the data while it's in the fastest memory.
Try to avoid unnecessary memory allocation and de-allocation. Here's where it can make sense to back away from a purely OO approach.
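The linear-algebra and allocation tips above, sketched in MATLAB (sizes and densities are arbitrary):

n = 5000;
A = sprandn(n, n, 1e-3) + 10*speye(n);   % sparse, with a boosted diagonal so the solve is well behaved
b = randn(n, 1);
x = A \ b;                               % backslash dispatches to a sparse-aware factorisation

m = 1e6;
v = zeros(m, 1);                         % preallocate once instead of growing inside the loop
for k = 1:m
    v(k) = k^2;                          % no reallocation per iteration
end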
This is more of a tip to find holes in the algorithm itself...
To realize maximum performance, simplify everything inside the innermost loop at the expense of everything else.
One example of keeping things simple is the classic bouncing ball animation. You can implement gravity by looking up the definition in your physics book and plugging in the numbers, or you can do it like this and save precious clock cycles:
initialize:
    float y  = 0;      // vertical position
    float yi = 0;      // per-step increment (velocity)
loop:
    y  += yi;          // integrate position
    yi += 0.001;       // constant acceleration standing in for gravity
    if (y > 10)
        yi = -yi;      // bounce: reverse direction at the boundary
But now let's say you're having to do this with nested loops in an N-body simulation where every particle is attracted to every other particle. This can be an enormously processor intensive task when you're dealing with thousands of particles.
You should of course take the same approach to simplify everything inside the innermost loop. But more than that, at the very simplest level you should also use data types wisely. For example, math operations are faster when working with integers than with floating-point variables. Also, addition is faster than multiplication, and multiplication is faster than division.
So with all of that in mind, you should be able to simplify the innermost loop using primarily addition and multiplication of integers. And then any scaling down you might need to do can be done afterwards. To take the y and yi example, if yi is an integer that you modify inside the inner loop, then you could scale it down after the loop like this:
y += yi * 0.01;
These are very basic low-level performance tips, but they're all things I try to keep in mind whenever I'm working with processor intensive algorithms. Of course, if you then take these ideas and apply them to parallel processing on a GPU then you can take your algorithm to a whole new level. =)
Well, how you do this depends most on which language you are using. Still, the key in any language is the profiler. Profile your code. See which functions/operations are taking the most time and then determine whether you can make these costly operations more efficient.

Standard bottlenecks in numerical algorithms are memory usage (do you access matrices in the order in which the elements are stored in memory?), communication overhead, etc. They can be a little different than in other, non-numerical programs.

Moreover, many other factors, such as preconditioning, can lead to drastically different performance behavior of the SAME algorithm on the same problem. Make sure you determine optimal parameters for your implementations.
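As a concrete MATLAB example of the storage-order point (MATLAB stores matrices column by column, so the inner loop should run down a column; timings are indicative only):

A = randn(3000); s = 0;
tic
for j = 1:size(A, 2)          % outer loop over columns: contiguous memory access
    for i = 1:size(A, 1)
        s = s + A(i, j);
    end
end
tColumnOrder = toc;

s = 0;
tic
for i = 1:size(A, 1)          % outer loop over rows: strided memory access
    for j = 1:size(A, 2)
        s = s + A(i, j);
    end
end
tRowOrder = toc;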
As for comparing different algorithms, I recommend reading the paper "Benchmarking optimization software with performance profiles" by Elizabeth D. Dolan and Jorge J. Moré, Mathematical Programming 91 (2002), 201-213. It provides a nice, uniform way to compare different algorithms applied to the same problem set. It really should be better known outside of the optimization community (in my not so humble opinion, at least).
Good luck!