Scaling constraints and variables when solving NLP problems with the CONOPT4 solver

I am currently using the CONOPT4 solver to solve a nonlinear programming problem. The nonlinearity is of the form z = x*y and z = x/y, and all variables are continuous. When I specify some scaling factors, solving performance improves a lot. However, when I further refine the scaling factors to project the values into the range 0.01 to 100, the solving time becomes longer, which is really strange. I cannot provide my code here, and I know it's impossible to give a specific reason without the code. Could you share your experience with generally tuning scaling factors when using the CONOPT solver? Thanks a lot.
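For context, the kind of scaling I mean looks roughly like this (a minimal sketch in Pyomo with toy magnitudes, since I can't share my actual model; the suffix mechanism shown is Pyomo's convention, not CONOPT's):

from pyomo.environ import ConcreteModel, Var, Constraint, Objective, Suffix, TransformationFactory

m = ConcreteModel()
m.x = Var(initialize=2.0e4)    # typical magnitude ~1e4
m.y = Var(initialize=5.0e-3)   # typical magnitude ~1e-3
m.z = Var(initialize=1.0e2)
m.bilinear = Constraint(expr=m.z == m.x * m.y)
m.obj = Objective(expr=(m.z - 100.0)**2)

# choose factors so scaled values land near 1 (scaled = factor * original)
m.scaling_factor = Suffix(direction=Suffix.EXPORT)
m.scaling_factor[m.x] = 1e-4
m.scaling_factor[m.y] = 1e3
m.scaling_factor[m.z] = 1e-2

scaled_model = TransformationFactory('core.scale_model').create_using(m)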

Related

Python: How to solve DAE with Jacobian efficiently?

I am trying to use the Assimulo package to solve a set of differential algebraic equations (DAEs). I am trying to use an algorithm similar to that shown here. However, there does not seem to be an option to pass in a sparse matrix. My Jacobian matrix is very large, approximately 3000 x 3000. Do you know if there is a way to solve my DAEs more computationally efficiently?
In my experience with sparse ODE systems (more precisely, with systems of semi-discretized PDEs), using an iterative linear solver greatly improves numerical efficiency. As far as I know, Assimulo doesn't let you provide a Jacobian sparsity pattern, but changing the linear solver is another way to tackle this.
You would do something like:
from assimulo.problem import Explicit_Problem
from assimulo.solvers import CVode

model = Explicit_Problem(ode_function, y0=y_init, t0=t_init)
sim = CVode(model)
sim.linear_solver = 'SPGMR'  # iterative Krylov solver instead of the default dense one
I'm not sure whether this also applies to DAE systems, but I think it's worth a try; a sketch with the DAE solver IDA follows.
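For the DAE case, I would expect Assimulo's implicit solver IDA to expose the same option. A minimal sketch with a toy DAE (untested against your problem, so treat it as a starting point):

import numpy as np
from assimulo.problem import Implicit_Problem
from assimulo.solvers import IDA

def residual(t, y, yd):
    # toy DAE: y0' = -y0 (differential), 0 = y0 + y1 (algebraic)
    return np.array([yd[0] + y[0],
                     y[0] + y[1]])

y0  = np.array([1.0, -1.0])   # consistent initial conditions
yd0 = np.array([-1.0, 1.0])
model = Implicit_Problem(residual, y0, yd0, t0=0.0)
sim = IDA(model)
sim.algvar = [1, 0]           # mark differential (1) vs algebraic (0) components
sim.linear_solver = 'SPGMR'   # Krylov solver; avoids factorizing the full 3000 x 3000 Jacobian
t, y, yd = sim.simulate(5.0)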

Why should we compute the image mean when we train CNNs?

When I use caffe for image classification, it often computes the image mean. Why is that the case?
Someone said that it can improve the accuracy, but I don't understand why this should be the case.
Look into the image whitening technique used in deep learning. It has been shown to improve accuracy, although it is not widely used.
To understand why it helps, refer to the idea of normalizing data before applying a machine learning method, which keeps the features in the same range. A related method now commonly used in CNNs is batch normalization.
Neural networks (including CNNs) are models with thousands of parameters which we try to optimize with gradient descent. Those models are able to fit a lot of different functions by having a non-linearity φ at their nodes. Without a non-linear activation function, the network collapses to a linear function in total. This means we need the non-linearity for most interesting problems.
Common choices for φ are the logistic function, tanh or ReLU. All of them have their most interesting region around 0. This is where the gradient is either big enough to learn quickly or, in the case of ReLU, where the non-linearity sits at all. Weight initialization schemes like Glorot initialization try to make the network start at a good point for the optimization. Other techniques like Batch Normalization also keep the mean of the nodes' input around 0.
So you compute (and subtract) the mean of the image so that the first computing nodes get data which "behaves well". It has a mean of 0 and thus the intuition is that this helps the optimization process.
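A minimal NumPy sketch of this preprocessing step (the array shapes and names are illustrative, not Caffe's API):

import numpy as np

# X_train, X_test: image batches as float arrays of shape (N, H, W, C)
X_train = np.random.rand(100, 32, 32, 3)
X_test = np.random.rand(20, 32, 32, 3)

mean_image = X_train.mean(axis=0)          # per-pixel mean over the training set
X_train_centered = X_train - mean_image    # inputs to the first layer are now ~zero-mean
X_test_centered = X_test - mean_image      # always reuse the *training* mean at test time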
In theory, a network should be able to "subtract" the mean by itself, so if you train long enough this should not matter too much. However, depending on the activation function, "long enough" can be quite long.

How do I implement the Gaussian mutation method in my genetic algorithm?

I am doing a research paper on a specific genetic algorithm and wanted to analyse the influence of using the Gaussian mutation method. However, the only thing I understand is that I have to sample a random Gaussian value and add it to the gene. I have read somewhere on the internet that the mean should be 0, which I understand; this gives us negative as well as positive values. However, I have not found a single source that gives an example of what the standard deviation should be or how it should be calculated.
Does anyone know how the standard deviation is determined when using the Gaussian mutation method, so I can get a value from it?
I have read the answers to this question here on StackOverflow, but they do not provide any details regarding my problem.
What a reasonable (or even optimal) mutation strength is depends on the problem to be solved.
Usually you apply genetic algorithms to very hard optimization problems for which the usual optimization algorithms fail. You can imagine the possible "solutions" to such an optimization problem as a fitness landscape, with high peaks for good solutions and valleys for bad ones.
So if your problem corresponds to a landscape with many peaks of similar height spread out widely (how would you know?), you should use a broad Gaussian distribution so that your chance of finding the highest peak is greater. If, however, you believe you already have a pretty good solution (whatever that is), you can use a narrower distribution to find the maximum faster.
So a reasonable approach is to start with a broad distribution and let the population develop towards a (local) maximum by reducing the distribution width.
Again, the concrete numerical values must be derived from the problem.
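As a starting point, here is a minimal Python sketch of that scheme; the 10% heuristic and the decay rate are illustrative choices, not universal rules:

import random

def gaussian_mutation(genome, sigma, lower, upper):
    # add N(0, sigma) noise to every gene, clamped back into the feasible range
    return [min(upper, max(lower, g + random.gauss(0.0, sigma))) for g in genome]

lower, upper = -5.0, 5.0
genome = [random.uniform(lower, upper) for _ in range(10)]

sigma = 0.1 * (upper - lower)   # start broad: ~10% of the search range
for generation in range(100):
    genome = gaussian_mutation(genome, sigma, lower, upper)
    sigma *= 0.97               # shrink the distribution as the search homes in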
EDIT:
If you want to play a little with the effects, you could download my free iPhone/iPad app "Steinertree" that shows the effects of varying mutation strength and population size.

A good parameter optimization algorithm for a limited number of points with variance

I'm trying to meta-optimize an algorithm which has almost a dozen constants. I guess some form of genetic algorithm should be used. However, the algorithm itself is quite heavy and probabilistic by nature (a version of ant colony optimization). Thus calculating the fitness for a given set of parameters is quite slow, and the results include a lot of variance. Even the order of magnitude of some of the parameters is not exactly clear, so the distribution over some components will likely need to be logarithmic.
Would someone have ideas about suitable algorithms for this problem? I.e. it would need to converge with a limited number of measurement points and also be able to handle randomness in the measured fitness. Also, the easier it is to implement with Java the better of course. :)
If you can express your model algebraically (or as differential equations), consider trying a derivative-based optimization method. These have the theoretical properties you desire and are much more computationally efficient than black-box/derivative-free optimization methods. If you have a MATLAB license, give fmincon a try. Note: fmincon will work much better if you supply derivative information. Other modeling environments include Pyomo, CasADi and Julia/JuMP, which will automatically calculate derivatives and interface with powerful optimization solvers.
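For example, a minimal Pyomo sketch (the Rosenbrock function stands in for your objective; assumes Ipopt is installed and on the PATH):

from pyomo.environ import ConcreteModel, Var, Objective, SolverFactory, minimize

m = ConcreteModel()
m.x = Var(initialize=0.0)
m.y = Var(initialize=0.0)
# an algebraic objective: Pyomo hands exact derivatives to the solver automatically
m.obj = Objective(expr=(1 - m.x)**2 + 100 * (m.y - m.x**2)**2, sense=minimize)

SolverFactory('ipopt').solve(m)
print(m.x(), m.y())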

Performance Testing for Calculation-Heavy Programs

What are some good tips and/or techniques for optimizing and improving the performance of calculation-heavy programs? I'm talking about things like complicated graphics calculations or mathematical and simulation types of programming, where every second saved is useful, as opposed to IO-heavy programs where only a certain amount of speedup is helpful.
While changing the algorithm is frequently mentioned as the most effective method here, I'm trying to find out how effective different algorithms are in the first place, so I want to create as much efficiency with each algorithm as possible. The "problem" I'm solving isn't something that's well known, so there are few if any algorithms on the web, but I'm looking for any good advice on how to proceed and what to look for.
I am exploring the differences in effectiveness between evolutionary algorithms and more straightforward approaches for a particular group of related problems. I have already written three evolutionary algorithms for the problem, and now I have written a brute-force technique that I am trying to make as fast as possible.
Edit: To specify a bit more. I am using C# and my algorithms all revolve around calculating and solving constraint type problems for expressions (using expression trees). By expressions I mean things like x^2 + 4 or anything else like that which would be parsed into an expression tree. My algorithms all create and manipulate these trees to try to find better approximations. But I wanted to put the question out there in a general way in case it would help anyone else.
I am trying to find out if it is possible to write a useful evolutionary algorithm for finding expressions that are a good approximation for various properties. Both because I want to know what a good approximation would be and to see how the evolutionary stuff compares to traditional methods.
It's pretty much the same process as any other optimization: profile, experiment, benchmark, repeat.
First you have to figure out what sections of your code are taking up the time. Then try different methods to speed them up (trying methods based on merit would be a better idea than trying things at random). Benchmark to find out if you actually did speed them up. If you did, replace the old method with the new one. Profile again.
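If you were doing this in Python, for instance, the standard library already covers the first step of that loop (run_simulation is a placeholder for your own entry point):

import cProfile
import pstats

def run_simulation():
    # stand-in for your real calculation-heavy entry point
    return sum(i * i for i in range(10**6))

cProfile.run('run_simulation()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)  # ten costliest call paths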
I would recommend against a brute force approach if it's at all possible to do it some other way. But, here are some guidelines that should help you speed your code up either way.
There are many, many different optimizations you could apply to your code, but before you do anything, you should profile to figure out where the bottleneck is. Here are some profilers that should give you a good idea about where the hot spots are in your code:
GProf
PerfMon2
OProfile
HPCToolkit
These all use sampling to get their data, so the overhead of running them with your code should be minimal. Only GProf requires that you recompile your code. Also, the last three let you do both time and hardware performance counter profiles, so once you do a time (or CPU cycle) profile, you can zoom in on the hotter regions and find out why they might be running slow (cache misses, FP instruction counts, etc.).
Beyond that, it's a matter of thinking about how best to restructure your code, and this depends on what the problem is. It may be that you've just got a loop that the compiler doesn't optimize well, and you can inline or move things in/out of the loop to help the compiler out. Or, if you're running as fast as you can with basic arithmetic ops, you may want to try to exploit vector instructions (SSE, etc.) If your code is parallel, you might have load balance problems, and you may need to restructure your code so that data is better distributed across cores.
These are just a few examples. Performance optimization is complex, and it might not help you nearly enough if you're doing a brute force approach to begin with.
For more information on ways people have optimized things, there were some pretty good examples in the recent Why do you program in assembly? question.
If your optimization problem is (quasi-)convex or can be transformed into such a form, there are far more efficient algorithms than evolutionary search.
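For instance, a gradient-based method solves a small convex quadratic in a handful of iterations (a SciPy sketch with made-up data):

import numpy as np
from scipy.optimize import minimize

# a strictly convex quadratic: f(x) = 0.5 x'Qx - b'x
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x

res = minimize(f, x0=np.zeros(2), method='BFGS')
print(res.x, res.nit)   # optimum and iteration count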
If you have large matrices, pay attention to your linear algebra routines. The right algorithm can shave an order of magnitude off the computation time, especially if your matrices are sparse.
Think about how data is loaded into memory. Even when you think you're spending most of your time on pure arithmetic, you're often spending a lot of time moving things between levels of cache. Do as much as you can with the data while it's in the fastest memory; the sketch after these tips illustrates the effect of access order.
Try to avoid unnecessary memory allocation and de-allocation. Here's where it can make sense to back away from a purely OO approach.
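To see the memory-order effect concretely, here is a small NumPy sketch; exact timings vary by machine, but column traversal of a row-major array is typically several times slower than row traversal:

import time
import numpy as np

a = np.random.rand(4000, 4000)   # NumPy stores this row-major (C order)

start = time.perf_counter()
_ = [a[i, :].sum() for i in range(a.shape[0])]   # contiguous row reads
print('row-wise:', time.perf_counter() - start)

start = time.perf_counter()
_ = [a[:, j].sum() for j in range(a.shape[1])]   # strided column reads: more cache misses
print('col-wise:', time.perf_counter() - start)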
This is more of a tip to find holes in the algorithm itself...
To achieve maximum performance, simplify everything inside the innermost loop at the expense of everything else.
One example of keeping things simple is the classic bouncing ball animation. You can implement gravity by looking up the definition in your physics book and plugging in the numbers, or you can do it like this and save precious clock cycles:
// initialization
float y  = 0.0f;   // y coordinate (position)
float yi = 0.0f;   // per-frame increment (velocity)

// per-frame update loop
while (true) {
    y  += yi;       // integrate velocity into position
    yi += 0.001f;   // constant "gravity" added every frame
    if (y > 10.0f)
        yi = -yi;   // bounce: reverse direction at the boundary
}
But now let's say you're having to do this with nested loops in an N-body simulation where every particle is attracted to every other particle. This can be an enormously processor intensive task when you're dealing with thousands of particles.
You should of course take the same approach and simplify everything inside the innermost loop. But beyond that, at the very simplest level you should also use data types wisely. For example, on many processors integer math operations are faster than floating-point ones. Also, addition is faster than multiplication, and multiplication is faster than division.
With all of that in mind, you should be able to simplify the innermost loop using primarily integer addition and multiplication. Any scaling down you might need can then be done afterwards. To take the y and yi example: if yi is an integer that you modify inside the inner loop, you can scale it down after the loop like this:
y += yi * 0.01;
These are very basic low-level performance tips, but they're all things I try to keep in mind whenever I'm working with processor intensive algorithms. Of course, if you then take these ideas and apply them to parallel processing on a GPU then you can take your algorithm to a whole new level. =)
How you do this depends mostly on which language you are using. Still, the key in any language is the profiler. Profile your code. See which functions/operations are taking the most time, and then determine whether you can make these costly operations more efficient.
Standard bottlenecks in numerical algorithms are memory usage (do you access matrices in the order in which the elements are stored in memory?), communication overhead, etc. These can be a little different from the bottlenecks of non-numerical programs.
Moreover, many other factors, such as preconditioning, can lead to drastically different performance behavior of the SAME algorithm on the same problem. Make sure you determine the optimal parameters for your implementations.
As for comparing different algorithms, I recommend reading the paper "Benchmarking optimization software with performance profiles," Elizabeth D. Dolan and Jorge J. Moré, Mathematical Programming 91 (2002), 201-213. It provides a nice, uniform way to compare different algorithms applied to the same problem set. It really should be better known outside the optimization community (in my not-so-humble opinion, at least).
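If you want to try it, here is a rough NumPy/matplotlib sketch of such a performance profile (the timing numbers are made up for illustration): for each solver, it plots the fraction of problems solved within a factor tau of the best solver's time on that problem.

import numpy as np
import matplotlib.pyplot as plt

# times[s, p]: runtime of solver s on problem p (np.inf marks a failure)
times = np.array([[1.0, 4.0, 2.0, np.inf],
                  [2.0, 2.0, 8.0,  5.0]])

ratios = times / times.min(axis=0)   # performance ratio of each solver on each problem
taus = np.linspace(1.0, 10.0, 200)
for s, name in enumerate(['solver A', 'solver B']):
    rho = [(ratios[s] <= tau).mean() for tau in taus]   # fraction solved within factor tau
    plt.step(taus, rho, label=name)

plt.xlabel('performance ratio tau')
plt.ylabel('fraction of problems solved')
plt.legend()
plt.show()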
Good luck!
