Symbolic vs Numeric Math - Performance - algorithm

Do symbolic math calculations (especially for solving nonlinear polynomial systems) carry a huge performance (calculation speed) disadvantage compared to numeric calculations? Are there any benchmarks or data on this?
Found a related question: Symbolic computation vs. numerical computation
Another one: Computational Efficiency of Forward Mode Automatic vs Numeric vs Symbolic Differentiation

I am the individual who answered the Scicomp question you reference in your question. I personally am not aware of any empirical metrics performed to compare run-time performance for symbolic versus numerical solutions to systems of polynomial equations.
However, it should be fairly intuitive that symbolic solutions will have a bit more overhead for most aspects of solving the problem due to things such as manipulation of terms in the equation symbolically, searching how to simplify/rearrange equations to make them easier to solve, searching through known closed form solutions, etc. One major issue with symbolic solvers is that you may not have a closed form solution you can find and use, so solving it numerically would have to happen either way.
The only way I can see symbolic solvers outperforming numerical solutions in terms of run-time is if the symbolic solver can quickly recognize your problem as one with a known analytical solution, or if it eventually arrives at the solution while the numerical solver never does (i.e., it diverges).
Given you can find a numerical solver that converges, I think the numerical case will generally be much more efficient since there's just much less overhead to make progress in refining your solution. Since you mention solving systems of polynomial equations, I suspect there are also some tailored algorithms for your type of problem that may be superior to typical nonlinear equation solving schemes.
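To make the "much less overhead" point concrete, here is a minimal sketch of a numerical solve for a small polynomial system using Newton's method. The system (x^2 + y^2 = 4, x*y = 1), the starting point, and the function names are my own illustrative assumptions, not something from the question. Each iteration is just a handful of floating-point operations, with no term manipulation at all.

```python
def newton_2x2(residual, jacobian, x0, y0, tol=1e-12, max_iter=50):
    """Newton's method for two equations in two unknowns."""
    x, y = x0, y0
    for _ in range(max_iter):
        f1, f2 = residual(x, y)
        if abs(f1) + abs(f2) < tol:
            return x, y
        a, b, c, d = jacobian(x, y)  # Jacobian [[a, b], [c, d]]
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian; try another start")
        # Solve J * (dx, dy) = -(f1, f2) with Cramer's rule.
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
    raise RuntimeError("Newton iteration did not converge")


# Illustrative system (an assumption): x^2 + y^2 = 4 and x*y = 1.
def residual(x, y):
    return x * x + y * y - 4.0, x * y - 1.0


def jacobian(x, y):
    return 2.0 * x, 2.0 * y, y, x


root = newton_2x2(residual, jacobian, 2.0, 0.5)
```

Note that this only finds the one root near the starting guess, and a poor guess can diverge, which is exactly the caveat mentioned above.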

This is not a direct answer to the question but a suggested course correction.
While it is possible to evaluate math expressions by purely numeric means or by purely symbolic means, it is also possible to use a hybrid approach.
This is known as symbolic-numeric computation.
Maple is one software package that has this ability.
Note: I have never used Maple so I can't add more.
Searching for packages
I find I get better results when searching for math packages that use symbolic-numeric computation by searching for the name of the package combined with Symbolic-numeric computation, e.g.
wolfram symbolic-numeric computation
A specific example related to neural networks
In the world of neural networks one has to be able to calculate the derivative; however, if a derivative can be simplified before calculating, then the cost of calculating goes down. Since simplifying the derivative is a one-time action while the calculation occurs thousands to millions of times, the simplification is done symbolically and then the calculation is done numerically. Theano is a software package that does this specifically for use with neural networks.
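As a small hand-worked illustration of "simplify once, evaluate many times" (this is not Theano code, just a sketch in plain Python): the derivative of the logistic sigmoid s(x) = 1 / (1 + exp(-x)) is exp(-x) / (1 + exp(-x))^2, which simplifies symbolically to s(x) * (1 - s(x)). The simplified form can reuse the forward value that was already computed and needs no extra call to exp. The function names below are my own.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dsigmoid_unsimplified(x):
    # Direct quotient-rule form: exp(-x) / (1 + exp(-x))^2
    e = math.exp(-x)
    return e / (1.0 + e) ** 2


def dsigmoid_from_output(s):
    # Simplified form: reuses the forward value s = sigmoid(x)
    return s * (1.0 - s)


# Both forms agree numerically; the second avoids recomputing exp.
points = [-3.0, -0.5, 0.0, 1.2, 4.0]
max_gap = max(
    abs(dsigmoid_unsimplified(x) - dsigmoid_from_output(sigmoid(x)))
    for x in points
)
```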

Related

A good parameter optimization algorithm for a limited number of points with variance

I'm trying to meta-optimize an algorithm that has almost a dozen constants. I guess some form of genetic algorithm should be used. However, the algorithm itself is quite heavy and probabilistic by nature (a version of ant colony optimization). Thus calculating the fitness for some set of parameters is quite slow and the results include a lot of variance. Even the order of magnitude for some of the parameters is not exactly clear, so the distribution on some components will likely need to be logarithmic.
Would someone have ideas about suitable algorithms for this problem? I.e. it would need to converge with a limited number of measurement points and also be able to handle randomness in the measured fitness. Also, the easier it is to implement with Java the better of course. :)
If you can express your model algebraically (or as differential equations), consider trying derivative-based optimization methods. These have the theoretical properties you desire and are much more computationally efficient than black-box/derivative-free optimization methods. If you have a MATLAB license, give fmincon a try. Note: fmincon will work much better if you supply derivative information. Other modeling environments include Pyomo, CasADi and Julia/JuMP, which will automatically calculate derivatives and interface with powerful optimization solvers.
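If the model cannot be expressed algebraically and you stay with a black-box approach, the two concerns raised in the question (unknown orders of magnitude and noisy fitness) can be sketched as follows. This is a plain random search used as a simple baseline, not a genetic algorithm, and it is written in Python rather than Java for brevity. The objective, bounds, and all names are hypothetical placeholders for the real ant colony runs.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def mean_fitness(evaluate, params, repeats, rng):
    """Average several noisy evaluations to reduce variance."""
    return sum(evaluate(params, rng) for _ in range(repeats)) / repeats


def random_search(evaluate, bounds, n_candidates, repeats, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_candidates):
        params = [sample_log_uniform(rng, lo, hi) for lo, hi in bounds]
        score = mean_fitness(evaluate, params, repeats, rng)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def noisy_objective(params, rng):
    # Hypothetical stand-in for one slow, stochastic run (lower is better).
    p0, p1 = params
    clean = (math.log10(p0) + 2.0) ** 2 + (math.log10(p1) - 1.0) ** 2
    return clean + rng.gauss(0.0, 0.1)


bounds = [(1e-4, 1.0), (1e-2, 1e2)]
best_params, best_score = random_search(noisy_objective, bounds, 50, 5, seed=1)
```

The number of repeats per candidate trades measurement noise against the total evaluation budget, so it is worth tuning to how expensive one run is.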

What is the quickest method of matrix multiplication?

I've been working on a rather extensive program as of late, and I'm currently at a point where I have to utilize matrix multiplication. Thing is, for this particular program, speed is crucial. I'm familiar with a number of matrix setups, but I would like to know which method will run the fastest. I've done extensive research, but turned up very few results. Here is a list of the matrix multiplication algorithms I am familiar with:
Iterative algorithm
Divide and Conquer algorithm
Sub Cubic algorithms
Shared Memory Parallelism
If anyone needs clarification on the methods I listed, or on the question in general, feel free to ask.
The Strassen algorithm and the naive (O(n^3)) one are the most used in practice.
More complex algorithms with tighter asymptotic bounds, e.g. the Coppersmith-Winograd algorithm, are not used because of their complexity: their benefits would be apparent only for extremely large matrices.
As others pointed out, you might want to use a library like ATLAS, which will automatically tune the algorithm depending on the characteristics of the platform where you are executing, e.g. L1/L2 cache sizes.
The quickest way might be to use an existing library that's already optimized; you don't have to reinvent the wheel every time.
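For reference, here is a rough pure-Python sketch of the two algorithms named above: the naive O(n^3) loop and Strassen's seven-product recursion. It assumes square matrices stored as lists of rows; the helper names and the cutoff value of 64 are arbitrary assumptions on my part. It is meant to show the structure only, since a tuned library will be far faster than either function.

```python
def matmul_naive(a, b):
    """Schoolbook O(n^3) product of two matrices given as lists of rows."""
    n, inner, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(inner):  # i-k-j order reads each row of b in sequence
            aik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c


def _add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def strassen(a, b, cutoff=64):
    """Strassen product of square matrices; small or odd-sized blocks
    fall back to the naive loop."""
    n = len(a)
    if n <= cutoff or n % 2 == 1:
        return matmul_naive(a, b)
    h = n // 2
    a11 = [r[:h] for r in a[:h]]
    a12 = [r[h:] for r in a[:h]]
    a21 = [r[:h] for r in a[h:]]
    a22 = [r[h:] for r in a[h:]]
    b11 = [r[:h] for r in b[:h]]
    b12 = [r[h:] for r in b[:h]]
    b21 = [r[:h] for r in b[h:]]
    b22 = [r[h:] for r in b[h:]]
    # Seven recursive products instead of eight.
    m1 = strassen(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = strassen(_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, _sub(b12, b22), cutoff)
    m4 = strassen(a22, _sub(b21, b11), cutoff)
    m5 = strassen(_add(a11, a12), b22, cutoff)
    m6 = strassen(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = strassen(_sub(a12, a22), _add(b21, b22), cutoff)
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

The extra additions and temporary blocks are why Strassen only pays off above some matrix size, and that crossover point depends on the platform.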

optimum finding in Genetic algorithms

I am working on my M.Sc. dissertation, and I have a major problem in the theoretical part of my thesis.
Suppose we want to use genetic algorithms.
We have two kinds of functions:
a) functions that satisfy ||x1 - x2|| >> ||f(x1) - f(x2)||
for example: y = (1/10)x^2
b) functions that satisfy ||x1 - x2|| << ||f(x1) - f(x2)||
for example: y = x^2
My question is: which of these two kinds of functions is more difficult for a genetic algorithm when searching for an optimum (whether a minimum or a maximum)?
Thank you a lot,
Armin
I don't believe you can answer this question in general without imposing additional constraints.
It's going to depend on the particular type of genetic algorithm you're dealing with. If you use fitness proportional (roulette-wheel) selection, then altering the range of fitness values can matter a great deal. With tournament selection or rank-biased selection, as long as the ordering relations hold between individuals, there will be no effects.
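A tiny numeric sketch of this point, with made-up fitness values: roulette-wheel probabilities depend on the actual fitness numbers, while rank-based schemes (including tournament selection) only see the ordering. One detail worth noting is that multiplying every fitness by the same positive constant, as in x^2 versus (1/10)x^2, leaves roulette probabilities unchanged; it is a shift of the fitness range that changes them.

```python
def roulette_probabilities(fitness):
    """Selection probability of each individual under fitness-proportional
    selection (assumes all fitness values are positive)."""
    total = sum(fitness)
    return [f / total for f in fitness]


def ranks(fitness):
    """Rank of each individual (0 = worst); all that tournament or
    rank-biased selection depends on."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    result = [0] * len(fitness)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


base = [1.0, 2.0, 3.0, 4.0]          # made-up fitness values
scaled = [f / 10.0 for f in base]    # multiply by a positive constant
shifted = [f + 100.0 for f in base]  # translate the whole range

p_base = roulette_probabilities(base)        # spread from 0.1 to 0.4
p_scaled = roulette_probabilities(scaled)    # same as p_base up to rounding
p_shifted = roulette_probabilities(shifted)  # nearly uniform: pressure is lost
same_order = ranks(base) == ranks(scaled) == ranks(shifted)
```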
Even if you can say that it does matter, it's still going to be difficult to say which version is harder for the GA. The main effect will be on selection pressure, which causes the algorithm to converge more or less quickly. Is that good or bad? It depends. For a function like f(x)=x^2, converging as fast as possible is probably great, because there's only one optimum, so find it as soon as possible. For a more complex function, slower convergence can be required to find good solutions. So for any given function, scaling and/or translating the fitness values may or may not make a difference, and if it does, the difference may or may not be helpful.
There's probably also a No Free Lunch argument that no single best choice exists over all problems and optimization algorithms.
I'd be happy to be corrected, but I don't believe you can say one way or the other without specifying much more precisely exactly what class of algorithms and problems you're focusing on.

Which optimization algorithm should I use to optimize the weights of a multilayer perceptron?

Actually these are 3 questions:
Which optimization algorithm should I use to optimize the weights of a multilayer perceptron, if I knew...
1) only the value of the error function? (blackbox)
2) the gradient? (first derivative)
3) the gradient and the hessian? (second derivative)
I heard CMA-ES should work very well for 1) and BFGS for 2), but I would like to know if there are any alternatives, and I don't know which algorithm to take for 3).
Ok, so this doesn't really answer the question you initially asked, but it does provide a solution to the problem you mentioned in the comments.
Problems like dealing with a continuous action space are normally not dealt with via changing the error measure, but rather by changing the architecture of the overall network. This allows you to keep using the same highly informative error information while still solving the problem you want to solve.
Some possible architectural changes that could accomplish this are discussed in the solutions to this question. In my opinion, I'd suggest using a modified Q-learning technique where the state and action spaces are both represented by self organizing maps, which is discussed in a paper mentioned in the above link.
I hope this helps.
I finally solved this problem: there are some efficient algorithms for optimizing neural networks in reinforcement learning (with fixed topology), e.g. CMA-ES (CMA-NeuroES) or CoSyNE.
The best optimization algorithm for supervised learning seems to be Levenberg-Marquardt (LMA). This is an algorithm that is specifically designed for least-squares problems. When there are many connections and weights, LMA does not work very well because the required space is huge. In this case I am using Conjugate Gradient (CG).
The Hessian matrix does not accelerate optimization. Algorithms that approximate the 2nd derivative are faster and more efficient (BFGS, CG, LMA).
edit: For large-scale learning problems, Stochastic Gradient Descent (SGD) often outperforms all other algorithms.
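For readers who have not seen it, the SGD update itself is very short. Below is a minimal sketch on a single linear unit with squared error, assuming noiseless toy data generated from y = 2x + 1; the learning rate, epoch count, and names are arbitrary choices of mine. A real multilayer perceptron would obtain the per-sample gradient by backpropagation instead.

```python
import random


def sgd_linear(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)       # visit samples in a fresh random order
        for x, y in samples:
            err = (w * x + b) - y  # gradient of 0.5 * err^2 w.r.t. the output
            w -= lr * err * x      # one cheap update per sample
            b -= lr * err
    return w, b


# Toy data (an assumption for illustration): y = 2x + 1 without noise.
toy = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = sgd_linear(toy)
```

Each update touches one sample only, which is why the method scales to data sets where a full-batch BFGS or LMA step is too expensive.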

Efficiency of crossover in genetic algorithms

I've implemented a number of genetic algorithms to solve a variety of problems. However, I'm still skeptical of the usefulness of crossover/recombination.
I usually implement mutation before implementing crossover. And after I implement crossover, I don't typically see a significant speed-up in the rate at which a good candidate solution is generated, compared to simply using mutation and introducing a few random individuals in each generation to ensure genetic diversity.
Of course, this may be attributed to poor choices of the crossover function and/or the probabilities, but I'd like to get some concrete explanation/evidence as to why/whether or not crossover improves GAs. Have there been any studies regarding this?
I understand the reasoning behind it: crossover allows the strengths of two individuals to be combined into one individual. But to me that's like saying we can mate a scientist and a jaguar to get a smart and fast hybrid.
EDIT: In mcdowella's answer, he mentioned how finding a case where cross-over can improve upon hill-climbing from multiple start points is non-trivial. Could someone elaborate upon this point?
It strongly depends on the smoothness of your search space. A perverse example: if every "genome" were hashed before being used to generate the "phenome", then you would just be doing random search.
As a less extreme case, this is why we often Gray-code integers in GAs.
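For reference, the standard binary-reflected Gray code is a couple of lines. The point for GAs is that consecutive integers always differ in exactly one bit, so a single bit-flip mutation can move to a neighbouring value, whereas in plain binary going from 7 (0111) to 8 (1000) needs four flips. The function names here are my own.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse of to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


plain_distance = hamming(7, 8)                   # 4 flips in plain binary
gray_distance = hamming(to_gray(7), to_gray(8))  # 1 flip in Gray code
```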
You need to tailor your crossover and mutation functions to the encoding. GAs decay quite easily if you throw unsympathetic calculations at them. If the crossover of A and B doesn't yield something that's both A-like and B-like then it's useless.
Example:
The genome is 3 bits long, bit 0 determines whether it's land-dwelling or sea-dwelling. Bits 1-2 describe digestive functions for land-dwelling creatures and visual capabilities for sea-dwelling creatures.
Consider two land-dwelling creatures.
    | bit 0 | bit 1 | bit 2
----+-------+-------+-------
Mum |   0   |   0   |   1
Dad |   0   |   1   |   0
They might crossover between bits 1 and 2 yielding a child whose digestive function is some compromise between Mum's and Dad's. Great.
This crossover seems sensible provided that bit 0 hasn't changed. If it does, then your crossover function has turned some kind of guts into some kind of eyes. Er... Wut? It might as well have been a random mutation.
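The same example as a few lines of Python, using one-point crossover on tuples. The sea-dwelling parent and the function name are hypothetical additions for illustration.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two genomes after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


mum = (0, 0, 1)  # land-dwelling: bits 1-2 describe digestion
dad = (0, 1, 0)  # land-dwelling: bits 1-2 describe digestion

# Cutting between bits 1 and 2 mixes two digestive encodings: sensible.
kid_a, kid_b = one_point_crossover(mum, dad, 2)

# Hypothetical sea-dwelling parent: bits 1-2 describe vision instead.
fish = (1, 1, 0)

# Cutting after bit 0 keeps Mum's "land" flag but attaches the fish's
# vision bits, which are now read as digestion: the semantic mismatch.
odd_kid, _ = one_point_crossover(mum, fish, 1)
```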
This raises the question of how DNA gets around this problem. Well, it's both modal and hierarchical. There are large sections which can change a lot without much effect; in others a single mutation can have drastic effects (like bit 0 above). Sometimes the value of X affects the behaviour triggered by Y, and all values of X are legal and can be explored, whereas a modification to Y makes the animal segfault.
Theoretical analyses of GAs often use extremely crude encodings and they suffer more from numerical issues than semantic ones.
You are correct in being skeptical about the cross-over operation. There is a paper called "On the effectiveness of crossover in simulated evolutionary optimization" (Fogel and Stayton, Biosystems 1994). It is freely available online.
By the way, if you haven't already I recommend looking into a technique called "Differential Evolution". It can be very good at solving many optimization problems.
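In case it helps, here is a rough sketch of the common DE/rand/1/bin variant, assuming a minimization problem with box bounds. The control parameters F and CR, the population size, and the sphere test function are conventional but arbitrary choices of mine, and this version replaces individuals in place rather than building a separate next generation.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over box bounds with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < CR:
                    # Mutant coordinate: base vector plus scaled difference.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            s = f(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


def sphere(x):
    return sum(v * v for v in x)


box = [(-5.0, 5.0)] * 3
best_x, best_f = differential_evolution(sphere, box)
```

Note that the difference vector plays a role similar to recombination, since every trial point is built from several population members.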
My impression is that hill-climbing from multiple random starts is very effective, but that trying to find a case where cross-over can improve on this is non-trivial. One reference is "Crossover: The Divine Afflatus in Search" by David Iclănzan, which states
The traditional GA theory is pillared on the Building Block Hypothesis
(BBH) which states that Genetic Algorithms (GAs) work by discovering,
emphasizing and recombining low order schemata in high-quality
strings, in a strongly parallel manner. Historically, attempts to
capture the topological fitness landscape features which exemplify
this intuitively straight-forward process, have been mostly
unsuccessful. Population-based recombinative methods had been
repeatedly outperformed on the special designed abstract test suites,
by different variants of mutation-based algorithms.
A related paper is "Overcoming Hierarchical Difficulty by Hill-Climbing the
Building Block Structure" by David Iclănzan and Dan Dumitrescu, which states
The Building Block Hypothesis suggests that Genetic Algorithms (GAs)
are well-suited for hierarchical problems, where efficient solving
requires proper problem decomposition and assembly of solution from
sub-solution with strong non-linear interdependencies. The paper
proposes a hill-climber operating over the building block (BB) space
that can efficiently address hierarchical problems.
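For comparison purposes, the mutation-only baseline mentioned above (hill-climbing from multiple random starts) can be sketched in a few lines. This assumes a bitstring encoding and uses OneMax (count of 1 bits) as a stand-in fitness function; it is not the building-block hill-climber from the papers quoted above, and all names and default values are my own.

```python
import random


def hill_climb(fitness, n_bits, rng, steps=1000):
    """Flip one random bit at a time; keep the flip unless it is worse."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    score = fitness(current)
    for _ in range(steps):
        i = rng.randrange(n_bits)
        current[i] ^= 1
        new_score = fitness(current)
        if new_score >= score:
            score = new_score
        else:
            current[i] ^= 1  # undo the flip
    return current, score


def hill_climb_with_restarts(fitness, n_bits, restarts=5, steps=1000, seed=0):
    """Run several independent climbs and return the best result."""
    rng = random.Random(seed)
    runs = [hill_climb(fitness, n_bits, rng, steps) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])


best_bits, best_score = hill_climb_with_restarts(sum, n_bits=20)
```

On a problem with no local optima like OneMax this is hard to beat; the interesting comparisons with crossover are on deceptive or hierarchical landscapes.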
John Holland's two seminal works "Adaptation in Natural and Artificial Systems" and "Hidden Order" (less formal) discuss the theory of crossover in depth. IMO, Goldberg's "Genetic Algorithms in Search, Optimization, and Machine Learning" has a very approachable chapter on mathematical foundations which includes such conclusions as:
With both crossover and reproduction....those schemata with both above-average performance and short defining lengths are going to be sampled at exponentially increasing rates.
Another good reference might be Ankenbrandt's "An Extension to the Theory of Convergence and a Proof of the Time Complexity of Genetic Algorithms" (in "Foundations of Genetic Algorithms" by Rawlins).
I'm surprised that the power of crossover has not been apparent to you in your work; when I began using genetic algorithms and saw how powerfully "directed" crossover seemed, I felt I gained an insight into evolution that overturned what I had been taught in school. All the questions about "how could mutation lead to this and that?" and "Well, over the course of so many generations..." came to seem fundamentally misguided.
Crossover and mutation: both of them are actually necessary.
Crossover is an explorative operator, while mutation is an exploitative one. Considering the structure of the solutions, the problem, and the expected rate of optimization, it's very important to select correct values for Pc and Pm (the probabilities of crossover and mutation).
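To show where Pc and Pm enter, here is a bare-bones generational GA on bitstrings, a sketch under my own assumptions: binary tournament selection, one-point crossover, per-bit mutation, and one elite individual copied unchanged. Setting pc=0.0 gives a mutation-only run with the same seed, so you can compare the two settings on your own fitness function.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=40,
           pc=0.7, pm=0.02, seed=0):
    """Return the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            if rng.random() < pc:  # crossover happens with probability pc
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Each bit flips independently with probability pm.
            child = [bit ^ 1 if rng.random() < pm else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# OneMax (count of 1 bits) as a toy fitness function.
with_crossover, hist_x = run_ga(sum, n_bits=30, pc=0.7)
mutation_only, hist_m = run_ga(sum, n_bits=30, pc=0.0)
```

A single seed proves nothing either way; averaging the histories over many seeds is the fair way to compare settings.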
Check this GA-TSP-Solver, it uses many crossover and mutation methods. You can test any crossover alongside mutations with given probabilities.
It mainly depends on the search space and the type of crossover you are using. For some problems I found that using crossover at the beginning and then mutation speeds up the process of finding a solution; however, this is not a very good approach, since I end up finding similar solutions. If I use both crossover and mutation, I usually get better optimized solutions. However, for some problems crossover can be very destructive.
Also, genetic operators alone are not enough to solve large/complex problems. When your operators don't improve your solution (that is, when they don't increase the fitness value), you should start considering other approaches such as incremental evolution, etc.
