Searching for Genetic Programming framework/library - algorithm

I am looking for a framework or library that supports genetic programming (Koza-style) not only with mathematical functions, but also with loops, variable or constant assignment, object creation, and function calls. I am not sure whether such a branch of genetic algorithms exists, or whether it has a name.
I did my best to find information, but the internet has little to offer on this specific topic.

HeuristicLab has a powerful implementation of Genetic Programming. It includes problems such as Symbolic Regression, Symbolic Classification, Time Series, Santa Fe Ant Trail, and there is a tutorial to implement custom problems such as the Lawn Mower (which is similar to the Santa Fe Ant Trail). HeuristicLab is implemented in C# and runs on Windows. It's released under GPL and can be freely downloaded.
The implementation of GP is very flexible and extensible, but also performance optimized using online calculations to avoid array allocation and memory overheads. We do include several benchmark problem instances for symbolic regression and classification. There are also more algorithms available such as Random Forests, Neural Networks, k-NN, SVM (if you're doing regression or classification).
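To make the question's idea concrete, here is a minimal sketch of a program representation that goes beyond arithmetic: a tree of nested tuples with assignment, sequencing, and a bounded loop. The node names (`const`, `var`, `add`, `assign`, `seq`, `repeat`) and the `MAX_LOOP` cap are assumptions chosen for illustration; they are not taken from HeuristicLab or any other library. A GP system would generate and recombine such trees; this only shows how one might be evaluated.

```python
MAX_LOOP = 100  # hypothetical cap so every evolved program terminates


def run(node, env):
    """Evaluate a program tree (nested tuples) against a variable environment."""
    op = node[0]
    if op == "const":
        return node[1]
    if op == "var":
        return env.get(node[1], 0)  # unset variables read as 0
    if op == "add":
        return run(node[1], env) + run(node[2], env)
    if op == "assign":
        env[node[1]] = run(node[2], env)
        return env[node[1]]
    if op == "seq":
        result = 0
        for child in node[1:]:  # run statements in order, keep the last value
            result = run(child, env)
        return result
    if op == "repeat":
        # Clamp the loop count so a randomly generated tree cannot run forever.
        count = max(0, min(int(run(node[1], env)), MAX_LOOP))
        result = 0
        for _ in range(count):
            result = run(node[2], env)
        return result
    raise ValueError(f"unknown node type: {op}")


# A hand-written tree that sums 1..5 using a loop and two variables.
program = (
    "seq",
    ("assign", "total", ("const", 0)),
    ("assign", "i", ("const", 0)),
    ("repeat", ("const", 5), ("seq",
        ("assign", "i", ("add", ("var", "i"), ("const", 1))),
        ("assign", "total", ("add", ("var", "total"), ("var", "i"))))),
    ("var", "total"),
)
print(run(program, {}))
```

Bounding the loop is a design choice rather than a requirement of GP: without some limit, a fitness evaluation of a random program may never finish.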

Related

parallel iterative algorithms for solving Linear System of Equations

Does anyone know of a library or ready-made source code with a parallel implementation of fast iterative methods (BiCGSTAB, CG, etc.) for solving linear systems of equations, for example using MPI or OpenMP?
PETSc is a good example (both serial and MPI, and with a large library of linear and nonlinear solvers either included or provided as interfaces to external libraries). Trilinos is another example, but it's a much broader project and not as nicely integrated as PETSc. Aztec has a number of solvers, as does Hypre, which is hybrid (MPI+OpenMP).
These are all MPI-based at least in part; I don't know of too many OpenMP-enabled ones, although Google suggests Lis, which I'm not familiar with.
Chapter 7 of Parallel Programming for Multicore and Cluster Systems contains algorithms for systems of linear equations, with source code (MPI).
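For orientation, here is a minimal serial sketch of the conjugate gradient (CG) method mentioned in the question, in plain Python. It assumes a symmetric positive-definite matrix stored as a list of rows; the function name and parameters are illustrative and are not the API of any of the libraries above. The matrix-vector product and the dot products are the operations that MPI or OpenMP implementations distribute across processes or threads.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # residual norm small enough: converged
            break
        # Matrix-vector product: the dominant cost, and the part to parallelise.
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction, conjugate to the previous ones.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # close to [1/11, 7/11]
```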

state-of-the-art of classification algorithms

We know there are something like a thousand classifiers. Recently I was told that some people consider AdaBoost the "off-the-shelf" one.
Are there better algorithms (with that voting idea)?
What is the state of the art in classifiers? Do you have an example?
First, AdaBoost is a meta-algorithm which is used in conjunction with (on top of) your favorite classifier. Second, classifiers which work well in one problem domain often don't work well in another. See the No Free Lunch Wikipedia page. So, there is not going to be AN answer to your question. Still, it might be interesting to know what people are using in practice.
Weka and Mahout aren't algorithms... they're machine learning libraries. They include implementations of a wide range of algorithms. So, your best bet is to pick a library and try a few different algorithms to see which one works best for your particular problem (where "works best" is going to be a function of training cost, classification cost, and classification accuracy).
If it were me, I'd start with naive Bayes, k-nearest neighbors, and support vector machines. They represent well-established, well-understood methods with very different tradeoffs. Naive Bayes is cheap, but not especially accurate. K-NN is cheap during training but (can be) expensive during classification, and while it's usually very accurate it can be susceptible to overtraining. SVMs are expensive to train and have lots of meta-parameters to tweak, but they are cheap to apply and generally at least as accurate as k-NN.
If you tell us more about the problem you're trying to solve, we may be able to give more focused advice. But if you're just looking for the One True Algorithm, there isn't one -- the No Free Lunch theorem guarantees that.
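The k-NN trade-off described above (cheap training, potentially expensive classification) is easy to see in a minimal sketch. The function name, the squared Euclidean distance, and the toy data are assumptions made for illustration; a library such as Weka would normally be used instead.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # "Training" is just storing the data; all the work happens at query time,
    # where every stored point is compared against the query.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 5.1)))
```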
Apache Mahout (open source, Java) seems to be picking up a lot of steam.
Weka is a very popular and stable machine learning library. It has been around for quite a while and is written in Java.
Hastie et al. (2013, The Elements of Statistical Learning) conclude that the Gradient Boosting Machine is the best "off-the-shelf" method, independent of the problem you have.
Definition (see page 352):
An “off-the-shelf” method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.
And a slightly older view:
In fact, Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as the “best off-the-shelf classifier in the world” (see also Breiman (1998)).

Biologically inspired software

I'm wondering if anyone knows of any software techniques taking advantage of biology? For example, in the robotics world, there are tons, but what about software?
Many concepts originally observed in biology have been used in software, for example genetic algorithms (GA).
Artificial life (AL) exposes/uses several principles of biology such as resilience to imperfect code snippets, addressing by content, imperfect reproduction (in some implementations also sexual, i.e. multi-organism-driven, reproduction) and a non-goal-driven utility function. An interesting result of AL is the spontaneous production of macro phenomena observed in domains such as ecology or epidemiology (domains largely influenced by biology), such as the emergence of parasites and even of organisms which take advantage of parasites, or subtle predator-prey relationships.
Maybe software can be said to have gone "full circle" with some experiments in computing which involve real (carbon-based) DNA (or RNA) molecules! The original experiment in this area (PDF link) was by Prof. Adleman (of RSA fame), who encoded the various elements of a graph-related problem (the Hamiltonian path problem) in different DNA molecules and let the massively parallel computing power of biochemistry do the rest and solve the problem!
Back in the digital world, but with a strong inspiration from biology, and indeed from the anatomy of the cerebral cortex and from many theoretical and clinical observations in neuroscience, we have Neural Networks (NN). In the area of NN, maybe worthy of special notice is Numenta's Hierarchical Temporal Memory model which, although it reproduces the [understanding we have of] the neo-cortex only very loosely, introduces the idea that the very same algorithm is applied in all areas and at all levels of the cognitive process powered by the brain, an idea largely supported by biological, anatomical and other forms of evidence.
If your question means "have biological ideas been used to optimize software?" then
Genetic programming (http://en.wikipedia.org/wiki/Genetic_programming) is one example. From the Wikipedia article:
In artificial intelligence, genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms (GA) where each individual is a computer program. Therefore it is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task.
If your question means "what software techniques have been inspired by biology?" then
see more generally http://en.wikipedia.org/wiki/Bio-inspired_computing. I would expect that several other methods such as ant-swarms (http://en.wikipedia.org/wiki/Ant_colony_optimization) and Neural Networks (http://en.wikipedia.org/wiki/Neural_network_software) could also be used.
Artificial Neural Networks are another classic example. The software application tends to be pattern recognition and prediction of behaviour of complex systems.
Ant colony optimization, a search / optimization method, and Artificial Life like Conway's Game of Life
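As a small illustration of the Game of Life mentioned above, here is a sketch of one generation step using the standard rules (a dead cell with exactly three live neighbours is born; a live cell with two or three live neighbours survives). Representing the board as a set of live `(x, y)` cells is an assumption made for brevity.

```python
from collections import Counter


def life_step(live):
    """live: set of (x, y) live cells; returns the next generation."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(0, 1), (1, 1), (2, 1)}  # horizontal bar of three cells
print(sorted(life_step(blinker)))   # the bar flips to vertical
```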
Most of the answers so far talk about AI. The title of your question hints towards software that hides itself in order not to be detected.
We got viruses.
We got virus-hunters...
Me myself, I even hid some bugs in my own programs ... :(
Alan Kay (the object technology pioneer) spoke at length about the influence of biology in the OOP paradigm. He's got a series of ideas about how objects are like "cells" and that OOP scales in a similar way to the way that cells can scale to produce massive architectures...
You can follow quite a bit of this in his Turing Award Speech:
http://video.google.com/videoplay?docid=-2950949730059754521# -- Skip to about the 30:55 mark

Have you ever used a genetic algorithm in real-world applications?

I was wondering how common it is to find genetic algorithm approaches in commercial code.
It always seemed to me that some kinds of schedulers could benefit from a GA engine, as a supplement to the main algorithm.
Genetic Algorithms have been widely used commercially. Optimizing train routing was an early application. More recently fighter planes have used GAs to optimize wing designs. I have used GAs extensively at work to generate solutions to problems that have an extremely large search space.
Many problems are unlikely to benefit from GAs. I disagree with Thomas that they are too hard to understand. A GA is actually very simple. We found that there is a huge amount of knowledge to be gained from tuning the GA to a particular problem, which can be difficult, and, as always, managing large amounts of parallel computation continues to be a problem for many programmers.
A problem that would benefit from a GA is going to have the following characteristics:
A good way to encode potential solutions
A way to compute a numerical score to evaluate the quality of the solution
A large multi-dimensional search space where the answer is non-obvious
A good solution is good enough and a perfect solution is not required
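The characteristics listed above map directly onto the parts of a GA. The sketch below is a toy, not production code: it assumes a bit-string encoding, uses the number of 1 bits as the numerical score (the "OneMax" problem), and picks tournament selection, one-point crossover, bit-flip mutation and elitism as illustrative operators. All names and parameter values are hypothetical.

```python
import random


def run_ga(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # local generator keeps runs reproducible
    fitness = sum              # numerical score: how many bits are set

    # Encoding: each candidate solution is a list of 0/1 values.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    for _ in range(generations):
        def pick():
            # Tournament selection: the better of two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        next_pop = [max(pop, key=fitness)[:]]  # elitism: keep the current best
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]          # bit-flip mutation
            next_pop.append(child)
        pop = next_pop

    return max(pop, key=fitness)


best = run_ga()
print(sum(best), "of", len(best), "bits set")
```

Because the best individual is always carried over, the best score never decreases from one generation to the next, which fits the "good enough" criterion: the run can be stopped at any time with the best solution found so far.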
There are many problems that could probably benefit from GAs, and in the future they will probably be more widely deployed. I believe that GAs are used in cutting-edge engineering more than people think; however, most organizations (like my company) guard those secrets extremely closely. It is only long after the fact that it is revealed that GAs were used.
Most people that deal with "normal" applications probably don't have much use for them though.
If you want to find an example, look at Postgres's Query Planner. It uses many techniques, and one just so happens to be genetic.
http://developer.postgresql.org/pgdocs/postgres/geqo-pg-intro.html
I used GA in my Master's thesis, but after that I haven't found anything in my daily work a GA could solve that I couldn't solve faster with some other Algorithm.
I don't think it is particularly common to find genetic algorithms in everyday-commercial code. They are more commonly found in academic/research code where the need to find the "best algorithm" is less important than the need to just find a good solution to a problem.
Nonetheless, I have consulted on a couple of commercial projects that do use GAs (chiefly as a result of my involvement with GAUL). I think the most interesting example was at a Biotech company. They used the GA to optimise scoring functions that were used for virtual screening, as part of their drug discovery application.
Earlier this year, with my current company, I added a new feature to one of our products that uses another GA. I think we might be marketing this from next month. Basically, the GA is used to explore molecules that have the potential for binding to a protein, and could therefore be further investigated as drugs targeting that protein. A competing product that also uses a GA is EA inventor.
As part of my thesis I wrote a generic Java framework for the multi-objective optimisation algorithm mPOEMS (Multiobjective prototype optimization with evolved improvement steps), which is a GA using evolutionary concepts. It is generic in the sense that all problem-independent parts have been separated from the problem-dependent parts, and an interface is provided so that the framework can be used by adding only the problem-dependent parts. Thus anyone who wants to use the algorithm does not have to start from zero, which makes the work a lot easier.
You can find the code here.
The solutions which you can find with this algorithm have been compared in a scientific work with the state-of-the-art algorithms SPEA-2 and NSGA, and it has been proven that the algorithm performs comparably or even better, depending on the metrics you use to measure the performance, and especially depending on the optimization problem you are looking at.
You can find it here.
Also as part of my thesis and proof of work I applied this framework to the project selection problem found in portfolio management. It is about selecting the projects which add the most value to the company, support most the strategy of the company or support any other arbitrary goal. E.g. selection of a certain number of projects from a specific category, or maximization of project synergies, ...
My thesis which applies this framework to the project selection problem:
http://www.ub.tuwien.ac.at/dipl/2008/AC05038968.pdf
After that I worked in the portfolio management department of a Fortune 500 company, where they used commercial software which also applied a GA to the project selection problem / portfolio optimization.
Further resources:
The documentation of the framework:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS presentation paper:
http://portal.acm.org/citation.cfm?id=1792634.1792653
Actually with a bit of enthusiasm everybody could easily adapt the code of the generic framework to an arbitrary multi-objective optimisation problem.
I haven't but I've heard of this company (can't remember their name) which uses mutating, genetic algos to calculate placements and lengths of antennas (or something) from a friend of mine. And they're supposed to (according to my friend) have huge success with this. I guess GA is just too complex for "average Joe developer" to become mainstream. Kind of like Map Reduce - spectacularly cool, but WAY too advanced to hit the "mainstream"...
LibreOffice Calc uses it in its Solver module.

Fast FEM Solvers

What are the fast solvers for FEM equations? I would prefer open source implementation, but if there is a commercial implementation, then I won't mind paying for it.
Code Aster is an open source FE code.
The pre- and post-processing is usually done with Salome - both originate from EDF.
How about FEAP? It has full source code available when you purchase it. It is a pretty big project, maybe it's too much for your needs, but check it out.
FEAP is a general purpose finite element analysis program which is designed for research and educational use. Source code of the full program is available for compilation using Windows (Compaq or Intel compiler), LINUX or UNIX operating systems, and Mac OS X based Apple systems.
It also has a Personal Edition called FEAPpv, available for free, including source code. Differences between those versions are listed in this pdf.
"brad"? do you mean "broad"?
you don't say if your problem is linear or non-linear. that'll make a very big difference.
the solver depends on the type of equation and the size of your problem. for elliptic PDEs you can choose standard linear algebra techniques like LU decomposition, iterative methods like successive over-relaxation, or wavefront solvers that minimize memory consumption.
some people like solving non-linear steady-state problems as if they were dynamics problems. the idea is to create "fake" mass and damping matrices and use explicit time integration to converge to steady state.
lots of choices. standard linear algebra is a good starting point.
language? java?
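Of the linear-solver options listed above, successive over-relaxation (SOR) is short enough to sketch. This is a minimal dense-matrix version in plain Python; the function name, the relaxation factor `omega=1.25` and the stopping rule are assumptions for illustration, and a real FEM code would use sparse storage. The test matrix is the 1D Laplacian stencil, a typical small elliptic example.

```python
def sor_solve(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over-relaxation (A given as a list of rows)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Gauss-Seidel value for x[i], using the newest available entries.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Over-relax: step past the Gauss-Seidel value by the factor omega.
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:  # stop once a full sweep barely changes x
            break
    return x


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = sor_solve(A, b)
print(x)  # close to [1, 1, 1]
```

With `omega=1.0` the same loop is plain Gauss-Seidel; for symmetric positive-definite matrices SOR converges for any relaxation factor strictly between 0 and 2.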
Oops, that's kind of a brad question.
Solving differential equations usually starts with analyzing the equation itself. Some equations are notoriously difficult to solve efficiently, e.g. indefinite boundary problems.
So if you have something other than an elliptic problem, you had better prepare for hard times ahead.
The next important and crucial part is transferring the continuous problem onto a discrete mesh. Typically the accuracy of your results will vary with different ways of generating this mesh. You'll need some sound experience here.
So I'd say there is no such thing as the fast solver for FEM equations. Anyway, while Wikipedia gives a short overview of the topic, you might also have a look at the German Wikipedia page. It lists well-known FEM implementations.
OpenFOAM and Elmer are two open source solvers. Not sure about Elmer, but I think OpenFOAM might use the control volume approach.
I used OpenFOAM for fluid dynamics research. You can do parallel processing with it with MPI. And if you have a Cray T3E it will be fast!
It's open source :D
http://www.opencfd.co.uk/openfoam/features.html#features
Please have a look at the deal.II open source library:
http://www.dealii.org/
They also provide a VirtualBox image which comes with the libraries pre-installed.

Resources