Genetic Programming and Search Algorithms

Is Genetic Programming currently capable of evolving one type of search algorithm into another? For example, has any experiment ever bred/mutated BubbleSort from QuickSort (see http://en.wikipedia.org/wiki/Sorting_algorithm)?

You might want to look at the work of W. Daniel Hillis from the 80s. He spent a great deal of time creating sorting networks by genetic programming. While he was more interested in solving the problem of sorting a constant number of objects (16-object sorting networks had been a major academic problem for nearly a decade), it would be a good idea to be familiar with his work if you're really interested in genetic sorting algorithms.
In the evolution of an algorithm for sorting a list of arbitrary length, you might also want to be familiar with the concept of co-evolution. I've built a co-evolutionary system before where the point was to have one genetic algorithm evolving sorting algorithms while another GA evolved unsorted lists of numbers. The fitness of a sorter is its accuracy (plus a bonus for fewer comparisons if it is 100% accurate), and the fitness of the list generator is how many errors sorting algorithms make in sorting its lists.
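A toy sketch of such a co-evolutionary loop, assuming candidate sorters are small compare-exchange networks (in the spirit of Hillis's work above); all names, sizes, and rates here are invented for illustration:

```python
import random

LIST_LEN = 6

def apply_network(network, lst):
    # A candidate "sorter" is a list of (i, j) compare-exchange pairs.
    out = list(lst)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out

def sorter_fitness(network, tests):
    # Accuracy on the adversarial lists, with a small penalty per comparator
    # (playing the role of the "bonus for fewer comparisons" described above).
    correct = sum(apply_network(network, t) == sorted(t) for t in tests)
    return correct / len(tests) - 0.001 * len(network)

def list_fitness(lst, networks):
    # A test list is fit when it defeats many sorters.
    return sum(apply_network(n, lst) != sorted(lst) for n in networks)

def random_comparator():
    return tuple(sorted(random.sample(range(LIST_LEN), 2)))

def mutate(network):
    copy = list(network)
    copy[random.randrange(len(copy))] = random_comparator()
    return copy

networks = [[random_comparator() for _ in range(12)] for _ in range(30)]
tests = [random.sample(range(100), LIST_LEN) for _ in range(30)]

for generation in range(200):
    # Each population is scored against the other, truncated, and mutated.
    networks.sort(key=lambda n: sorter_fitness(n, tests), reverse=True)
    networks = networks[:15] + [mutate(n) for n in networks[:15]]
    tests.sort(key=lambda t: list_fitness(t, networks), reverse=True)
    tests = tests[:15] + [random.sample(range(100), LIST_LEN) for _ in range(15)]

print("best sorter accuracy:", sorter_fitness(networks[0], tests))
```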
To answer your specific question of whether BubbleSort has ever been evolved from QuickSort, I would have to say that I seriously doubt it, unless the programmer's fitness function was both very specific and ill-advised. Yes, bubble sort is very simple, so maybe a GP whose fitness function was accuracy plus program size would eventually find it. However, why would a programmer select size instead of number of comparisons as a fitness function, when it is the latter that determines runtime?
By asking if GP can evolve one algorithm into another, I'm wondering if you're entirely clear on what GP is. Ideally, each unique chromosome defines a unique sort. A population of 200 chromosomes represents 200 different algorithms. Yes, quick and bubble may be in there somewhere, but so are 198 other, potentially unnamed, methods.

There's no reason why GP couldn't evolve either type of algorithm. I'm not sure that it really makes sense to think of evolving one "into" the other. GP will simply evolve a program that comes ever-closer to a fitness function you define.
If your fitness function only looks at sort correctness (and assuming you have the proper building blocks for your GP to use) then it could very well evolve both BubbleSort and QuickSort. If you also include efficiency as a measure of fitness, then that might influence which of these would be determined as a better solution.
You could seed the GP with, say, QuickSort, and with an appropriate fitness function it certainly could eventually come up with BubbleSort - but it could come up with anything else that is fitter than QuickSort as well.
Now how long it takes the GP engine to do this evolution is another question...

I'm not aware of one, and the particular direction you're suggesting in your example seems unlikely; it would take a sort of perverse fitness function, since bubble sort is by most measures worse than quicksort. It's not inconceivable that this could happen, but in general once you've got a well-understood algorithm, it's already pretty fit - going to another one probably requires passing through some worse choices.
Being trapped in local minima isn't an unknown problem for most search strategies.

Related

Why do we need so many sorting techniques?

There is a plethora of sorting techniques in data structures, as follows -
Selection Sort
Bubble Sort
Recursive Bubble Sort
Insertion Sort
Recursive Insertion Sort
Merge Sort
Iterative Merge Sort
Quick Sort
Iterative Quick Sort
Heap Sort
Counting Sort
Radix Sort
Bucket Sort
Shell Sort
Tim Sort
Comb Sort
Pigeonhole Sort
Cycle Sort
Cocktail Sort
Strand Sort
and many more.
Do we need all of them?
There’s no single reason why so many different sorting algorithms exist. Here’s a sampler of sorting algorithms and where they came from, to give a better sense of their origins:
Radix sort was invented in the late 1800s for physically sorting punched cards for the US census. It’s still used today in software because it’s very fast on numeric and string data.
Merge sort appears to have been invented by John von Neumann to validate his stored-program computer model (the von Neumann architecture). It works well as a sorting algorithm for low-memory computers processing data that’s streamed through the machine, hence its popularity in the 1960s and 1970s. And it’s a great testbed for divide-and-conquer techniques, making it popular in algorithms classes.
Insertion sort seems to have been around forever. Even though it’s slow in the worst case, it’s fantastic on small inputs and mostly-sorted data and is used as a building block in other fast sorting algorithms.
Quicksort was invented in 1961. It plays excellently with processor caches, hence its continued popularity.
Sorting networks were studied extensively many years back. They’re still useful as building blocks in theoretical proof-of-concept algorithms like signature sort.
Timsort was invented for Python and was designed to sort practical, real-world sequences faster than other sorts by taking advantage of common distributions and patterns.
Introsort was invented as a practical way to harness the speed of quicksort without its worst-case behavior.
Shellsort was invented over fifty years ago and was practical on the computers of its age. Probing its theoretical limits was a difficult mathematical problem for folks who studied it back then.
Thorup and Yao’s O(n sqrt(log log n))-time integer sorting algorithm was designed to probe the theoretical limits of efficient algorithms using word-level parallelism.
Cycle sort derives from the study of permutations in group theory and is designed to minimize the number of memory writes made when sorting the list.
Heapsort is noteworthy for being in-place and yet fast in practice. It’s based on the idea of implicitly representing a nontrivial data structure.
This isn’t even close to an exhaustive list of sorting algorithms, but hopefully gives you a sense of what’s out there and why. :-)
The main reason sorting algorithms are discussed and studied in early computer science classes is that they provide very good study material. The problem of sorting is simple, and a good excuse to present several algorithmic strategies; several data structures and how to implement them; how to analyze time complexity and space complexity; and the different properties algorithms can have even when they apparently solve the same problem.
In practice, standard libraries for programming languages usually include a default sort function, such as std::sort in C++ or list.sort in Python; and in almost every situation, you should trust that function and the algorithm it uses.
But everything you've learned about sorting algorithms is valuable and can be applied to other problems. Here is a non-exhaustive list of things that can be learned by studying sorting algorithms:
divide and conquer;
heaps;
binary search trees, including different types of self-balancing binary search trees;
the importance of choosing an appropriate data-structure;
difference between in-place and not-in-place;
difference between stable and non-stable sorts (see the sketch after this list);
recursive versus iterative approaches;
how to calculate time complexity, and how to compare the efficiency of two algorithms.
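For instance, a minimal demonstration of stability, using Python's built-in sort (Timsort):

```python
# Timsort is stable: records with equal keys keep their original relative order.
records = [("alice", 3), ("bob", 1), ("carol", 3), ("dave", 1)]
records.sort(key=lambda r: r[1])
print(records)
# [('bob', 1), ('dave', 1), ('alice', 3), ('carol', 3)]
# bob still precedes dave, and alice still precedes carol.
```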
Besides educational reasons, we need multiple sorting algorithms because each works best in particular situations, and none of them rules them all.
For example, although the average time complexity of quicksort is impressive, its performance on nearly sorted arrays is horrible if the pivot is chosen naively.
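A quick sketch of that effect - a textbook quicksort with a first-element pivot does quadratic work on already-sorted input (the counts below are rough, assuming one comparison per element per partition):

```python
import random
import sys

def quicksort(a, counter):
    # Naive quicksort: first element as pivot; counter tallies comparisons.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

sys.setrecursionlimit(10_000)
n = 2_000
for name, data in [("random", random.sample(range(n), n)),
                   ("already sorted", list(range(n)))]:
    counter = [0]
    quicksort(data, counter)
    print(f"{name:>14}: {counter[0]} comparisons")
# random input: on the order of n*log2(n), roughly 22,000 comparisons
# sorted input: n*(n-1)/2, roughly 2,000,000 comparisons (quadratic)
```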

How to decide if a randomised algorithm is OK to use?

From what I understand, a randomised algorithm can give a wrong answer. For example, using the contraction algorithm to solve the graph min-cut problem, you need to run the algorithm n^2*ln(n) times so that the probability of failing to get the correct answer is at most 1/n. No matter how small the probability of failure is, the answer could be incorrect, so when is it acceptable to allow an incorrect answer?
To begin with, I think you need to differentiate between different classes of randomized algorithms:
A Monte Carlo algorithm is an algorithm which is random w.r.t. correctness. The randomized min-cut algorithm, from your question, is an example of such an algorithm.
A Las Vegas algorithm is an algorithm which is random w.r.t. running time. Randomized quicksort, for example, is such an algorithm.
You seem to mean Monte-Carlo algorithms in your question.
The question of whether a Monte-Carlo algorithm is suitable for you probably can't be answered objectively, because it comes down to something like the economic theory of utility. Given two algorithms, A and B, each invocation of A or B takes some time t and gives you a result whose correctness is c. The utility U(t, c) is a random variable, and only you can determine whether the distribution of UA(T, C) is better or worse than UB(T, C). Some examples, where algorithm A performs twice as fast as B, but errs with probability 1e-6:
If these are preference recommendations on a website, then it might be worth it for you to have your website twice as responsive as that of a competitor, at the risk that, rarely, a client gets wrong recommendations.
If these are control systems for a nuclear reactor (to borrow from TemplateTypedef's comment), then a slight chance of failure might not be worth the time saving (e.g., you probably would be better investing in a processor twice as fast running the slower algorithm).
The two examples above show that each of the two choices might be correct in different settings. In fact, utility theory rarely shows sets of choices that are clearly wrong. In the introduction to the book Randomized Algorithms by Motwani and Raghavan, however, the authors do give such an example of the fallacy of avoiding Monte-Carlo algorithms. The probability of a CPU malfunctioning due to cosmic radiation is some α (whose value I forget). Thus avoiding a Monte-Carlo algorithm whose probability of error is much lower than α is probably simply irrational.
You'll always need to analyze the properties of the algorithm and decide if the risk of a non-optimal answer is bearable in your application. (If the answer is Boolean, then "non-optimal" is the same as "wrong.")
There are many kinds of programming problems where some answer that's close to optimal and obtained in reasonable time is much better than the optimal answer provided too late or not at all.
The Traveling Salesman problem is an example. If you are Walmart and need to plan delivery routes each night for given sets of cities, getting a route that's close to optimal is much better than no route or a naively chosen one or the best possible route obtained 2 days from now.
There are many kinds of guarantees provided by randomized algorithms. They often have the form error <= F(cost), where error and cost can be almost anything. The cost may be expressed in run time or how many repeat runs are spent looking for better answers. Space may also figure in cost. The error may be probability of a wrong 1/0 answer, a distance metric from an optimal result, a discrete count of erroneous components, etc., etc.
Sometimes you just have to live with a maybe-wrong answer because there's no useful alternative. Primality testing on big numbers is in this category. Though there are polynomial time deterministic tests, they are still much slower than a probabilistic test that produces the correct answer for all practical purposes.
For example, if you have a Boolean randomized function where False results are always correct, but True results are wrong up to 50% of the time, then you are in pretty good shape. (The Miller-Rabin primality test is actually better than this.)
Suppose you can afford to run the algorithm 40 times. If any of the runs says False, you know the answer is False. If they're all True, then the probability that the real answer is False is roughly 2^-40, i.e. about 1 in a trillion.
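A minimal sketch of this amplification, using the Miller-Rabin test mentioned above (each Miller-Rabin round errs with probability at most 1/4, so 40 rounds give an error bound of 4^-40, even better than the 2^-40 figure above):

```python
import random

def miller_rabin(n, rounds=40):
    # "False" (composite) is always correct; "True" (probably prime) is wrong
    # with probability at most 4**-rounds.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a witness proves n composite
    return True                   # probably prime

print(miller_rabin(2**61 - 1))   # True  (a Mersenne prime)
print(miller_rabin(2**61 + 1))   # False (composite, divisible by 3)
```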
Even in safety-critical applications, this may be a fine result. The chance of being hit by lightning in a lifetime is about 1/10,000. We all live with that and don't give it a second thought.

Are all genetic algorithms maximization algorithms?

I'm not sure if my understanding of maximization and minimization is correct.
So let's say for some function f(x,y,z), I want to find what would give the highest value; that would be maximization, right? And if I wanted to find the lowest value, that would be minimization?
So if a genetic algorithm is a search algorithm trying to maximize some fitness function, would they by definition be maximization algorithms?
So let's say for some function f(x,y,z), I want to find what would give the highest value; that would be maximization, right? And if I wanted to find the lowest value, that would be minimization?
Yes, that's by definition true.
So if a genetic algorithm is a search algorithm trying to maximize some fitness function, would they by definition be maximization algorithms?
Pretty much yes, although I'm not sure "maximization algorithm" is a widely used term, and it holds only if a genetic algorithm is defined as such, which I don't believe it strictly is.
Genetic algorithms can also try to minimize the distance to some goal function value, or minimize the function value itself, but then again, this can just be rephrased as maximization without loss of generality.
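A toy sketch of that rephrasing (the GA skeleton here is invented for illustration, not a library API):

```python
import random

def ga_maximize(fitness, population, generations=100):
    # Toy GA over real numbers: keep the fitter half, refill with mutants.
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: len(population) // 2]
        population = survivors + [x + random.gauss(0, 0.1) for x in survivors]
    return max(population, key=fitness)

def ga_minimize(cost, population, generations=100):
    # Minimization is the same problem with the fitness negated.
    return ga_maximize(lambda x: -cost(x), population, generations)

pop = [random.uniform(-10, 10) for _ in range(20)]
print(ga_minimize(lambda x: (x - 3) ** 2, pop))  # converges near 3
```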
Perhaps more significantly, there isn't a strict need to even have a function - the candidates just need to be comparable. If they have a total order, it's again possible to rephrase it as a maximization problem. If they don't have a total order, it might be a bit more difficult to get candidates objectively better than all the others, although nothing's stopping you from running the GA on this type of data.
In conclusion - trying to maximize a function is the norm (and possibly in line with how you'll mostly see it defined), but don't be surprised if you come across a GA that doesn't do this.
Are all genetic algorithms maximization algorithms?
No, they aren't.
Genetic algorithms are popular approaches to multi-objective optimization (e.g. NSGA-II or SPEA-2 are very well known genetic algorithm based approaches).
For multi-objective optimization you aren't trying to maximize a function.
This is because scalarizing multi-objective optimization problems is seldom viable (i.e. there isn't a single solution that simultaneously optimizes each objective), so what you are looking for is a set of nondominated solutions (or a representative subset of the Pareto optimal solutions).
There are also approaches to evolutionary algorithms which try to capture the open-endedness of natural evolution by searching for behavioral novelty. Even in an objective-based problem, such novelty search ignores the objective (see Abandoning Objectives: Evolution through the Search for Novelty Alone by Joel Lehman and Kenneth O. Stanley for details).

In regards to genetic algorithms

Currently, I'm studying genetic algorithms (personal, not required) and I've come across some topics I'm unfamiliar or just basically familiar with and they are:
Search Space
The "extreme" of a Function
I understand that one's search space is a collection of all possible solutions, but I also wish to know how one would decide the range of their search space. Furthermore, I would like to know what an extreme is in relation to functions and how it is calculated.
I know I should probably understand what these are but so far I've only taken Algebra 2 and Geometry but I have ventured into physics, matrix/vector math, and data structures on my own so please excuse me if I seem naive.
Generally, all algorithms which look for a specific item in a collection of items are called search algorithms. When the collection of items is defined by a mathematical function (as opposed to existing in a database), it is called a search space.
One of the most famous problems of this kind is the travelling salesman problem, where an algorithm is sought which, given a list of cities and their distances, will find the shortest route for visiting each city only once. For this problem, the exact solution can be found by examining all possible routes (the entire search space) and picking the shortest one (the route with the minimum distance, which is the extreme value in the search space). The time complexity of such an algorithm (called an exhaustive search) is exponential (better exact algorithms exist, but all known ones still take exponential time), meaning that the worst-case running time increases exponentially as the number of cities increases.
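A brute-force version of that search is easy to write down, which makes the factorial blow-up concrete (the 5-city distance matrix is made up for the example):

```python
from itertools import permutations

def exhaustive_tsp(dist):
    # Try every route starting at city 0; guaranteed optimal, O(n!) time.
    n = len(dist)
    best_route, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        route = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if length < best_len:
            best_route, best_len = route, length
    return best_route, best_len

# Symmetric distance matrix for 5 hypothetical cities.
dist = [[0, 2, 9, 10, 7],
        [2, 0, 6, 4, 3],
        [9, 6, 0, 8, 5],
        [10, 4, 8, 0, 6],
        [7, 3, 5, 6, 0]]
print(exhaustive_tsp(dist))  # examines (n-1)! = 24 routes
```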
This is where genetic algorithms come into play. Similar to other heuristic algorithms, genetic algorithms try to get close to the optimal solution by improving a candidate solution iteratively, with no guarantee that an optimal solution will actually be found.
This iterative approach has the problem that the algorithm can easily get "stuck" in a local extreme while trying to improve a solution, not knowing that there is a potentially better solution somewhere further away.
Picture a fitness landscape with several valleys: in order to get to the actual, optimal solution (the global minimum), an algorithm currently examining solutions around a local minimum needs to "jump over" a large maximum in the search space. A genetic algorithm will rapidly locate such local optima, but it will usually fail to "sacrifice" this short-term gain to reach a potentially better solution.
So, a summary would be:
exhaustive search: examines the entire search space (takes a long time), finds global extremes;
heuristics (e.g. genetic algorithms): examine a part of the search space (takes a short time), find local extremes.
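To make the local-extreme trap concrete, here is a tiny greedy descent on a function with two valleys (function and step size invented for the example; a GA copes with this better but can still stall the same way):

```python
import random

def f(x):
    # Two valleys: local minimum near x = +1, global minimum near x = -1.
    return (x * x - 1) ** 2 + 0.3 * x

def hill_descend(x, step=0.01, iters=10_000):
    # Greedy local search: move only if the neighbour is strictly better.
    for _ in range(iters):
        candidate = x + random.choice((-step, step))
        if f(candidate) < f(x):
            x = candidate
    return x

print(round(hill_descend(2.0), 2))   # stuck near +1: the local minimum
print(round(hill_descend(-2.0), 2))  # started in the right valley: near -1
```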
Genetic algorithms are not good at homing in on a local optimum. If you want to find a global optimum, you should at least be able to approach, or have a strategy for approaching, the local optimum. Recently, some improvements have been developed to better find local optima.
"GENETIC ALGORITHM FOR INFORMATIVE BASIS FUNCTION SELECTION
FROM THE WAVELET PACKET DECOMPOSITION WITH APPLICATION TO
CORROSION IDENTIFICATION USING ACOUSTIC EMISSION"
http://gbiomed.kuleuven.be/english/research/50000666/50000669/50488669/neuro_research/neuro_research_mvanhulle/comp_pdf/Chemometrics.pdf
In general, "search space" means, what type of answers are you looking for. For example, if you are writing a genetic algorithm which builds bridges, tests them out, and then builds more, the answers you are looking for are bridge models (in some form). As another example, if you're trying to find a function which agrees with a set of sample inputs on some number of points, you might try to find a polynomial which has this property. In this instance your search space might be polynomials. You might make this simpler by putting a bound on the number of terms, maximum degree of the polynomial, etc... So you could specify that you wanted to search for polynomials with integer exponents in the range [-4, 4]. In genetic algorithms, the search space is the set of possible solutions you could generate. In genetic algorithms you need to carefully limit your search space so you avoid answers which are completely dumb. At my former university, a physics student wrote a program which was a GA to calculate the best configuration of atoms in a molecule to have low energy properties: they found a great solution having almost no energy. Unfortunately, their solution put all the atoms at the exact center of the molecule, which is physically impossible :-). GAs really hone in on good solutions to your fitness functions, so it's important to choose your search space so that it doesn't produce solutions with good fitness but are in reality "impossible answers."
As for the "extreme" of a function. This is simply the point at which the function takes its maximum value. With respect to genetic algorithms, you want the best solution to the problem you're trying to solve. If you're building a bridge, you're looking for the best bridge. In this scenario, you have a fitness function that can tell you "this bridge can take 80 pounds of weight" and "that bridge can take 120 pounds of weight" then you look around for solutions which have higher fitness values than others. Some functions have simple extremes: you can find the extreme of a polynomial using simple high school calculus. Other functions don't have a simple way to calculate their extremes. Notably, highly nonlinear functions have extremes which might be difficult to find. Genetic algorithms excel at finding these solutions using a clever search technique which looks around for high points and then finds others. It's worth noting that there are other algorithms that do this as well, hill climbers in particular. The things that make GAs different is that if you find a local maximum, other types of algorithms can get "stuck," blinded by a locally good solution, so that they never see a possibly much better solution farther away in the search space. There are other ways to adapt hill climbers to this as well, simulated annealing, for one.
Choosing the range usually requires some intuitive understanding of the problem you're trying to solve - some expertise in the domain of the problem. There's really no guaranteed method to pick the range.
The extremes are just the minimum and maximum values of the function.
So for instance, if you're coding up a GA just for practice, to find the minimum of, say, f(x) = x^2, you know pretty well that your range should be +/- something because you already know that you're going to find the answer at x=0. But then of course, you wouldn't use a GA for that because you already have the answer, and even if you didn't, you could use calculus to find it.
One of the tricks in genetic algorithms is to take some real-world problem (often an engineering or scientific problem) and translate it, so to speak, into some mathematical function that can be minimized or maximized. But if you're doing that, you probably already have some basic notion where the solutions might lie, so it's not as hopeless as it sounds.
The term "search space" does not restrict to genetic algorithms. I actually just means the set of solutions to your optimization problem. An "extremum" is one solution that minimizes or maximizes the target function with respect to the search space.
Search space, simply put, is the space of all possible solutions. If you're looking for a shortest tour, the search space consists of all possible tours that can be formed. However, beware that it's not the space of all feasible solutions! It depends only on your encoding. If your encoding is, e.g., a permutation, then the search space is that of permutations, which is n! (factorial) in size. If you're looking to minimize a certain function with real-valued inputs, the search space is bounded by the hypercube of the real-valued inputs. It's basically infinite, but of course limited in practice by the precision of the computer.
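The factorial growth is worth seeing in numbers:

```python
import math

# Size of the search space under a permutation encoding (e.g. a tour of n cities):
for n in (5, 10, 20):
    print(n, math.factorial(n))
# 5  120
# 10 3628800
# 20 2432902008176640000  -- far too large to enumerate exhaustively
```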
If you're interested in genetic algorithms, maybe you're interested in experimenting with our software. We're using it to teach heuristic optimization in classes. It's GUI-driven and Windows-based, so you can start right away. We have included a number of problems such as real-valued test functions, traveling salesman, vehicle routing, etc. This allows you to, e.g., look at how the best solution of a certain TSP improves over the generations. It also exposes the problem of parameterizing metaheuristics and lets you find better parameters that will solve the problems more effectively. You can get it at http://dev.heuristiclab.com.

Complexity of algorithms of different programming paradigms

I know that most programming languages are Turing complete, but I wonder whether a problem can be solved with an algorithm of the same complexity in any programming language (and in particular with any programming paradigm).
To make my question more explicit with an example: is there any problem which can be solved with an imperative algorithm of complexity x (say O(n)), but cannot be solved by a functional algorithm with the same complexity (or vice versa)?
Edit: The algorithm itself can be different. The question is about the complexity of solving the problem -- using any approach available in the language.
In general, no, not all algorithms can be implemented with the same order of complexity in all languages. This can be trivially proven, for instance, with a hypothetical language that disallows O(1) access to an array. However, there aren't any algorithms (to my knowledge) that cannot be implemented with the optimal order of complexity in a functional language. The complexity analysis of an algorithm's pseudocode makes certain assumptions about what operations are legal, and what operations are O(1). If you break one of those assumptions, you can alter the complexity of the algorithm's implementation even though the language is Turing complete. Turing-completeness makes no guarantees regarding the complexity of any operation.
An algorithm has a stated runtime such as O(n), like you said; implementations of an algorithm must adhere to that same runtime or they do not implement the algorithm. The language or implementation does not by definition change the algorithm and thus does not change the asymptotic runtime.
That said, certain languages and technologies might make expressing the algorithm easier, and offer constant speedups (or slowdowns) due to how the language gets compiled or executed.
I think your first paragraph is wrong. And I think your edit doesn't change that.
Assuming you are requiring that the observed behaviour of an implementation conforms to the time complexity of the algorithm, then...
When calculating the complexity of an algorithm assumptions are made about what operations are constant time. These assumptions are where you're going to find your clues.
Some of the more common assumptions are things like constant time array access, function calls, and arithmetic operations.
If you cannot provide those operations in a language in constant time you cannot reproduce the algorithm in a way that preserves the time complexity.
Reasonable languages can break those assumptions, and sometimes have to if they want to deal with, say, immutable data structures with shared state, concurrency, etc.
For example, Clojure uses trees to represent vectors. This means that access is not constant time (I think it's log base 32 of the size of the vector - not constant, even though it might as well be).
You can easily imagine a language having to do complicated stuff at runtime when calling a function. For example, deciding which one was meant.
Once upon a time, floating point and multi-word integer multiplication and division were sadly not constant time (they were implemented in software). There was a period, while languages transitioned to using hardware, when very reasonable language implementations behaved very differently.
I'm also pretty sure you can come up with algorithms that fare very poorly in the world of immutable data structures. I've seen some optimisation algorithms that would be horribly difficult, maybe impossible or effectively so, to implement while dealing with immutability without breaking the time complexity.
For what it's worth, there are algorithms out there that assume set union and intersection are constant time... good luck implementing those algorithms in constant time. There are also algorithms that use an 'oracle' that can answer questions in constant time... good luck with those too.
I think that a language can have different basic operations that cost O(1), for example mathematical operations (+, -, *, /), variable/array access (a[i]), function calls, and anything else you can think of.
If a language does not have one of these O(1) operations (like Brainfuck, which does not have O(1) array access), it cannot do everything C can do with the same complexity; but if a language has more O(1) operations (for example a language with O(1) array search), it can do more than C.
I think that all "serious" languages have the same basic O(1) operations, so they can solve problems with the same complexity.
If you consider Brainfuck or the Turing machine itself, there is one fundamental operation that takes O(n) time there, although in most other languages it can be done in O(1): indexing an array.
I'm not completely sure about this, but I think you can't have a true array in functional programming either (one with O(1) "get element at position" and O(1) "set element at position"). Because of immutability, you can either have a structure that can change quickly but takes time to access, or you will have to copy large parts of the structure on every change to get fast access. But I guess you could cheat around that using monads.
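You can feel that copy-on-change cost even in Python, using a tuple as a stand-in for an immutable array:

```python
import timeit

n = 100_000
mutable = list(range(n))
immutable = tuple(range(n))

def update_in_place():
    mutable[n // 2] = -1  # O(1): overwrite one slot

def update_persistent():
    # "Set element at position" on an immutable sequence copies it: O(n).
    return immutable[: n // 2] + (-1,) + immutable[n // 2 + 1:]

print(timeit.timeit(update_in_place, number=1000))
print(timeit.timeit(update_persistent, number=1000))  # orders of magnitude slower
```

Persistent trees like Clojure's vectors sit between these two extremes, trading O(1) for O(log32 n) on both operations.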
Looking at things like functional versus imperative, I doubt you'll find any real differences.
Looking at individual languages and implementations is a different story though. Ignoring, for the moment, the examples from Brainfuck and such, there are still some pretty decent examples to find.
I still remember one example from many years ago, writing APL (on a mainframe). The task was to find (and eliminate) duplicates in a sorted array of numbers. At the time, most of the programming I'd done was in Fortran, with a few bits and pieces in Pascal (still the latest and greatest thing at the time) or BASIC. I did what seemed obvious: wrote a loop that stepped through the array, comparing array[i] to array[i+1], and keeping track of a couple of indexes, copying each unique element back the appropriate number of places, depending on how many elements had already been eliminated.
While this would have worked quite well in the languages to which I was accustomed, it was barely short of a disaster in APL. The solution that worked a lot better was based more on what was easy in APL than computational complexity. Specifically, what you did was compare the first element of the array with the first element of the array after it had been "rotated" by one element. Then, you either kept the array as it was, or eliminated the last element. Repeat that until you'd gone through the whole array (as I recall, detected when the first element was smaller than the first element in the rotated array).
The difference was fairly simple: like most APL implementations (at least at the time), this one was a pure interpreter. A single operation (even one that was pretty complex) was generally pretty fast, but interpreting the input file took quite a bit of time. The improved version was much shorter and faster to interpret (e.g., APL provides the "rotate the array" thing as a single, primitive operation so that was only a character or two to interpret instead of a loop).
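For comparison, the same whole-array style translates naturally to NumPy, where comparing the sorted array against a shifted copy of itself (standing in for APL's rotate) removes duplicates without an explicit loop:

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 3, 4, 7, 7])         # sorted, with duplicates
keep = np.concatenate(([True], a[1:] != a[:-1]))  # True where element differs from its predecessor
print(a[keep])                                    # [1 2 3 4 7]
```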
