Recurrences in NEAT/HyperNEAT algorithm and intermediate results

I am currently implementing a HyperNEAT-like algorithm in C, but there are two crucial aspects of the algorithm that I have not been able to implement properly. I have been digging through the original source code for NEAT and HyperNEAT without success. Both issues relate to recurrence in NEAT/CPPN networks caused by internal feedback loops.
First issue
What is the proper computation sequence in NEAT/CPPNs with feedback loops? The following figure shows an example of recurrence in the topology:
[Figure: Feedback loops in topology]
On the first computation, the feedback links do not yet hold any result from earlier computations. Should I perform the first computation with empty links?
Second issue
Imagine I want to produce an image by passing pixel coordinates to NEAT as inputs. As far as I know, the NEAT model should receive one input sample per pixel. Should I keep the intermediate results of the topology from previous pixels?
If the NEAT network is feedforward this issue has no effect, but if it contains feedback loops the results change. (The same issue applies to the CPPN in HyperNEAT when indirectly encoding the substrates.)
I am aware that these questions are also related to graph theory, but I want to know how they are handled in NEAT algorithms.
Thanks!
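To make the two questions concrete, here is a minimal sketch in Python (not C, and not taken from the NEAT or HyperNEAT sources). It assumes one possible convention: every link reads the activation its source node had on the previous step, and all activations start at zero, so "empty" feedback links simply contribute nothing on the first computation. The class name `RecurrentNet`, the `tanh` activation and the `(source, target, weight)` link format are all assumptions made for illustration.

```python
import math

class RecurrentNet:
    """Toy recurrent network with synchronous updates (illustrative only)."""

    def __init__(self, num_nodes, input_ids, output_ids, links):
        self.input_ids = input_ids
        self.output_ids = output_ids
        self.links = links                  # list of (source, target, weight)
        self.state = [0.0] * num_nodes      # activations from the previous step

    def reset(self):
        """Forget everything: feedback links carry 0.0 on the next step."""
        self.state = [0.0] * len(self.state)

    def step(self, inputs):
        """Advance one time step; every link reads the previous activations."""
        prev = list(self.state)
        for node, value in zip(self.input_ids, inputs):
            prev[node] = value
        sums = [0.0] * len(prev)
        for source, target, weight in self.links:
            sums[target] += weight * prev[source]
        new_state = [math.tanh(s) for s in sums]
        for node, value in zip(self.input_ids, inputs):
            new_state[node] = value         # input nodes just hold their input
        self.state = new_state
        return [new_state[i] for i in self.output_ids]

# Node 0 = input, 1 = hidden, 2 = output; link 2 -> 1 is the feedback loop.
net = RecurrentNet(3, [0], [2], [(0, 1, 1.0), (1, 2, 1.0), (2, 1, 0.5)])
first = net.step([1.0])    # state was all zeros
second = net.step([1.0])   # same input, but the state persisted
net.reset()
third = net.step([1.0])    # same input after a reset
```

Under this convention, `second` differs from `first` because the state carried over, while `third` equals `first` because of the reset. Whether to call `reset()` between pixels is exactly the design decision raised in the second issue: resetting makes each pixel's output depend only on its coordinates, while keeping the state makes it depend on the order in which pixels are visited.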


Is there a canonical/performant way to reduce arrays/matrices by removing the border values?

A motivating issue, implemented in Matlab:
N = 1000;
R = zeros(2*N);
for i=0:N-1
R = R(2:end-1, 2:end-1);
end
For this code timeit() reports 2.9793 seconds on my machine, which isn't great.
By canonical I mean a solution that is not merely acceptable but performs well when very large matrices are reduced. I would appreciate any answer, or pointers to other discussions or literature.
As for language, I am not really a programmer. This question is motivated by a mathematical inquiry, and I have run into performance issues implementing any such reduction process in Matlab. Is there a solution to this in Matlab, or must one delve into the scary depths of C/C++?
One note: one may ask, why not just keep the matrix as is and consider parts of it as needed? To clarify, the reduction process in practice depends on the actual (nonzero) values of the elements, e.g. by processing the matrix in 2x2 blocks, and the removal of edge values is needed to prepare the matrix for the next reduction step.
R(2:end-1, 2:end-1) is the correct way of extracting the part of the array that is all values except the ones at the edges. This requires copying the data, so it will take some time. There is no legal way around the copy, and no alternative for extracting a part of the array. (subsref might seem like an alternative, but it is the function that is called internally for the given syntax.)
As for illegal ways, you could try James Tursa's sharedchild from the MATLAB File Exchange. It lets you create an array that references a subset of the data of another array. James is well known in the MATLAB user community as one of the people reverse-engineering the system and bending it to his will. This is solid code. But every version of MATLAB introduces changes to the infrastructure, so upgrading MATLAB might break your program if you use this code.
You don't need the for loop. If you want to remove L elements from the borders, simply do:
R=R(L+1:end-L, L+1:end-L)
I am surprised you didn't get an error with that code. I think you should end up with an empty matrix at the end of the loop.
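The point about removing L layers in one indexing operation instead of L separate copies can be illustrated outside MATLAB as well. The sketch below uses plain Python lists of lists; the function name `trim_border` is hypothetical. It only demonstrates that one slice of depth L gives the same result as L slices of depth 1, and says nothing about MATLAB's performance.

```python
def trim_border(matrix, layers=1):
    """Return a copy of `matrix` with `layers` rows and columns removed from every edge."""
    return [row[layers:len(row) - layers]
            for row in matrix[layers:len(matrix) - layers]]

# 6x6 example matrix whose entries encode their position.
m = [[6 * i + j for j in range(6)] for i in range(6)]

once = trim_border(m, 2)                      # one copy, two layers
twice = trim_border(trim_border(m, 1), 1)     # two copies, one layer each
```

Both `once` and `twice` hold the central 2x2 block, but the second form allocates an intermediate 4x4 copy that is immediately thrown away.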

Multi-channel Lattice Recursive Least Squares

I'm trying to implement multi-channel lattice RLS, i.e. the recursive least squares algorithm which performs noise cancellation with multiple inputs, but a single 'desired output'.
I have the basic RLS algorithm working with multiple components, but it's too inefficient and memory intensive for my purpose.
Wikipedia has an excellent example of lattice RLS, which works great.
https://en.wikipedia.org/wiki/Recursive_least_squares_filter
However, the sources it cites do not go into much detail on how to extend this to the multi-channel case, and re-doing the full derivation is a bit beyond me.
Does anyone know a good source which describes or implements this algorithm in the multi-channel case? Many thanks.
Use separate parallel adaptive filters, one for each noise reference, and combine their outputs to subtract from your noisy signal. LMS usually works best but RLS is fine. Problems arise if any of the noise references are heavily correlated with the desired signal.
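A minimal sketch of that suggestion in Python, using plain LMS rather than lattice RLS: one FIR filter per noise reference, with the summed filter outputs subtracted from the noisy signal and the shared error driving every filter's update. The function name `lms_noise_canceller`, the tap count and the step size `mu` are illustrative assumptions, not values from the thread.

```python
import math

def lms_noise_canceller(primary, references, taps=1, mu=0.05):
    """Cancel noise in `primary` with one LMS filter per reference channel.

    Returns the error signal e[n] = primary[n] - (sum of filter outputs),
    which is the cleaned output.
    """
    weights = [[0.0] * taps for _ in references]
    cleaned = []
    for n in range(len(primary)):
        # Most recent `taps` samples of each reference (zero-padded at the start).
        windows = [[ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
                   for ref in references]
        # Combined noise estimate: sum of the parallel filter outputs.
        estimate = sum(w * x
                       for w_ch, x_ch in zip(weights, windows)
                       for w, x in zip(w_ch, x_ch))
        error = primary[n] - estimate
        # Every filter adapts from the same shared error signal.
        for w_ch, x_ch in zip(weights, windows):
            for k in range(taps):
                w_ch[k] += mu * error * x_ch[k]
        cleaned.append(error)
    return cleaned

# Two noise references leaking into the primary channel with different gains.
ref_a = [math.sin(0.3 * n) for n in range(3000)]
ref_b = [math.cos(0.7 * n) for n in range(3000)]
noisy = [0.5 * a - 0.3 * b for a, b in zip(ref_a, ref_b)]
residual = lms_noise_canceller(noisy, [ref_a, ref_b])
```

In this toy case the primary channel contains only noise, so the residual should decay towards zero as the two filters learn the gains 0.5 and -0.3. With a real desired signal present, the residual would be the estimate of that signal.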

Sample input for various algorithms

I have been reading parts of Introduction to Algorithms by Cormen et al, and have implemented some of the algorithms.
In order to test my implementations I wrote some glue code to do file I/O, then made some sample input by hand and some more by writing programs that generate sample input.
However, I doubt the quality of my own sample inputs: I may have missed corner cases or the more interesting possibilities, I may have miscalculated the proper output, and so on.
Is there a set of test inputs and outputs for various algorithms collected somewhere on the Internet so that I might be able to test my code? I am looking for test data reasonably specific to particular algorithms, rather than contest problems that often involve a problem solving component as well.
I understand that I might have to adjust my code depending on the format the input is collected in (e.g. the various constraints on the inputs; for graph algorithms, the representation of the graph; etc.), although I am hoping that the changes I would have to make would be reasonably trivial.
Edit:
Some particular datasets I am currently looking for are:
Lists of numbers
    Skewed so that quicksort performs badly.
    Skewed so that a Fibonacci heap performs particularly well or poorly for specific operations.
Graphs (for which High Performance Mark has offered a number of interesting references)
    Sparse graphs (with specific bounds on the number of edges)
    Dense graphs
Since I am still working through the book, if you are in a similar situation, or you just feel the list could be improved, please feel free to edit the list. Some time soon I may come to need datasets similar to the ones you are looking for. I am not entirely sure how editing privileges work, but if I have any say over it, I will try to approve it.
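Until a curated collection turns up, two of the inputs listed above are easy to generate. The Python sketch below makes two assumptions that are not from the question: the quicksort under test always picks the first element as pivot (for which already-sorted input is a worst case), and a "sparse" or "dense" graph is simply a simple undirected graph with an exact number of edges. The helper names are hypothetical.

```python
import itertools
import random

def quicksort_comparisons(values):
    """Count element-vs-pivot comparisons of a first-element-pivot quicksort."""
    if len(values) <= 1:
        return 0
    pivot, rest = values[0], values[1:]
    smaller = [v for v in rest if v < pivot]
    larger = [v for v in rest if v >= pivot]
    # One comparison against the pivot per remaining element, plus the recursion.
    return len(rest) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

def random_graph(num_vertices, num_edges, seed=None):
    """Return `num_edges` distinct undirected edges (u, v) with u < v."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(num_vertices), 2))
    return rng.sample(all_pairs, num_edges)

n = 200
worst_case = quicksort_comparisons(list(range(n)))       # sorted input
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
typical_case = quicksort_comparisons(shuffled)

sparse = random_graph(50, 60, seed=1)      # few edges relative to 50*49/2 = 1225
dense = random_graph(50, 1000, seed=1)     # most of the possible edges
```

On sorted input the count is exactly n(n-1)/2, which gives a known expected value to check an instrumented implementation against. A quicksort with a different pivot rule needs a different adversarial input.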
I don't know of any one resource which will provide you with sample inputs for all the types of algorithm that Cormen et al cover but for graph datasets here are a couple of references:
Knuth's Stanford Graphbase
and
the Stanford Large Network Dataset Collection
which I stumbled across while looking for the link to the former. You might be interested in this one too:
the Matrix Market
Why not edit your question and let SO know what other types of input you are looking for?
I am going to stick my head on the line and say that I do not know of any such source, and I very much doubt that such a source exists.
As you seem to be aware, algorithms can be applied to almost any sort of data, and so it would be fruitless to attempt to provide sample data.

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there, i.e. how many dimensions in the search space? Are they continuous or discrete, e.g. real numbers, or integers, or just a few possible values?
Approaches that I've seen used for these kinds of problems have a similar overall structure: take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points and probabilistically move some of them to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space and probabilistically move a particle; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.
The Wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations for all of the above out there that you can use or take as a starting point.
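As a rough illustration of the first method, here is a minimal single-point simulated annealing sketch in Python. It is the textbook one-walker form rather than the many-point variant described above, and every parameter (Gaussian step size, geometric cooling from `t_start` to `t_end`, iteration count) is an arbitrary assumption that would need tuning for a real problem.

```python
import math
import random

def simulated_annealing(f, start, step=0.5, t_start=1.0, t_end=1e-3,
                        iters=2000, rng=None):
    """Minimize `f` from `start`; returns (best_point, best_value)."""
    rng = rng or random.Random()
    current, current_val = list(start), f(start)
    best, best_val = list(current), current_val
    for i in range(iters):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(iters - 1, 1))
        candidate = [x + rng.gauss(0.0, step) for x in current]
        candidate_val = f(candidate)
        delta = candidate_val - current_val
        # Always accept improvements; accept worse points with probability exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_val = candidate, candidate_val
            if current_val < best_val:
                best, best_val = list(current), current_val
    return best, best_val

def bumpy(p):
    """One-dimensional test function with many local minima (Rastrigin-style)."""
    x = p[0]
    return x * x - 10.0 * math.cos(2.0 * math.pi * x) + 10.0

best_point, best_value = simulated_annealing(bumpy, [4.0], rng=random.Random(0))
```

Note that this uses 2001 function evaluations, which at one minute each is well over a day, so in the situation described in the question the iteration budget would have to be far smaller.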
Note that all of these, and really any approach to searching a high-dimensional space, are heuristics, which means they have parameters that must be tuned to your particular problem. That tuning can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit: since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.
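OpenMP targets C, C++ and Fortran. The same idea in Python might look like the sketch below, which evaluates a batch of candidate parameter sets concurrently. It assumes the slow evaluation spends its time outside the Python interpreter (for example waiting on an external simulation), so threads are enough; a CPU-bound pure-Python objective would need `ProcessPoolExecutor` instead. The function name `evaluate_batch` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_batch(objective, candidates, max_workers=4):
    """Evaluate `objective` on every candidate concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(objective, candidates))

def sphere(params):
    """Stand-in for the real, slow objective function."""
    return sum(x * x for x in params)

scores = evaluate_batch(sphere, [[1, 2], [3, 4], [0, 0]])
```

Because `pool.map` returns results in input order, `scores[i]` always belongs to `candidates[i]`, which keeps the surrounding optimizer code unchanged.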
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.
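The search described above might be sketched in Python roughly as follows. This is a reconstruction from the prose, not the poster's code, and it makes several assumptions: the orthogonal directions come from Gram-Schmidt on Gaussian vectors, both signs of each direction are tried, "speeding up" means doubling the step, and the search stops when the step falls below `min_step` or an evaluation budget runs out. Only a single start is shown, without the restart loop.

```python
import random

def random_orthonormal_basis(dim, rng):
    """Gram-Schmidt on Gaussian vectors: returns `dim` orthonormal directions."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-12:            # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis

def direction_search(f, start, step=1.0, min_step=1e-6, max_evals=10000, rng=None):
    """Local search along random orthogonal directions; returns (point, value)."""
    rng = rng or random.Random()
    point = list(start)
    value = f(point)
    evals = 1
    while step > min_step and evals < max_evals:
        moved = False
        for direction in random_orthonormal_basis(len(point), rng):
            for sign in (1.0, -1.0):
                # Keep moving along this direction while the result improves.
                while evals < max_evals:
                    candidate = [p + sign * step * d
                                 for p, d in zip(point, direction)]
                    candidate_val = f(candidate)
                    evals += 1
                    if candidate_val >= value:
                        break       # "step back": the failed move is not accepted
                    point, value = candidate, candidate_val
                    step *= 2.0     # speed up after every successful move
                    moved = True
        if not moved:
            step *= 0.5             # no direction helped: halve the jump distance
    return point, value

def sphere(p):
    return sum(x * x for x in p)

found_point, found_value = direction_search(sphere, [3.0, -2.0], rng=random.Random(0))
```

A multi-start version would call `direction_search` from several random points, remember each local minimum it ends in, and keep the best one.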
I can't really help you with finding an algorithm for your specific problem.
However, with regard to the random choosing of parameters, I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random inputs, selecting those which are the best fit (so far) for the problem, and randomly mutating/combining them to generate a next generation, from which the best are again selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't generate bad inputs, "small" being a somewhat generic term), this would work reasonably well for your problem.
There is no generalized way to answer your question. There are lots of books/papers on the subject matter, but you'll have to choose your path according to your needs, which are not clearly spoken here.
Some things to know, however - 1min/test is way too much for any algorithm to handle. I guess that in your case, you must really do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify a set slightly instead of just randomizing it. For example, in TSP a basic operator is lambda, which swaps two nodes and thus creates a new route. Yours could be shifting some number up or down by some value.
then, find yourself some nice algorithm; your starting point can be somewhere here. The book is an invaluable resource for anyone who starts with problem-solving.

What are canonical examples of parallel computation?

I am writing a paper to test a new application that will demonstrate the benefits of parallelized computation (compared to the traditional serialized version of this application). I want to use the canonical examples for parallel computation in my paper.
My first example is the parallel computation of pi. I would ideally like an example where each iteration is very time consuming (because of the additional overhead associated with parallelizing); my first thought is a Bayesian simulation with MCMC and Gibbs sampling.
What other problems are typically discussed in this context? What are good examples of large embarrassingly parallel problems?
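For reference, the parallel computation of pi mentioned above is often presented as a numerical integration of 4/(1+x^2) over [0, 1], split into independent chunks. The Python sketch below shows that decomposition. The function names are hypothetical, and `map_fn` defaults to the built-in serial `map`; passing the `map` method of a `concurrent.futures` executor instead would distribute the chunks, which is an assumption about how one might parallelize it rather than a measured result.

```python
import math

def partial_sum(task):
    """Midpoint-rule contribution of the intervals start..stop-1 out of n."""
    start, stop, n = task
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(start, stop))

def estimate_pi(n=100_000, chunks=8, map_fn=map):
    """Integrate 4/(1+x^2) on [0, 1] by summing independently computed chunks."""
    bounds = [n * k // chunks for k in range(chunks + 1)]
    tasks = [(bounds[k], bounds[k + 1], n) for k in range(chunks)]
    return sum(map_fn(partial_sum, tasks))

pi_estimate = estimate_pi()
```

Each chunk needs only its index range and `n`, so there is no communication until the final sum. That is what makes the example embarrassingly parallel, and also why each task must be large enough to outweigh the parallelization overhead mentioned in the question.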
just a few more -
Multiplying matrices
Inverting matrices
FFT
String matching
Rendering 3d scenes (via scan line conversion or ray tracing)
One example I've used in the past of an embarrassingly parallel problem is visualizing the Mandelbrot set. Each pixel can be computed independently.
Conway's Life is interesting as well, in that each value of the "next" board can be computed independently, but will depend on the relevant bits of the "current" board being done already.
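The dependency structure of Life can be seen in a short Python sketch: every cell of the next board is a pure function of the current board, so all cells of one generation could be computed in parallel, but generation k+1 cannot start until generation k is complete. The sketch assumes a finite board with dead cells beyond the edges (no wrap-around).

```python
def life_step(board):
    """Compute the next Life generation from `board` (lists of 0/1 rows)."""
    rows, cols = len(board), len(board[0])

    def next_cell(r, c):
        # Reads only the current board, never the board being built.
        neighbours = sum(
            board[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))
            if (rr, cc) != (r, c)
        )
        alive = board[r][c] == 1
        return 1 if neighbours == 3 or (alive and neighbours == 2) else 0

    return [[next_cell(r, c) for c in range(cols)] for r in range(rows)]

# A horizontal "blinker" becomes vertical after one generation.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Writing into a fresh board rather than updating in place is what keeps the per-cell computations independent.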
I would suggest that canonical examples of parallel computation and embarrassingly parallel problems are, if not completely, then nearly, disjoint sets. To put it another way, people working in parallel computation aren't terribly excited about embarrassingly parallel problems; we call them that because we'd be embarrassed to be working on them.
I'd be looking, if I were you, at these (a not entirely original list):
linear algebra on large dense matrices, both direct and iterative approaches;
linear algebra on huge sparse matrices
branch and bound approaches to linear programming (and related) problems;
sequence matching for bioinformatics (outside my field, I may have mis-expressed this);
continuous optimisation.
I expect there are many more.
EDIT: You may be interested in this list of problems which have been selected for benchmarking the next generation of European (academic) supercomputers. It will give you some idea of where that niche is heading.
Molecular dynamics simulations allow you to change the size of the problem until your computer resources are exhausted (i.e. 256 particles vs. 256,000,000 particles). It's truly a "canonical" example if you run the MD simulations under NVT conditions ;-)
My favorite example is Monte Carlo simulation.
Word counting seems to be the canonical example for MapReduce.
http://en.wikipedia.org/wiki/MapReduce#Example
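A single-machine Python sketch of the word-count pattern, for readers who have not seen it: the map phase counts words in each document independently, and the reduce phase merges the partial counts. The function names are illustrative and this is not the API of any MapReduce framework. `map_fn` defaults to the built-in serial `map`, and swapping in a parallel map is the assumed route to distributing the work.

```python
from collections import Counter
from functools import reduce

def map_phase(document):
    """Map: count the words of one document, independently of all others."""
    return Counter(document.lower().split())

def reduce_phase(left, right):
    """Reduce: merge two partial word counts."""
    return left + right

def word_count(documents, map_fn=map):
    """Run the map phase over all documents, then fold the partial counts."""
    partial_counts = map_fn(map_phase, documents)
    return dict(reduce(reduce_phase, partial_counts, Counter()))

counts = word_count(["a b a", "b c"])
```

Because merging counts is associative, the partial results can be combined in any grouping, which is what lets a real framework run the reduce step in a distributed fashion.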
Finding collisions in hash functions using Paul C. van Oorschot and Michael J. Wiener's method (PDF) comes up often in various cryptographic settings.
I used the Mandelbrot set demo to explain to my mom what parallel programming is about : http://www.ateji.com/px/demo.html
All the examples you mention are mostly heavy data-parallel codes. You'll probably also want to mention task-oriented codes, such as servers responding to many requests in parallel, and data-flow or stream programming examples (MapReduce is a good representative of this class).
