Distributed/memory-bound pointcloud registration algorithm

From my (brief) overview of pointcloud registration algorithms (PRA), it seems they all require that at least one of the pointclouds (if not both) be fully loaded into memory. Is there an algorithm without this requirement, i.e. one that works with subsets of the two pointclouds at any given time, so that it can scale to arbitrarily large pointclouds and (ideally) be parallelized across a distributed cluster?
Possible answers, from best to worst:
There is/are such algorithm(s) with an open source implementation available (bonus if it's in Python / has Python bindings).
There is/are such algorithm(s) presented in research literature, but without open source implementations currently available.
There isn't such an algorithm off-the-shelf, but there are (precise or heuristic) ways to break up the problem into subproblems which can be solved by any existing PRA and combine the partial solutions to form the full solution.
There cannot be such an algorithm in the general case, at least not without some extra conditions/assumptions about the pointclouds to be registered.
There cannot be such an algorithm at all.
Note: I am not looking for approaches that reduce the size of the original input(s), e.g. with sampling, point aggregation, feature extraction etc; I am only interested in algorithms that work with the full original pointclouds.

Related

Tiling Algorithm

I'm faced with a problem where I have to solve puzzles.
E.g. I have a (variable) area of 20x20 (meters, for example). There is a given set of pieces of variable sizes, such as 4x3, 4x2, 1x5, etc. These pieces can also be rotated, to add more pain to my problem. The goal of the puzzle is to fill the entire 20x20 area with the given pieces.
What would be a good starting algorithm to achieve such a feat?
I'm thinking of using a heuristic that calculates the open space (for efficiency purposes).
Thanks in advance
That's an Exact Cover problem, usually with a nice structure too, depending on the pieces. I don't know of any heuristic algorithms, but there are several exact options that should work well.
As usual with Exact Covers, you can use Dancing Links, a way to implement Algorithm X efficiently.
Less generally, you can probably solve this with zero-suppressed decision diagrams. It depends on the tiles though. As a bonus, you can represent all possible solutions and count them or generate one with some properties, all without ever explicitly storing the entire (usually far too large) set of solutions.
BDDs would work about as well, but using more nodes to accomplish the same thing (the solutions are very sparse, in the sense that each uses few of the possible tile placements; ZDDs exploit that sparseness, whereas BDDs benefit more from symmetry).
Or you could turn it into a SAT problem; then you get less information (no solution count, for example), but it can be faster if there are easy solutions.
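For concreteness, here is a minimal sketch of Algorithm X in the well-known dict-of-sets formulation (not a full dancing-links implementation). For the tiling puzzle the columns are the grid cells, and each row is a candidate placement (piece, rotation, offset) covering some set of cells; building that placement table is assumed, not shown.

    def solve(X, Y, solution=None):
        """Algorithm X: X maps each cell to the set of placements covering it,
        Y maps each placement to the list of cells it covers."""
        if solution is None:
            solution = []
        if not X:
            yield list(solution)          # every cell covered exactly once
            return
        # Knuth's heuristic: branch on the cell with the fewest candidate placements.
        c = min(X, key=lambda c: len(X[c]))
        for r in list(X[c]):
            solution.append(r)
            cols = select(X, Y, r)
            yield from solve(X, Y, solution)
            deselect(X, Y, r, cols)
            solution.pop()

    def select(X, Y, r):
        cols = []
        for j in Y[r]:                    # remove every cell r covers...
            for i in X[j]:
                for k in Y[i]:            # ...and every placement that clashes with r
                    if k != j:
                        X[k].remove(i)
            cols.append(X.pop(j))
        return cols

    def deselect(X, Y, r, cols):
        for j in reversed(Y[r]):          # restore in reverse order
            X[j] = cols.pop()
            for i in X[j]:
                for k in Y[i]:
                    if k != j:
                        X[k].add(i)

For the 20x20 puzzle, Y would enumerate every legal (piece, rotation, offset) placement, and X can then be built as X = {cell: {r for r in Y if cell in Y[r]}}.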

Floating point algorithms with potential for performance optimization

For a university lecture I am looking for floating point algorithms with known asymptotic runtime, but potential for low-level (micro-)optimization. This means optimizations such as minimizing cache misses and register spillages, maximizing instruction level parallelism and taking advantage of SIMD (vector) instructions on new CPUs. The optimizations are going to be CPU-specific and will make use of applicable instruction set extensions.
The classic textbook example for this is matrix multiplication, where great speedups can be achieved by simply reordering the sequence of memory accesses (among other tricks). Another example is FFT. Unfortunately, I am not allowed to choose either of these.
Anyone have any ideas, or an algorithm/method that could use a boost?
I am only interested in algorithms where a per-thread speedup is conceivable. Parallelizing problems by multi-threading them is fine, but outside the scope of this lecture.
Edit 1: I am taking the course, not teaching it. In the past years, there were quite a few projects that succeeded in surpassing the current best implementations in terms of performance.
Edit 2: This paper lists (from page 11 onwards) seven classes of important numerical methods and some associated algorithms that use them. At least some of the mentioned algorithms are candidates; it is, however, difficult to see which.
Edit 3: Thank you everyone for your great suggestions! We proposed to implement the exposure fusion algorithm (paper from 2007) and our proposal was accepted. The algorithm creates HDR-like images and consists mainly of small kernel convolutions followed by weighted multiresolution blending (on the Laplacian pyramid) of the source images. Interesting for us is the fact that the algorithm is already implemented in the widely used Enfuse tool, which is now at version 4.1. So we will be able to validate and compare our results with the original and also potentially contribute to the development of the tool itself. I will update this post in the future with the results if I can.
The simplest possible example:
accumulation of a sum. Unrolling with multiple accumulators plus vectorization allows a speedup of (ADD latency)*(SIMD vector width) on typical pipelined architectures (if the data is in cache; since there's no data reuse, it typically won't help when reading from memory), which can easily be an order of magnitude. Cute thing to note: this also decreases the average error of the result! The same techniques apply to any similar reduction operation.
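A rough sketch of the multiple-accumulator pattern (Python/NumPy here only illustrates the data layout; the actual speedup comes from registers and SIMD lanes in compiled code, and the function name and lane count are just illustrative):

    import numpy as np

    def sum_multi_accumulator(a, lanes=8):
        """Sum 'a' using 'lanes' independent partial accumulators.

        In C each partial sum would sit in its own register or SIMD lane,
        hiding the ADD latency; splitting the stream this way also tends to
        reduce the accumulated rounding error, as noted above.
        """
        n = (len(a) // lanes) * lanes
        partials = a[:n].reshape(-1, lanes).sum(axis=0)   # 'lanes' strided partial sums
        return float(partials.sum() + a[n:].sum())        # final reduction plus tail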
A few classics from image/signal processing:
convolution with small kernels (especially small 2D convolutions like a 3x3 or 5x5 kernel). In some sense this is cheating, because convolution is matrix multiplication and is intimately related to the FFT, but in reality the nitty-gritty algorithmic techniques of high-performance small-kernel convolutions are quite different from either.
erode and dilate.
what image people call a "gamma correction"; this is really evaluation of an exponential function (maybe with a piecewise linear segment near zero). Here you can take advantage of the fact that image data is often entirely in a nice bounded range like [0,1], and that sub-ulp accuracy is rarely needed, to use much cheaper function approximations (low-order piecewise minimax polynomials are common).
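A rough sketch of the last point, using a least-squares polyfit as a stand-in for a proper minimax fit and ignoring the linear segment near zero that real transfer functions use:

    import numpy as np

    # Approximate the gamma curve x**(1/2.2) on [0, 1] with one low-order
    # polynomial. np.polyfit is least-squares, not minimax, but it is enough
    # to show the trade-off: a handful of multiply-adds instead of a pow()
    # call, at an accuracy image data rarely notices.
    gamma = 1.0 / 2.2
    x = np.linspace(0.0, 1.0, 4096)
    coeffs = np.polyfit(x, x ** gamma, deg=5)

    def fast_gamma(v):
        return np.polyval(coeffs, v)      # Horner evaluation of the polynomial

    print(np.abs(fast_gamma(x) - x ** gamma).max())   # worst-case error on the grid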
Stephen Canon's image processing examples would each make for instructive projects. Taking a different tack, though, you might look at certain amenable geometry problems:
Closest pair of points in moderately high dimension---say 50000 or so points in 16 or so dimensions. This may have too much in common with matrix multiplication for your purposes. (Take the dimension too much higher and dimensionality reduction silliness starts mattering; much lower and spatial data structures dominate. Brute force, or something simple using a brute-force kernel, is what I would want to use for this.)
Variation: For each point, find the closest neighbour.
Variation: Red points and blue points; find the closest red point to each blue point.
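A brute-force kernel for the red/blue variation might look like the following sketch (the blocking parameter and names are mine; the b @ red.T product is exactly the matrix-multiplication-like part):

    import numpy as np

    def nearest_red_for_each_blue(red, blue, block=1024):
        """Index of the closest red point for every blue point (brute force).

        red: (m, d) array, blue: (n, d) array, e.g. d = 16, m = n = 50000.
        Blue points are processed in blocks so the (block, m) distance matrix
        stays cache-resident; the inner product is the part worth tuning.
        """
        red_sq = (red * red).sum(axis=1)                  # |r|^2, computed once
        out = np.empty(len(blue), dtype=np.intp)
        for start in range(0, len(blue), block):
            b = blue[start:start + block]
            # |b - r|^2 = |b|^2 - 2 b.r + |r|^2; the |b|^2 term is constant
            # per row, so it can be dropped when only the argmin is wanted.
            d2 = red_sq - 2.0 * (b @ red.T)
            out[start:start + block] = d2.argmin(axis=1)
        return out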
Welzl's smallest containing circle algorithm is fairly straightforward to implement, and the really costly step (checking for points outside the current circle) is amenable to vectorisation. (I suspect you can kill it in two dimensions with just a little effort.)
Be warned that computational geometry stuff is usually more annoying to implement than it looks at first; don't just grab a random paper without understanding what degenerate cases exist and how careful your programming needs to be.
Have a look at other linear algebra problems, too. They're also hugely important. Dense Cholesky factorisation is a natural thing to look at here (much more so than LU factorisation) since you don't need to mess around with pivoting to make it work.
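For reference, the unblocked right-looking Cholesky is only a few lines (NumPy sketch below, just to show the structure); nearly all of the flops are in the trailing-submatrix update, which is the part worth blocking for cache and vectorising.

    import numpy as np

    def cholesky_unblocked(A):
        """Right-looking Cholesky: returns lower-triangular L with A = L @ L.T.

        A must be symmetric positive definite. No pivoting is needed, which
        is what makes this a friendlier target than LU factorisation.
        """
        A = A.astype(float, copy=True)
        n = A.shape[0]
        for k in range(n):
            A[k, k] = np.sqrt(A[k, k])
            A[k+1:, k] /= A[k, k]
            # rank-1 update of the trailing submatrix: the hot loop
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k+1:, k])
        return np.tril(A)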
There is a free benchmark called c-ray.
It is a small ray-tracer for spheres designed to be a benchmark for floating-point performance.
A few random stackshots show that it spends nearly all its time in a function called ray_sphere that determines if a ray intersects a sphere and if so, where.
They also show some opportunities for larger speedup, such as:
It does a linear search through all the spheres in the scene to try to find the nearest intersection. That represents a possible area for speedup, by doing a quick test to see if a sphere is farther away than the best seen so far, before doing all the 3-d geometry math.
It does not try to exploit similarity from one pixel to the next. This could gain a huge speedup.
So if all you want to look at is chip-level performance, it could be a decent example.
However, it also shows how there can be much bigger opportunities.
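For orientation, the core test looks something like the following sketch. This is the standard geometric formulation with the "farther than the best hit so far" early-out described above, not c-ray's actual ray_sphere; the signature is invented.

    import numpy as np

    def ray_sphere(orig, direction, center, radius, t_best):
        """Nearest positive hit distance of a normalized ray against a sphere,
        or None. t_best is the closest hit found so far."""
        oc = center - orig
        tca = np.dot(oc, direction)          # center projected onto the ray
        if tca - radius > t_best:            # cheap reject: cannot beat best hit
            return None
        d2 = np.dot(oc, oc) - tca * tca      # squared distance from center to ray
        if d2 > radius * radius:
            return None                      # ray misses the sphere entirely
        thc = np.sqrt(radius * radius - d2)
        t = tca - thc                        # near intersection
        if t < 0.0:
            t = tca + thc                    # ray origin is inside the sphere
        return t if 0.0 <= t <= t_best else None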

Any good nearest-neighbors algorithm for similar images?

I am looking for an algorithm that can search for similar images in a large collection.
I'm currently using a SURF implementation in OpenCL.
At first I used the KNN search algorithm to compare every image's interest points to the rest of the collection, but tests revealed that it doesn't scale well. I've also tried a Hadoop implementation of KNN-Join, which takes a lot of temporary space in HDFS, way too much compared to the amount of input data. In fact, the pairwise distance approach isn't really appropriate because of the dimension of my input vectors (64).
I've heard of Locality-Sensitive Hashing and wondered if there was any free implementation, or if it's worth implementing myself. Or maybe there's another algorithm I am not aware of?
IIRC the FLANN library is a good compromise:
http://people.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
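If you are already in the OpenCV ecosystem for SURF, its FLANN-based matcher is a convenient way to try this without wiring up the standalone bindings; the parameter values below are the usual tutorial defaults, not tuned ones, and the descriptors are random stand-ins:

    import cv2
    import numpy as np

    # Stand-in 64-D SURF descriptors; replace with the real descriptor arrays.
    des_query = np.random.rand(500, 64).astype(np.float32)
    des_train = np.random.rand(2000, 64).astype(np.float32)

    # Randomized kd-tree forest for float descriptors; 'trees' and 'checks'
    # trade speed against recall.
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)

    matches = flann.knnMatch(des_query, des_train, k=2)

    # Lowe's ratio test to keep only distinctive matches.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    print(len(good), "good matches")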

How to sample a scale-free graph

Given a large scale-free graph (a social network graph), what's the best way to sample it such that the sample retains an acceptable abstraction of the properties of the original?
I have a large graph (Munmun's twitter dataset, if you know it). But I need a connected sample of that graph with a reasonably large diameter (tl;dr... reasons why on request... a diameter of 10 would be good).
The problem is that any kind of breadth-first search is likely to come across some massively connected nodes. So I start such a search, getting the friends of all nodes I come across. I inevitably hit some massively connected nodes and have to get all their friends. This is a problem because I end up with a large number of nodes that are close to each other in the graph. To make programmatic analysis feasible, I have to limit the number of nodes (and edges). The whole point of this exercise is to find shortest paths between nodes, so I'm generally interested in ALL of a node's neighbours. And that's the problem.
One hack around this is to limit the maximum number of neighbours I accept per user. For instance, if I come across #barackobama in my breadth-first search, I ensure that I only accept some small proportion of his friends and ignore the rest. But would this hacked graph be worth a damn, or am I losing too much information in terms of finding shortest paths?
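In code, the hack is roughly the following (a sketch with networkx; the cap, queue order, and stopping rule are all just illustrative):

    import random
    from collections import deque
    import networkx as nx

    def capped_bfs_sample(G, seed, max_nodes=100_000, neighbour_cap=200):
        """Breadth-first sample that keeps at most neighbour_cap randomly
        chosen neighbours per expanded node, so hubs (the #barackobama
        problem) don't flood the sample. Note that dropping edges can only
        lengthen or break shortest paths, never shorten them."""
        sample = nx.Graph()
        sample.add_node(seed)
        visited = {seed}
        queue = deque([seed])
        while queue and sample.number_of_nodes() < max_nodes:
            node = queue.popleft()
            neighbours = list(G.neighbors(node))
            if len(neighbours) > neighbour_cap:
                neighbours = random.sample(neighbours, neighbour_cap)
            for nb in neighbours:
                sample.add_edge(node, nb)
                if nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        return sample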
Hope that makes sense...
Several sampling methods exist; which one to choose depends (amongst other things) on the properties you want to preserve. For that matter, I found the literature review (section 3) in the thesis Sampling and Inference in Complex Networks [Maiya '11] very informative.
But you seem to have found a way of sampling your network, and now you want to find out if the sample is representative of the whole graph in terms of shortest paths. You could have a look at this paper: Complex Network Measurements: Estimating the Relevance of Observed Properties [Latapy & Magnien '08]. They describe a method to assess the representativeness of a sample with respect to various classic topological properties. To summarize their approach: they initially have access to the whole studied network, simulate some sampling process on these data with increasing sample size, monitor how the properties evolve as the sample grows, and settle on an appropriate size once the properties of interest are stable enough. Their tool is freely available online.
Edit: the only ready-to-use tool I could find online is Albatross. The associated article Albatross Sampling: Robust and Effective Hybrid Vertex Sampling for Social Graphs [Jin et al. '11] also contains a nice review of existing sampling methods, some of which are implemented in the source code they provide.
Edit 2: I needed to use Albatross on a Linux system, so I did a Java port. It's very raw, but it seems to work fine. It's available on GitHub: https://github.com/vlabatut/Albatross
I am not sure if I understand your question correctly. I think the main question you have is how to compute the shortest path between two nodes in a giant directed graph. Creating a subsample of the graph seems to be your attempt at an efficient solution. (But I may have misunderstood you completely.)
Perhaps this SO-Question has some pointers for you: Efficiently finding the shortest path in large graphs
The graphs in that question seem to be significantly smaller, though.
You might want to check the following: Gscaler: https://github.com/jayCool/Gscaler
This is a recent tool which produces synthetic scaled graphs.
It contains the jar file and the related paper for your reference.

Multiple parameter optimization with lots of local minima

I'm looking for algorithms to find a "best" set of parameter values. The function in question has a lot of local minima and changes very quickly. To make matters even worse, testing a set of parameters is very slow - on the order of 1 minute - and I can't compute the gradient directly.
Are there any well-known algorithms for this kind of optimization?
I've had moderate success with just trying random values. I'm wondering if I can improve the performance by making the random parameter chooser have a lower chance of picking parameters close to ones that had produced bad results in the past. Is there a name for this approach so that I can search for specific advice?
More info:
Parameters are continuous
There are on the order of 5-10 parameters. Certainly not more than 10.
How many parameters are there -- e.g., how many dimensions in the search space? Are they continuous or discrete -- e.g., real numbers, integers, or just a few possible values?
Approaches that I've seen used for these kinds of problems have a similar overall structure: take a large number of sample points, and adjust them all towards regions that have "good" answers somehow. Since you have a lot of points, their relative differences serve as a makeshift gradient.
Simulated Annealing: The classic approach. Take a bunch of points and probabilistically move some to a neighbouring point chosen at random, depending on how much better it is.
Particle Swarm Optimization: Take a "swarm" of particles with velocities in the search space and randomly move each particle; if it's an improvement, let the whole swarm know.
Genetic Algorithms: This is a little different. Rather than using the neighbours information like above, you take the best results each time and "cross-breed" them hoping to get the best characteristics of each.
The Wikipedia links have pseudocode for the first two; GA methods have so much variety that it's hard to list just one algorithm, but you can follow links from there. Note that there are implementations of all of the above out there that you can use or take as a starting point.
Note that all of these -- and really any approach to this kind of large-dimensional search problem -- are heuristics, which means they have parameters that have to be tuned to your particular problem, which can be tedious.
By the way, the fact that the function evaluation is so expensive can be made to work for you a bit; since all the above methods involve lots of independent function evaluations, that piece of the algorithm can be trivially parallelized with OpenMP or something similar to make use of as many cores as you have on your machine.
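To make the first of those concrete, a bare-bones simulated annealing loop for a handful of continuous parameters looks something like this (all names, schedules, and defaults are illustrative and would need tuning; with a one-minute objective you would farm the f() calls out in parallel as just described):

    import math
    import random

    def simulated_annealing(f, x0, step=0.1, t0=1.0, cooling=0.995, n_iter=2000):
        """Minimize f over a list of continuous parameters starting from x0."""
        x, fx = list(x0), f(x0)
        best, fbest = list(x), fx
        t = t0
        for _ in range(n_iter):
            cand = list(x)
            i = random.randrange(len(cand))        # perturb one random coordinate
            cand[i] += random.gauss(0.0, step)
            fc = f(cand)
            # always accept improvements; accept worse moves with Boltzmann probability
            if fc < fx or random.random() < math.exp((fx - fc) / t):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = list(x), fx
            t *= cooling                           # cool down gradually
        return best, fbest

    # Toy usage on a bumpy 5-parameter function (stands in for the slow objective).
    bumpy = lambda p: sum(v * v + 0.5 * math.sin(10.0 * v) for v in p)
    print(simulated_annealing(bumpy, [random.uniform(-2, 2) for _ in range(5)]))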
Your situation seems to be similar to that of the poster of Software to Tune/Calibrate Properties for Heuristic Algorithms, and I would give you the same advice I gave there: consider a Metropolis-Hastings like approach with multiple walkers and a simulated annealing of the step sizes.
The difficulty in using Monte Carlo methods in your case is the expensive evaluation of each candidate. How expensive, compared to the time you have at hand? If you need a good answer in a few minutes, this isn't going to be fast enough. If you can leave it running overnight, it'll work reasonably well.
Given a complicated search space, I'd recommend a random initial distribution. Your final answer may simply be the best individual result recorded during the whole run, or the mean position of the walker with the best result.
Don't be put off that I was discussing maximizing there and you want to minimize: the figure of merit can be negated or inverted.
I've tried Simulated Annealing and Particle Swarm Optimization. (As a reminder, I couldn't use gradient descent because the gradient cannot be computed).
I've also tried an algorithm that does the following:
Pick a random point and a random direction
Evaluate the function
Keep moving along the random direction for as long as the result keeps improving, speeding up on every successful iteration.
When the result stops improving, step back and instead attempt to move into an orthogonal direction by the same distance.
This "orthogonal direction" was generated by creating a random orthogonal matrix (adapted this code) with the necessary number of dimensions.
If moving in the orthogonal direction improved the result, the algorithm just continued with that direction. If none of the directions improved the result, the jump distance was halved and a new set of orthogonal directions would be attempted. Eventually the algorithm concluded it must be in a local minimum, remembered it and restarted the whole lot at a new random point.
This approach performed considerably better than Simulated Annealing and Particle Swarm: it required fewer evaluations of the (very slow) function to achieve a result of the same quality.
Of course my implementations of S.A. and P.S.O. could well be flawed - these are tricky algorithms with a lot of room for tweaking parameters. But I just thought I'd mention what ended up working best for me.
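For reference, here is a sketch of that search as I'd write it down (close in spirit to Rosenbrock's rotating-coordinate method; parameter names, defaults, and the restart budget are all invented, and a real version would also cap the total number of function evaluations):

    import numpy as np

    def orthogonal_direction_search(f, dim, n_restarts=10, init_step=1.0,
                                    min_step=1e-3, bounds=(-5.0, 5.0), seed=0):
        """Accelerating line search over random orthonormal direction sets."""
        rng = np.random.default_rng(seed)
        best_x, best_f = None, np.inf
        for _ in range(n_restarts):                      # restart from random points
            x = rng.uniform(bounds[0], bounds[1], size=dim)
            fx = f(x)
            step = init_step
            while step > min_step:
                # random orthonormal direction set via QR of a Gaussian matrix
                Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
                improved = False
                for d in Q.T:
                    s = step
                    while True:                          # accelerate while improving
                        cand = x + s * d
                        fc = f(cand)
                        if fc < fx:
                            x, fx = cand, fc
                            s *= 2.0                     # speed up on success
                            improved = True
                        else:
                            break                        # step back, try next direction
                if not improved:
                    step *= 0.5                          # nothing helped: halve the jump
            if fx < best_f:                              # remember this local minimum
                best_x, best_f = x.copy(), fx
        return best_x, best_f

    # Toy usage on the same kind of bumpy objective discussed above.
    bumpy = lambda p: float(np.sum(p * p + 0.5 * np.sin(10.0 * p)))
    print(orthogonal_direction_search(bumpy, dim=5))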
I can't really help you with finding an algorithm for your specific problem.
However, in regard to the random choice of parameters, I think what you are looking for are genetic algorithms. Genetic algorithms are generally based on choosing some random inputs, selecting those which are the best fit (so far) for the problem, and randomly mutating/combining them to generate the next generation, for which again the best are selected.
If the function is more or less continuous (that is, small mutations of good inputs generally won't produce bad inputs, with "small" being somewhat problem-specific), this would work reasonably well for your problem.
There is no general way to answer your question. There are lots of books/papers on the subject, but you'll have to choose your path according to your needs, which are not clearly stated here.
Some things to know, however: 1 min/test is way too much for any algorithm to handle. I guess that in your case you really must do one of the following:
get 100 computers to cut your parameter testing time to some reasonable time
really try to work out your parameters by hand and mind. There must be some redundancy and at least some sanity check so you can test your case in <1min
for possible result sets, try to figure out some 'operations' that modify them slightly instead of just randomizing them. For example, in TSP one basic operator is lambda, which swaps two nodes and thus creates a new route. Yours could be shifting some number up/down by some value.
Then find yourself some nice algorithm; your starting point could be somewhere here. The book is an invaluable resource for anyone starting out with problem solving.
