Understanding higher order automatic differentiation

Having recently finished my own basic reverse-mode AD for machine learning purposes, I find myself wanting to learn more about the field, but I've hit a wall with the higher-order methods.
Basic reverse AD is beautifully simple and easy to understand, but the more advanced material is both too abstract and too technical, and I have not been able to find any good explanations of it on the Internet (in fact, it took me quite a while to realize basic reverse AD even exists).
Basically, I understand how to take second derivatives in the context of calculus, but I do not understand how to transform a reverse AD graph to get second-order derivatives.
In an algorithm like edge_pushing, what exactly do those dashed connections mean?
I've investigated the DiffSharp library and noted that it uses something like forward-on-reverse differentiation for calculating the Hessian. Running it through the debugger, I've seen that it really does mix forward and reverse steps in a single pass. What are the principles behind that mechanism?
DiffSharp uses a Jacobian-vector product to calculate the Hessian, one per variable, which is an R^m -> R^n mapping. How is it possible to get that from the original graph? Reverse AD is an R -> R^n mapping, so where do the extra dimensions come from?
Lastly, how does nested AD work?

I wrote the paper on edge_pushing. First, you start with the computational graph of the gradient; by "gradient" here I mean the computational graph built by the reverse gradient method. The edge_pushing algorithm is then simply the reverse gradient algorithm applied to this gradient graph, which gives you the Hessian. The catch is that it does this in an intelligent way. In particular, the dotted edges are artificially added edges that represent a nonlinear interaction between two nodes (both nodes are inputs of a nonlinear function further up the graph). These nonlinear dotted edges make it easy to visualize where the major costs of calculating the reverse gradient on the gradient graph occur, and how best to accumulate the total derivative. Does that help?

I wrote a tutorial for AD that, near the end, briefly shows how to combine forward with reverse. I also wrote an entire library for basic AD on the GPU, linked at the same site.
Still not sure about edge_pushing, but I do not think it matters much for neural nets at any rate.
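
To make the forward-on-reverse principle from the question concrete: a Hessian-vector product is the directional derivative of the gradient, Hv = d/de grad f(x + e*v) at e = 0, so you can seed the inputs with forward-mode dual numbers and then run an otherwise ordinary reverse sweep over them; the tangent part of each adjoint comes out as a component of Hv. The sketch below is plain illustrative Python, not DiffSharp's or the tutorial's actual code, and every name in it is made up.

```python
import math

# Forward mode: a dual number carries a tangent (directional derivative) with each value.
class Dual:
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)
    __rmul__ = __mul__

def gsin(x):  # sin/cos that accept plain floats or duals
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan) if isinstance(x, Dual) else math.sin(x)

def gcos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.tan) if isinstance(x, Dual) else math.cos(x)

# Reverse mode: a tape of nodes whose arithmetic is generic, so the recorded
# values (and therefore the whole reverse sweep) may themselves be duals.
TAPE = []

class Node:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.adjoint = value, list(parents), 0.0
        TAPE.append(self)

def vadd(a, b): return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])
def vmul(a, b): return Node(a.value * b.value, [(a, b.value), (b, a.value)])
def vsin(a):    return Node(gsin(a.value), [(a, gcos(a.value))])

def grad(f, inputs):
    """One reverse sweep: adjoints of `inputs` for the scalar output f(*inputs)."""
    TAPE.clear()
    xs = [Node(v) for v in inputs]
    f(*xs).adjoint = 1.0
    for node in reversed(TAPE):              # tape order is already topological
        for parent, partial in node.parents:
            parent.adjoint = parent.adjoint + node.adjoint * partial
    return [x.adjoint for x in xs]

def hvp(f, x, v):
    """Hessian-vector product: H(x) @ v = d/de grad f(x + e*v) at e = 0."""
    seeded = [Dual(xi, vi) for xi, vi in zip(x, v)]        # forward seed
    g = grad(f, seeded)                                    # each adjoint = (grad_i, (Hv)_i)
    return [gi.tan if isinstance(gi, Dual) else 0.0 for gi in g]

# Example: f(x, y) = x*y + sin(x), whose Hessian is [[-sin(x), 1], [1, 0]].
f = lambda x, y: vadd(vmul(x, y), vsin(x))
print(hvp(f, [1.0, 2.0], [1.0, 0.0]))   # ~[-0.841, 1.0], the first Hessian column
```

Calling hvp once per basis vector recovers the full Hessian column by column, which is where the "extra dimensions" in the question come from: each forward seed adds one direction on top of the single reverse sweep.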

Related

Is there a simple(-ish) algorithm for drawing force-directed graphs?

I'm trying to write a little graph visualizer in P5js but I can't find a simple(-ish) algorithm to follow.
I've found ways to do it with D3 and I've found some dense textbook snippets (like this) but I'm looking for something in between the two.
Can someone explain the simplest algorithm for drawing a graph (force-directed or otherwise) or point me to a good resource?
Thanks for your help!
I literally just started something similar.
It's fairly easy to code: you just need to think about the three separate forces acting on each node, add them together, and divide by the node's mass to get its movement.
Gravity: apply a simple force towards the centre of the canvas so the nodes don't launch themselves out of frame.
Node-node repulsion: you can either use Coulomb's force (which describes particle-particle repulsion) or take the gravitational attraction equation and reverse it.
Connection forces: this one's a little tricky. Define a connection as two nodes and a rest distance between them; when the actual distance differs from the rest distance, add a force along the connection proportional to that difference. (All three forces are put together in the sketch below.)
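
Here is a minimal sketch of one simulation step with those three forces (plain Python rather than p5.js, and every constant is an arbitrary made-up tuning value):

```python
import math, random

# A hypothetical minimal layout step; all constants are arbitrary tuning values.
class Node:
    def __init__(self, x, y, mass=1.0):
        self.x, self.y, self.mass = x, y, mass

def step(nodes, edges, centre=(0.0, 0.0), gravity=0.05,
         repulsion=500.0, rest_length=100.0, stiffness=0.01, dt=1.0):
    forces = [[0.0, 0.0] for _ in nodes]

    for i, n in enumerate(nodes):
        # Gravity: a gentle pull towards the centre of the canvas.
        forces[i][0] += gravity * (centre[0] - n.x)
        forces[i][1] += gravity * (centre[1] - n.y)

        # Node-node repulsion: an inverse-square push, like Coulomb's law.
        for j, m in enumerate(nodes):
            if i == j:
                continue
            dx, dy = n.x - m.x, n.y - m.y
            d = math.hypot(dx, dy) or 1e-6
            f = repulsion / (d * d)
            forces[i][0] += f * dx / d
            forces[i][1] += f * dy / d

    # Connection (spring) forces: pull/push each connected pair towards its rest length.
    for a, b in edges:
        dx, dy = nodes[b].x - nodes[a].x, nodes[b].y - nodes[a].y
        d = math.hypot(dx, dy) or 1e-6
        f = stiffness * (d - rest_length)
        fx, fy = f * dx / d, f * dy / d
        forces[a][0] += fx; forces[a][1] += fy
        forces[b][0] -= fx; forces[b][1] -= fy

    # Movement of each node = net force divided by its mass.
    for i, n in enumerate(nodes):
        n.x += forces[i][0] / n.mass * dt
        n.y += forces[i][1] / n.mass * dt

# Usage: a triangle plus a dangling node, relaxed over a few hundred iterations.
nodes = [Node(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(4)]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
for _ in range(300):
    step(nodes, edges)
```

In p5.js you would run one step per draw() call and render the nodes and edges afterwards.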

Reduce number of nodes in 3D A* pathfinding using (part of a) uniform grid representation

I am calculating pathfinding inside a mesh around which I have built a uniform grid. The nodes (cells in the 3D grid) close to what I deem a "standable" surface I mark as accessible, and they are used in my pathfinding. To get a lot of detail (like being able to pathfind up small staircases), the number of accessible cells in my grid has grown quite large, several thousand in larger buildings (every grid cell is 0.5x0.5x0.5 m and the meshes are rooms with real-world dimensions). Even though I only use a fraction of the cells in my grid for pathfinding, the huge number slows the algorithm down. Other than that it works fine and finds the correct path through the mesh, using a weighted Manhattan distance heuristic.
Imagine the mesh sitting inside such a grid (it can be more or fewer cubes, but it is always box-shaped); the pathfinding is not calculated on all the small cubes, just the few marked as accessible (usually at the bottom of the grid, though that depends on how many floors the mesh has).
I am looking to reduce the search space for the pathfinding. I have looked at clustering the way HPA* does it, and at other clustering algorithms like Markov clustering, but they all seem best suited to node graphs rather than grids. One obvious solution would be to increase the size of the small cubes that make up the grid, but then I would lose a lot of detail in the pathfinding and it would not be as robust. How could I cluster these small cubes? This is how a typical search space looks when I do my pathfinding (blue cells are accessible, green is the path):
As you can see, there are a lot of cubes to search through because the spacing between them is quite small!
Never mind for now that the grid is a suboptimal representation for pathfinding.
Does anyone have an idea of how to reduce the number of cubes in the grid I have to search through, and how I would access the neighbors after I reduce the space? :) Right now it only looks at the closest neighbors while expanding the search space.
A couple of possibilities come to mind.
Higher-level Pathfinding
The first is that your A* search may be searching the entire problem space. For example, you live in Austin, Texas, and want to get into a particular building somewhere in Alberta, Canada. A simple A* algorithm would search a lot of Mexico and the USA before finally searching Canada for the building.
Consider creating a second layer of A* to solve this problem. You'd first find out which states to travel between to get to Canada, then which provinces to reach Alberta, then Calgary, and then the Calgary Zoo, for example. In a sense, you start with an overview, then fill it in with more detailed paths.
If you have enormous levels, such as Skyrim's, you may need to add pathfinding layers between towns (multiple buildings), regions (multiple towns), and even countries (multiple regions). If you were making a GPS system, you might even need continents. If we become interstellar, our spaceships might contain pathfinding layers for planets, sectors, and even galaxies.
By using layers, you help to narrow down your search area significantly, especially if different areas don't use the same co-ordinate system! (It's fairly hard to estimate distance for one A* pathfinder if one of the regions needs latitude-longitude, another 3d-cartesian, and the next requires pathfinding through a time dimension.)
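As a rough illustration of the layering idea, here is a hypothetical two-level search (everything below is a simplification: a 2D grid, a made-up region size, and a corridor restriction instead of real HPA*-style border nodes):

```python
import heapq

REGION = 8  # each coarse region is an 8x8 block of cells (made-up size)

def astar(start, goal, neighbours, heuristic):
    """Generic A*; neighbours(n) yields (next_node, step_cost) pairs."""
    frontier = [(heuristic(start, goal), 0, start)]
    best, came = {start: 0}, {}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came:
                node = came[node]
                path.append(node)
            return path[::-1]
        if g > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in neighbours(node):
            ng = g + cost
            if ng < best.get(nxt, float("inf")):
                best[nxt], came[nxt] = ng, node
                heapq.heappush(frontier, (ng + heuristic(nxt, goal), ng, nxt))
    return None

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def plan(walkable, start, goal):
    """walkable is a set of (x, y) cells; returns a cell-level path or None."""
    region = lambda c: (c[0] // REGION, c[1] // REGION)

    # Coarse pass: treat whole regions as nodes (a real HPA*-style planner
    # would connect regions through their shared border cells instead).
    occupied = {region(c) for c in walkable}
    def region_neighbours(r):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr = (r[0] + dx, r[1] + dy)
            if nr in occupied:
                yield nr, 1
    corridor = astar(region(start), region(goal), region_neighbours, manhattan)
    if corridor is None:
        return None
    allowed = set(corridor)

    # Fine pass: ordinary cell-level A*, restricted to the corridor regions.
    def cell_neighbours(c):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nc = (c[0] + dx, c[1] + dy)
            if nc in walkable and region(nc) in allowed:
                yield nc, 1
    return astar(start, goal, cell_neighbours, manhattan)
```

The fine search can miss paths that would have to leave the corridor, which is the price of the speed-up; proper hierarchical planners insert entry/exit nodes on region borders to avoid that.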
More efficient algorithms
Finding efficient algorithms becomes more important in three dimensions because there are more nodes to expand while searching. A Dijkstra search that expands on the order of x^2 nodes in 2D will expand on the order of x^3 nodes in 3D, with x being the distance between the start and the goal. A 4D game would require yet more efficiency in pathfinding.
One of the benefits of grid-based pathfinding is that you can exploit topographical properties like path symmetry. If two paths consist of the same movements in a different order, you don't need to find both of them. This is where a very efficient algorithm called Jump Point Search comes into play.
Here is a side-by-side comparison of A* (left) and JPS (right). Expanded/searched nodes are shown in red with walls in black:
Notice that they both find the same path, but JPS easily searched less than a tenth of what A* did.
As of now, I haven't seen an official 3-dimensional implementation, but I've helped another user generalize the algorithm to multiple dimensions.
Simplified Meshes (Graphs)
Another way to get rid of nodes during the search is to remove them before the search. For example, do you really need nodes in wide-open areas where you can trust a much more stupid AI to find its way? If you are building levels that don't change, create a script that parses them into the simplest grid which only contains important nodes.
This is actually called 'offline pathfinding'; basically finding ways to calculate paths before you need to find them. If your level will remain the same, running the script for a few minutes each time you update the level will easily cut 90% of the time you pathfind. After all, you've done most of the work before it became urgent. It's like trying to find your way around a new city compared to one you grew up in; knowing the landmarks means you don't really need a map.
Similar approaches to the 'symmetry-breaking' that Jump Point Search uses were introduced by Daniel Harabor, the creator of the algorithm. They are mentioned in one of his lectures, and allow you to preprocess the level to store only jump-points in your pathfinding mesh.
Clever Heuristics
Many academic papers state that A*'s cost function is f(x) = g(x) + h(x), which doesn't make it obvious that you may use other functions, weight the cost terms differently, and even feed in heatmaps of territory or recent deaths as cost functions. These may create sub-optimal paths, but they greatly improve the intelligence of your search. Who cares about the shortest path when your opponent has a choke point on it and has been easily dispatching anybody travelling through it? Better to be certain the AI can reach the goal safely than to let it be stupid.
For example, you may want to prevent the algorithm from letting enemies access secret areas, so that they avoid revealing them to the player and the AI seems to be unaware of them. All you need to achieve this is a uniformly high cost for any point within those 'off-limits' regions. In a game like this, enemies would simply give up on hunting the player once the path grew too costly. Another cool option is to 'scent' regions the player has visited recently (by temporarily increasing the cost of unvisited locations, since many algorithms dislike negative costs).
If you know what places you won't need to search, but can't implement in your algorithm's logic, a simple increase to their cost will prevent unnecessary searching. There's a lot of ways to take advantage of heuristics to simplify and inform your pathfinding, but your biggest gains will come from Jump Point Search.
EDIT: Jump Point Search implicitly selects pathfinding direction using the same heuristics as A*, so you may be able to implement heuristics to a small degree, but their cost function won't be the cost of a node, but rather, the cost of traveling between the two nodes. (A* generally searches adjacent nodes, so the distinction between a node's cost and the cost of traveling to it tends to break down.)
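As a small illustration of that kind of cost shaping (the names, weights, and penalty values below are all made up), a per-move cost function that folds in a danger heatmap and 'off-limits' regions might look like this; used as the g-increment of an A* search, it steers paths away from hot or secret cells while keeping every cost positive:

```python
# A made-up step-cost function: base distance inflated by a danger heatmap
# and by 'off-limits' regions, so the search avoids them without ever
# needing negative costs. Weights and names are purely illustrative.
def step_cost(a, b, base_cost, danger, off_limits,
              danger_weight=5.0, off_limits_penalty=1000.0):
    """Cost of moving from cell a to an adjacent cell b.

    danger:     dict mapping cells to a non-negative heat value
                (recent deaths, player 'scent', choke points...).
    off_limits: set of cells the AI should appear not to know about.
    """
    cost = base_cost
    cost += danger_weight * danger.get(b, 0.0)
    if b in off_limits:
        cost += off_limits_penalty  # large but finite: reachable only as a last resort
    return cost
```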
Summary
Although octrees, quadtrees, and B-trees can be useful in collision detection, they aren't as applicable to searches because they partition a graph based on its coordinates, not on its connections. Layering your graph (mesh, in your vocabulary) into super-graphs (regions) is a more effective solution.
Hopefully I've covered something you'll find useful.
Good luck!

Algorithm to minimize vertex distances - Dwarf Fortress

I play Dwarf Fortress, and the main challenge for me is to design the layout of the fortress efficiently, meaning that each industry flow should be as dense as possible to minimize travel distances.
An example could be the food industry. Each grey ellipse represents a single building; each white rectangle represents a product of that building.
My goal is to find an algorithm that distributes the buildings on a 2D grid in such a way that the distance between connected buildings is minimal. That means the fishery and the loom can be far apart, but the loom and the farmer's workshop should be as close as possible.
At the moment I have considered using some ready-made software to simulate the layout, but some tips for an algorithm would be welcome.
Currently I'm considering some force-directed algorithm, but I'm not sure about the discrete grid requirement.
Formalization of the question: is there a force-directed graph drawing algorithm that works in discrete coordinates?
UPDATE: I have found an implementation of a force-directed drawing algorithm in AS3 (the web contains a JS version too). I will try to convert it to a discrete version, but I have some doubts it will work...
UPDATE2: Some further restrictions were requested in comments. Here they are:
Each building occupies a single cell on the virtual grid. Buildings can be on adjacent cells, but they cannot stack or overlap.
(PS: in the game, each building has a defined size, usually 3x3, but I want to keep the problem more general to allow for more approaches.)
You are pretty much trying to solve an instance of a floor-planning problem in which you minimize the total "connection" length. Most of these problems are NP-hard; some of them have pseudo-polynomial-time algorithms.
There is a special case you might be interested in that is actually solvable in polynomial time: when the relative positions of the "boxes" or buildings you want to place are known ahead of time.
For full details on how to solve this particular case, please refer to this tutorial on geometric programming from Stanford, chapter 6, section 6.1, the first example, entitled "Floor planning". Another website also includes MATLAB code that implements and solves the problem (under chapter 8, Geometric Programming).
So I've managed to write some code that approximates a solution to this problem. It's not a top-class product, but it works. I plan to make some updates over time, but I don't have a time frame set.
The source code is here: https://github.com/sutr90/DF-Layout
My code uses a simulated annealing approach, where the cost function is based on total area, total edge length, and overlap. To measure distance I use the taxicab metric, but that is subject to change.
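
For readers who just want the gist, here is a minimal simulated-annealing sketch along those lines (single-cell buildings, taxicab edge length plus an overlap penalty, with the area term omitted for brevity); it is not the code from the linked repository, and all names and constants are illustrative:

```python
import math, random

def layout_cost(pos, edges, overlap_penalty=100.0):
    # Taxicab length of every connection plus a penalty per overlapping cell.
    cost = sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in edges)
    cost += overlap_penalty * (len(pos) - len(set(pos.values())))
    return cost

def anneal(buildings, edges, steps=20000, t0=10.0, t_end=0.01, grid=20):
    pos = {b: (random.randrange(grid), random.randrange(grid)) for b in buildings}
    cost = layout_cost(pos, edges)
    for i in range(steps):
        t = t0 * (t_end / t0) ** (i / steps)            # geometric cooling schedule
        b = random.choice(buildings)
        old = pos[b]
        pos[b] = (min(grid - 1, max(0, old[0] + random.choice((-1, 0, 1)))),
                  min(grid - 1, max(0, old[1] + random.choice((-1, 0, 1)))))
        new_cost = layout_cost(pos, edges)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            pos[b] = old
    return pos

# Example: a tiny made-up slice of a food industry chain.
buildings = ["farm", "mill", "kitchen", "still", "fishery"]
edges = [("farm", "mill"), ("mill", "kitchen"), ("farm", "still"), ("fishery", "kitchen")]
print(anneal(buildings, edges))
```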

Pathfinding through four dimensional data

The problem is finding an optimal route for a plane through four-dimensional winds (winds at differing heights that also change as you travel; a predictive wind model).
I've used the traditional A* search algorithm and hacked it to get it to work in 3 dimensions and with wind vectors.
It works in a lot of cases but is extremely slow (I'm dealing with huge numbers of data nodes) and doesn't work for some edge cases.
I feel like I've got it working "well", but it feels very hacked together.
Is there a better, more efficient way to pathfind through data like this (maybe a genetic algorithm or a neural network), or something I haven't even considered? Maybe fluid dynamics? I don't know.
Edit: further details.
Data is wind vectors (direction, magnitude).
Data points are spaced 15x15 km at 25 different elevation levels.
By "doesn't always work" I mean it will pick a stupid path for an aircraft because that path's weight is the same as another path's. It's fine for pathfinding but sub-optimal for a plane.
I take many things into account for each node change:
The cost of climbing versus descending.
Wind resistance.
Ignoring nodes with too high a resistance.
The cost of diagonal travel vs. straight, etc.
I use Euclidean distance as my heuristic or H value.
I use various factors for my weight or G value (the list above).
Thanks!
You can always trade optimality for speed by using weighted A*.
Weighted A* [or A*-epsilon] is expected to find a path faster than A*, but the path won't be optimal [however, it gives you a bound on its suboptimality, as a parameter of your epsilon/weight].
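A minimal sketch of that modification (the graph interface below is assumed; nothing is specific to the wind problem): the only change from plain A* is the factor w >= 1 on the heuristic, and with an admissible h the returned path costs at most w times the optimum.

```python
import heapq
from itertools import count

def weighted_astar(start, goal, neighbours, heuristic, w=1.5):
    """neighbours(n) yields (next_node, step_cost); heuristic(n, goal) should be admissible."""
    tie = count()  # tie-breaker so the heap never has to compare nodes
    frontier = [(w * heuristic(start, goal), next(tie), 0.0, start)]
    best_g = {start: 0.0}
    came_from = {}
    while frontier:
        _, _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return list(reversed(path)), g
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in neighbours(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = node
                # The only difference from plain A*: inflate h by w >= 1.
                heapq.heappush(frontier, (ng + w * heuristic(nxt, goal), next(tie), ng, nxt))
    return None, float("inf")
```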
A* isn't advertised as the fastest search algorithm; it does guarantee that the first solution it finds will be the best (assuming you provide an admissible heuristic). If yours isn't working for some cases, then something is wrong with some aspect of your implementation (maybe with the mechanics of A*, maybe the domain-specific parts; given you haven't provided many details, I can't say more than that).
If it is too slow, you might want to reconsider the heuristic you are using.
If you don't need an optimal solution, then some other technique might be more appropriate. Again, given how little you have provided about the problem, hard to say more than that.
Are you planning offline or online?
Typically for these problems you don't know what the winds are until you're actually flying through them. If this really is an online problem, you may want to consider constructing a near-optimal policy. There is quite a lot of research in this area already; one of the best papers is "Autonomous Control of Soaring Aircraft by Reinforcement Learning" by John Wharington.

Explain 0-extension algorithm

I'm trying to implement the 0-extension algorithm.
It is used to colour a graph with a number of colours where some nodes already have a colour assigned and where every edge has a distance. The algorithm calculates an assignment of colours so that neighbouring nodes with the same colour have as much distance between them as possible.
I found this paper explaining the algorithm: http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=1FBA2D22588CABDAA8ECF73B41BD3D72?doi=10.1.1.100.8049&rep=rep1&type=pdf
but I don't see how to implement it.
I already asked this question on the "theoretical computer science" site, but halfway through the discussion we went beyond the site's scope:
https://cstheory.stackexchange.com/questions/6163/explain-0-extension-algorithm
Can anyone explain this algorithm in layman's terms?
I'm planning to make the final code open source in the jgrapht package.
The objective of 0-extension is to minimize the total weighted cost of edges with different color endpoints rather than to maximize it, so 0-extension is really a clustering problem rather than a coloring problem. I'm generally skeptical that using a clustering algorithm to color would have good results. If you want something with a theoretical guarantee, you could look into approximations to the MAXCUT problem (really a generalization if there are more than two colors), but I suspect that a local-search algorithm would work better in practice.
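
To make the local-search suggestion concrete, here is a minimal sketch of the kind of thing one might try (it carries no approximation guarantee, and every name in it is made up): nodes with fixed colours keep them, and every other node repeatedly takes the colour that minimises the total weight of its edges to differently coloured neighbours.

```python
# Hypothetical local search for the objective described above: minimise the
# total weight of edges whose endpoints end up with different colours, while
# respecting the colours fixed on the terminal nodes.
def local_search_0extension(nodes, edges, terminals, colours, max_sweeps=100):
    """edges: dict (u, v) -> weight; terminals: dict node -> fixed colour."""
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Start every free node on an arbitrary colour.
    assign = {n: terminals.get(n, colours[0]) for n in nodes}

    for _ in range(max_sweeps):
        changed = False
        for n in nodes:
            if n in terminals:
                continue
            # Cost of giving n colour c = weight of edges to differently coloured neighbours.
            def cost(c):
                return sum(w for m, w in adj[n] if assign[m] != c)
            best = min(colours, key=cost)
            if cost(best) < cost(assign[n]):
                assign[n] = best
                changed = True
        if not changed:
            break
    return assign

# Tiny example: two terminals pulling a middle node in different directions.
nodes = ["a", "b", "m"]
edges = {("a", "m"): 3.0, ("b", "m"): 1.0}
print(local_search_0extension(nodes, edges, {"a": "red", "b": "blue"}, ["red", "blue"]))
# -> the middle node picks "red", since the heavier edge goes to the red terminal.
```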
