Circular representation of a tree structure - algorithm

I have some data in a tree structure, and I want to represent them in a graphical way, with the root node in the middle of the stage, his children displaced in a circle around him, and so on for every children, around their parent.
I don't want overlapping nodes, so the question is how to arrange space in an optimal way.
Something less or more like (found via google)
What algorhythms I have to search to realize something like this?

If you don't care about how it's done, but just that you are visualizing the data, then take a look at graphviz's radial layout. Although the example doesn't look exactly what you want, it is the layout you'd need. It'll also give you some ideas on how it's done too with the loads of research papers in there. Good luck!
You could also see how easy it is to extend this paper into a circular structure.

You can do it in an emergent way by setting up a system in which each tree node tries to keep as much distance from all other nodes (except parent) as possible, but as short a distance as possible from the parent (down to some minimum distance which it must maintain). If you run that algorithm for each node repeatedly until it stabilizes, you'll have an arrangement like the one you describe. I'm sure there are a lot of optimizations you can do to it, but I'm pretty sure this is going to be the simplest approach. Trying to calculate all the layout up front would be very complex...

You are trying to draw a planar representation of a graph.
Find some buzzwords and perhaps a resource here
And in wikipedia
Ah and I forgot: You can do this the newtonian way with forces.
Simply give all nodes a repelling potential, like make them all Protons which push each other away. Give the edges the properties of newtonian springs, exerting forces that pull them together and you are all set.
Could even create nice animations that way.
This is also an official way of graph drawing, but I don't know the name.

If you want to draw the tree with a minimum of wasted space and short connections, then you're in for a computationally expensive solution. It will be hard to get this to be realtime on a tree of decent size, not to mention that making small changes to the tree might result in a radically different equilibrium.
Another approach would be to abandon the physical simulation and just build it iteratively. I've done something similar last week, but my trees are probably a lot less involved than yours.
For this tree-layout, each node object has to store an angle and an offset. These two numbers control where on the graphics surface they end up.
Here is my basic algorithm:
1) recurse over your entire tree-data and find all the Leaf nodes.
2) while you're doing this, be sure to measure the length of each branching structure, so you know which is the longest.
3) once you have all your leaf nodes, distribute them equally over a concentric circle. You can either use the entire circle, or only some part of the angle domain.
4) once all Leaf nodes have been solved, you recurse again over the tree, going from the outside in. Each node you encounter that is not a leaf node is in need of layout. Essentially, every node from here on has an angle which is the average of all it's child nodes, and the offset is the graph_radius * (depth_of_node / maximum_depth)
I found this gives me a very decent and humanly readable distribution, albeit not a very efficient one in terms of screen usage. I uploaded an animation of my tree-display here: GIF anim

Related

Storing nearest neighbor

There are algorithms out there to find k nearest neighbors in many ways. I am eventually will have to apply these, however in my case, I can code my program to add points one-by-one rather than add all the points altogether, then run a algorithm. Is this make the problem easier, so that maybe I could use a tree, and add each node to a neighborhood tree or something. This seems like it would be faster than searching all the points linearly.
And in my program points will be moving constantly, so I will be required to update neighbors, that's why I thought it is better to use a tree or another construct to update records, rather than calculate nearest neighbors in every movement of these points. Do you know of such data structure ?
Maybe a graph datastructure/database is most appropriate due to the structural similarity.
Example: https://neo4j.com/graphgist/a7c915c8-a3d6-43b9-8127-1836fecc6e2f (I do not work for neo4j)

Reduce number of nodes in 3D A* pathfinding using (part of a) uniform grid representation

I am calculating pathfinding inside a mesh which I have build a uniform grid around. The nodes (cells in the 3D grid) close to what I deem a "standable" surface I mark as accessible and they are used in my pathfinding. To get alot of detail (like being able to pathfind up small stair cases) the ammount of accessible cells in my grid have grown quite large, several thousand in larger buildings. (every grid cell is 0.5x0.5x0.5 m and the meshes are rooms with real world dimensions). Even though I only use a fraction of the actual cells in my grid for pathfinding the huge ammount slows the algorithm down. Other than that it works fine and finds the correct path through the mesh, using a weighted manhattan distance heuristic.
Imagine my grid looks like that and the mesh is inside it (can be more or less cubes but its always cubical), however the pathfinding will not be calculated on all the small cubes just a few marked as accessible (usually at the bottom of the grid but that can depend on how many floors the mesh has).
I am looking to reduce the search space for the pathfinding... I have looked at clustering like how HPA* does it and other clustering algorithms like Markov but they all seem to be best used with node graphs and not grids. One obvious solution would be to just increase the size of the small cubes building the grid but then I would lose alot of detail in the pathfinding and it would not be as robust. How could I cluster these small cubes? This is how a typical search space looks when I do my pathfinding (blue are accessible, green is path):
and as you see there is a lot of cubes to search through because the distance between them is quite small!
Never mind that the grid is an unoptimal solution for pathfinding for now.
Does anyone have an idea on how to reduce the ammount of cubes in the grid I have to search through and how would I access the neighbors after I reduce the space? :) Right now it only looks at the closest neighbors while expanding the search space.
A couple possibilities come to mind.
Higher-level Pathfinding
The first is that your A* search may be searching the entire problem space. For example, you live in Austin, Texas, and want to get into a particular building somewhere in Alberta, Canada. A simple A* algorithm would search a lot of Mexico and the USA before finally searching Canada for the building.
Consider creating a second layer of A* to solve this problem. You'd first find out which states to travel between to get to Canada, then which provinces to reach Alberta, then Calgary, and then the Calgary Zoo, for example. In a sense, you start with an overview, then fill it in with more detailed paths.
If you have enormous levels, such as skyrim's, you may need to add pathfinding layers between towns (multiple buildings), regions (multiple towns), and even countries (multiple regions). If you were making a GPS system, you might even need continents. If we'd become interstellar, our spaceships might contain pathfinding layers for planets, sectors, and even galaxies.
By using layers, you help to narrow down your search area significantly, especially if different areas don't use the same co-ordinate system! (It's fairly hard to estimate distance for one A* pathfinder if one of the regions needs latitude-longitude, another 3d-cartesian, and the next requires pathfinding through a time dimension.)
More efficient algorithms
Finding efficient algorithms becomes more important in 3 dimensions because there are more nodes to expand while searching. A Dijkstra search which expands x^2 nodes would search x^3, with x being the distance between the start and goal. A 4D game would require yet more efficiency in pathfinding.
One of the benefits of grid-based pathfinding is that you can exploit topographical properties like path symmetry. If two paths consist of the same movements in a different order, you don't need to find both of them. This is where a very efficient algorithm called Jump Point Search comes into play.
Here is a side-by-side comparison of A* (left) and JPS (right). Expanded/searched nodes are shown in red with walls in black:
Notice that they both find the same path, but JPS easily searched less than a tenth of what A* did.
As of now, I haven't seen an official 3-dimensional implementation, but I've helped another user generalize the algorithm to multiple dimensions.
Simplified Meshes (Graphs)
Another way to get rid of nodes during the search is to remove them before the search. For example, do you really need nodes in wide-open areas where you can trust a much more stupid AI to find its way? If you are building levels that don't change, create a script that parses them into the simplest grid which only contains important nodes.
This is actually called 'offline pathfinding'; basically finding ways to calculate paths before you need to find them. If your level will remain the same, running the script for a few minutes each time you update the level will easily cut 90% of the time you pathfind. After all, you've done most of the work before it became urgent. It's like trying to find your way around a new city compared to one you grew up in; knowing the landmarks means you don't really need a map.
Similar approaches to the 'symmetry-breaking' that Jump Point Search uses were introduced by Daniel Harabor, the creator of the algorithm. They are mentioned in one of his lectures, and allow you to preprocess the level to store only jump-points in your pathfinding mesh.
Clever Heuristics
Many academic papers state that A*'s cost function is f(x) = g(x) + h(x), which doesn't make it obvious that you may use other functions, multiply the weight of the cost functions, and even implement heatmaps of territory or recent deaths as functions. These may create sub-optimal paths, but they greatly improve the intelligence of your search. Who cares about the shortest path when your opponent has a choke point on it and has been easily dispatching anybody travelling through it? Better to be certain the AI can reach the goal safely than to let it be stupid.
For example, you may want to prevent the algorithm from letting enemies access secret areas so that they avoid revealing them to the player, and so that they AI seems to be unaware of them. All you need to achieve this is a uniform cost function for any point within those 'off-limits' regions. In a game like this, enemies would simply give up on hunting the player after the path grew too costly. Another cool option is to 'scent' regions the player has been recently (by temporarily increasing the cost of unvisited locations because many algorithms dislike negative costs).
If you know what places you won't need to search, but can't implement in your algorithm's logic, a simple increase to their cost will prevent unnecessary searching. There's a lot of ways to take advantage of heuristics to simplify and inform your pathfinding, but your biggest gains will come from Jump Point Search.
EDIT: Jump Point Search implicitly selects pathfinding direction using the same heuristics as A*, so you may be able to implement heuristics to a small degree, but their cost function won't be the cost of a node, but rather, the cost of traveling between the two nodes. (A* generally searches adjacent nodes, so the distinction between a node's cost and the cost of traveling to it tends to break down.)
Summary
Although octrees/quad-trees/b-trees can be useful in collision-detection, they aren't as applicable to searches because they section a graph based on its coordinates; not on its connections. Layering your graph (mesh in your vocabulary) into super graphs (regions) is a more effective solution.
Hopefully I've covered anything you'll find useful.
Good luck!

How to find the neighbors of a graph effiiciently

I have a program that create graphs as shown below
The algorithm starts at the green color node and traverses the graph. Assume that a node (Linked list type node with 4 references Left, Right, Up and Down) has been added to the graph depicted by the red dot in the image. Inorder to integrate the newly created node with it neighbors I need to find the four objects and link it so the graph connectivity will be preserved.
Following is what I need to clarify
Assume that all yellow colored nodes are null and I do not keep a another data structure to map nodes what is the most efficient way to find the existence of the neighbors of the newly created node. I know the basic graph search algorithms like DFS, BFS etc and shortest path algorithms but I do not think any of these are efficient enough because the graph can have about 10000 nodes and doing graph search algorithms (starting from the green node) to find the neighbors when a new node is added seems computationally expensive to me.
If the graph search is not avoidable what is the best alternative structure. I thought of a large multi-dimensional array. However, this has memory wastage and also has the issue of not having negative indexes. Since the graph in the image can grow in any directions. My solution to this is to write a separate class that consists of a array based data structure to portray negative indexes. However, before taking this option I would like to know if I could still solve the problem without resolving to a new structure and save a lot of rework.
Thank you for any feedback and reading this question.
I'm not sure if I understand you correctly. Do you want to
Check that there is a path from (0,0) to (x1,y1)
or
Check if any of the neighbors of (x1,y1) are in the graph? (even if there is no path from (0,0) to any of this neighbors).
I assume that you are looking for a path (otherwise you won't use a linked-list), which implies that you can't store points which have no path to (0,0).
Also, you mentioned that you don't want to use any other data structure beside / instead of your 2D linked-list.
You can't avoid full graph search. BFS and DFS are the classic algorithms. I don't think that you care about the shortest path - any path would do.
Another approaches you may consider is A* (simple explanation here) or one of its variants (look here).
An alternative data structure would be a set of nodes (each node is a pair < x,y > of course). You can easily run 4 checks to see if any of its neighbors are already in the set. It would take O(n) space and O(logn) time for both check and add. If your programming language does not support pairs as nodes of a set, you can use a single integer (x*(Ymax+1) + Y) instead.
Your data structure can be made to work, but probably not efficiently. And it will be a lot of work.
With your current data structure you can use an A* search (see https://en.wikipedia.org/wiki/A*_search_algorithm for a basic description) to find a path to the point, which necessarily finds a neighbor. Then pretend that you've got a little guy at that point, put his right hand on the wall, then have him find his way clockwise around the point. When he gets back, he'll have found the rest.
What do I mean by find his way clockwise? For example suppose that you go Down from the neighbor to get to his point. Then your guy should be faced the first of Right, Up, and Left which he has a neighbor. If he can go Right, he will, then he will try the directions Down, Right, Up, and Left. (Just imagine trying to walk through the maze yourself with your right hand on the wall.)
This way lies insanity.
Here are two alternative data structures that are much easier to work with.
You can use a quadtree. See http://en.wikipedia.org/wiki/Quadtree for a description. With this inserting a node is logarithmic in time. Finding neighbors is also logarithmic. And you're only using space for the data you have, so even if your graph is very spread out this is memory efficient.
Alternately you can create a class for a type of array that takes both positive and negative indices. Then one that builds on that to be 2-d class that takes both positive and negative indices. Under the hood that class would be implemented as a regular array and an offset. So an array that can start at some number, positive or negative. If ever you try to insert a piece of data that is before the offset, you create a new offset that is below that piece by a fixed fraction of the length of the array, create a new array, and copy data from the old to the new. Now insert/finding neighbors are usually O(1) but it can be very wasteful of memory.
You can use a spatial index like a quad tree or a r-tree.

Placing boxes in diagram the optimal way (no crossing lines)

I am doing an assignment where I have to draw a diagram on a web page with a number of boxes, some of which are to be connected by arrows. I have everything setup so that I'm able to draw the actual diagram, arrows and all but now I'm faced with the problem of placing the boxes in the optimal way. By this I mean laying out the page so that I have a minimum of lines crossing.
I have to do two types of diagrams: One is a more hierarchical diagram where I know which box to place top left and where all boxes form a hierarchy. The other is more tricky where no box needs to have a specific place and the end result is not a hierarchy. In either scenario are there more than one connection between two boxes. It's pretty much the same as laying out an E/R diagram for a database in the most readable way.
Does anyone know how to do this or where to find information about how to do this?
Thanks in advance
./CJ
Laying out an arbitrary graph with minimal crossings is an NP-hard problem, so you're left with finding a good heuristic.
What comes to mind is this:
Lay your items on the perimeter of a circle with their connecting edges.
Use simulated annealing to swap items, aiming to minimise the number of crossings.
Tidy up using, say, force directed layout.
Another option would be to find a spanning tree, render that, then add in the back links. This may well produce more crossings than the simulated annealing approach, but it has the benefit of reusing the solution to the first part of your assignment.
Best of luck!

data structure to support google/bing maps [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I was wondering what the data structure is in an application like google/bing maps. How is it that the results are returned so quickly when searching for directions?
what kind of algorithms are being used to determine this information?
thanks
There are two parts to your question:
What kind of data structure is used to store the map information.
What kind of algorithm is used to "navigate" from source to destination.
To this, I would add another question:
How is Google/Bing able to "stream in" the data. So for example, you are able to zoom in from miles up to the ground level seamlessly, all the while maintaining the coordinate system.
I will attempt to address each question in order. Do note, that I do not work for the Google Maps or the Bing team, so quite obviously, this information might not be completely accurate. I am basing this off of the knowledge gained from a good CS course about data structures and algorithms.
Ans 1) The map is stored in an Edge Weighted Directed Graph. Locations on the map are Vertices and the path from one location to another (from one vertex to another) are the Edges.
Quite obviously, since there can be millions of vertices and an order of magnitude more edges, the really interesting thing would be the representation of this Edge Weighted Digraph.
I would say that this would be represented by some kind of Adjacency List and the reason I say so is because, if you imagine a map, it is essentially a sparse graph. There are only a few ways to get from one location to another. Think about your house! How many roads (edges in our case) lead to it? Adjacency Lists are good for representing sparse graphs, and adjacency matrix is good for representing dense graphs.
Of course, even though we are able to efficiently represent sparse graphs in memory, given the sheer number of Vertices and Edges, it would be impossible to store everything in memory at once. Hence, I would imagine some kind of a streaming library underneath.
To create an analogy for this, if you have ever played an open-world game like World of Warcraft / Syrim / GTA, you will observe that to a large part, there is no loading screen. But quite obviously, it is impossible to fit everything into memory at once. Thus using a combination of quad-trees and frustum culling algorithms, these games are able to dynamically load resources (terrain, sprites, meshes etc).
I would imagine something similar, but for Graphs. I have not put a lot of thought into this particular aspect, but to cook up a very basic system, one can imagine an in memory database, which they query and add/remove vertices and edges from the graph at run-time as needed. This brings us to another interesting point. Since vertices and edges need to be removed and added at run-time, the classic implementation of Adjacency List will not cut it.
In a classic implementation, we simply store a List (a Vector in Java) in each element of an array: Adj[]. I would imagine, a linked list in place of the Adj[] array and a binary search tree in place of List[Edge]. The binary search tree would facilitate O(log N) insertion and removal of nodes. This is extremely desirable since in the List implementation, while addition is O(1), removal is O(N) and when you are dealing with millions of edges, this is prohibitive.
A final point to note here is that until you actually start the navigation, there is "no" graph. Since there can be million of users, it doesn't make sense to maintain one giant graph for everybody (this would be impossible due to memory space requirement alone). I would imagine that as you stat the navigation process, a graph is created for you. Quite obviously, since you start from location A and go to location B (and possibly other locations after that), the graph created just for you should not take up a very large amount of memory (provided the streaming architecture is in place).
Ans 2) This is a very interesting question. The most basic algorithm for solving this problem would be Dijkstra Path Finding algorithm. Faster variations such as A* exist. I would imagine Dijkstra to be fast enough, if it could work properly with the streaming architecture discussed above. Dijkstra uses space proportional to V and time proportional to E lg V, which are very good figures, especially for sparse graphs. Do keep in mind, if the streaming architecture has not been nailed down, V and E will explode and the space and run-time requirements of Dijkstra will make it prohibitive.
Ans 1) Streaming question: Do not confuse this question with the streaming architecture discussed above. This is basically asking how the seamless zoom is achieved.
A good algorithm for achieving this is the Quad Tree algorithm (you can generalize this to n-tree). You store coarser images higher up in the tree and higher resolution images as you traverse down the tree. This is actually what KML (Keyhole) did with its mapping algorithm. Keyhole was a company that partnered with NVIDIA many years back to produce one of the first "Google Earth" like softwares.
The inspiration for Quad Tree culling, comes from modern 3D games, where it is used to quickly cull away parts of the scene which is not in the view frustum.
To further clarify this, imagine that you are looking at the map of USA from really high up. At this level, you basically split the map into 4 sections and make each section a child of the Quad Tree.
Now, as you zoom in, you zoom in on one of the sections (quite obviously you can zoom right in the center, so that your zoom actually touches all 4 sections, but for simplicity's sake, lets say you zoom in on one of the sections). So when you zoom in to one section, you traverse the 4 children of that section. These 4 children contain higher resolution data of its parent. You can then continue to zoom down till you hit a set of leaves, which contain the highest resolution data. To make the jump from one resolution to the next "seamless" a combination of blur and fading effects can be utilized.
As a follow-up to this post, I will try to add links to many of the concepts I put in here.
For this sort of application, you would want some sort of database to represent map features and the connections between them, and would then need:
spatial indexing of the map feature database, so that it can be efficiently queried by 2D coordinates; and
a good way to search the connections to find a least-cost route, for some measure of cost (e.g. distance).
For 1, an example would be the R-tree data structure.
For 2, you need a graph search algorithm, such as A*.
Look up a paper about Highway Dimension from google authors. The idea is to precompute the shortest path between important nodes and then route everything through those. You are not going to use residential streets to go from LA to Chicago save for getting on and off the freeway at both ends.
I'm not sure of the internal data structure, but it may be some kind of 2D coordinate based tree structure that only displays a certain number of levels. The levels would correspond to zoom factors, so you could ignore as insignificant things below, say, 5 levels below the current level, and things above the current level.
Regardless of how it's structured, here's how you can use it:
http://code.google.com/apis/maps/documentation/reference.html
I would think of it as a computational geometry problem. When you click on a particular coordinate in the map and using that information, can get the latitude and longitude of that location. Based on the latitude and longitude and the level of zoom, the place can be identified.
Once you have identified the two places, the only problem for you is to identify the nearest route. Now this problem is finding the shortest path between two points, having polygon blocks between them(which correspond to the places which contains no roads) and the only possible connections are roads. This is a known problem and efficient algorithms exist to solve this.
I am not sure if this is what google is doing, but I hope they do something on these lines.
I am taking computational geometry this semester. Here is the course link: http://www.ams.sunysb.edu/~jsbm/courses/545/ams545.html. Check them if you are interested.
I was wondering what the data
structure is in an application like
google/bing maps.
To the user: XHTML/CSS/Javascript. Like any website.
On the server: who knows? Any Google devs around here? It certainly isn't PHP or ASP.net...
How is it that the results are
returned so quickly when searching for
directions?
Because Google spent years, manpower and millions of dollars on building up the architecture to get the fastest server reaction time possible?
What kind of algorithms are being used
to determine this information?
A journey planner algorithm.

Resources