Segmenting a double array of labels - algorithm

The Problem:
I have a large double (2d) array populated with various labels. Each element (cell) in the double array contains a set of labels and some elements in the double array may be empty. I need an algorithm to cluster elements in the double array into discrete segments. A segment is defined as a set of pixels that are adjacent within the double array and one label that all those pixels in the segment have in common. (Diagonal adjacency doesn't count and I'm not clustering empty cells).
|-------|-------|-------|
| Jane | Joe | |
| Jack | Jane | |
|-------|-------|-------|
| Jane | Jane | |
| | Joe | |
|-------|-------|-------|
| | Jack | Jane |
| | Joe | |
|-------|-------|-------|
In the above arrangement of labels distributed over nine elements, the largest cluster is the “Jane” cluster occupying the four upper left cells.
What I've Considered:
I've considered iterating through every label of every cell in the double array and testing whether the cell-label combination under inspection can be associated with a preexisting segment. If it cannot, it becomes the first member of a new segment. If it can, it joins that segment.
Of course, to make this method reasonable I'd have to implement an elaborate hashing system. I'd have to keep track of all the cell-label combinations that stand adjacent to preexisting segments and are in the path of the incrementing indices that are iterating through the double array. This hash method would avoid having to iterate through every pixel in every preexisting segment to find an adjacency.
Why I Don't Like it:
As is, the above algorithm doesn't take into consideration the case where an element in the double array can be associated with two unique segments, one in the horizontal direction and one in the vertical direction. To handle these cases properly, I would need to implement a test for this specific case and then implement a method that will both associate the element under inspection with a segment and then concatenate the two adjacent identical segments.
On the whole, this method and the intricate hashing system that it would require feels very inelegant. Additionally, I really only care about finding the large segments in the double array and I'm much more concerned with the speed of this algorithm than with the accuracy of the segmentation, so I'm looking for a better way.
I assume there is some stochastic method for doing this that I haven't thought of.
Any suggestions?
Edit:
My desired output is a list of segments, each segment being a label and a list of points. Therefore, in the example above I'd want two segments to be returned:
Segment 1 - Jane: (1,3), (2,3), (1,2), (2,2)
Segment 2 - Joe: (2,3), (2,2), (2,1)

You basically want to implement a flood fill algorithm--consider the array as a set of images, one per distinct label, where the label is a color, and the lack of a label is black; you then want to segment it into all connected components of that color.
Repeat for all labels and you're done.
If your labels are sparse, you're probably better off not actually creating an image for each label and using an existing flood fill routine. In that case, do your own breadth-first flood fill by creating a copy of the array and building connected blocks one label at a time while destroying the existing label.
I am going to call one entry a "pixel" and the whole array an "image".
The algorithm goes, roughly,
for each pixel in the image
    for each label in the pixel
        1. remove the label
        2. mark the current pixel
        3. for each marked pixel, look in every adjacent pixel for the label
        4. remove any labels found
        5. if labels are found, clear marks, and mark the newly label-removed pixels
        6. if anything is marked, go back to 3
        7. report the set of points where you removed labels
Since this is destructive, you don't have to worry about backtracking. (If you can't destroy your original, and can't make a copy, then you have to keep track of what you've done along the way, which is more of a hassle.)
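As a rough illustration of the steps above, here is a minimal Python sketch. It assumes the array is a list of rows and that each cell is a set of label strings; the function name segment_labels and the (row, col) coordinate convention are choices made for this example, not something given in the question. It works on a copy, so the original array is left intact.

```python
from collections import deque

def segment_labels(grid):
    """Return a list of (label, cells) segments; cells are (row, col) tuples."""
    # Work on a copy so the destructive fill does not touch the caller's data.
    work = [[set(cell) for cell in row] for row in grid]
    rows = len(work)
    cols = len(work[0]) if rows else 0
    segments = []
    for r in range(rows):
        for c in range(cols):
            while work[r][c]:
                label = work[r][c].pop()       # remove the label
                cells = [(r, c)]
                queue = deque(cells)           # "marked" pixels still to expand
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if 0 <= nr < rows and 0 <= nc < cols and label in work[nr][nc]:
                            work[nr][nc].remove(label)   # remove labels found
                            cells.append((nr, nc))
                            queue.append((nr, nc))
                segments.append((label, cells))          # report the segment
    return segments
```

Since only large segments matter to you, the returned list can simply be filtered by len(cells) afterwards.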

Related

Is there an algorithm to dynamically generate a maze, ensuring that there are always more places to go?

Let's suppose we have a maze. You start somewhere in it:
* - * - *
| |
*-here
Only a small part of the maze is generated (for example, a 10 by 10 square around you). As you move around, more of the maze is generated. Is there an algorithm that ensures that there is always a place for you to go?
For example:
*-here * - *
|
*
would not work because you have no paths.
I have a 'solution' for it, and that is to generate a finite maze, then force it to be connected to another finite maze, forming a mesh (ensuring that finite mazes are doable is easy).
Edit 1: The maze can't have a deterministic size; Parts of the map will be generated dynamically.
Edit 2: It has to generate the same maze no matter the order in which you load it (moving up then left should generate the same maze as moving left then up).
My maze does not need to include all places
example image:
I like to generate mazes with variants of Kruskal's algorithm:
http://weblog.jamisbuck.org/2011/1/3/maze-generation-kruskal-s-algorithm
https://mtimmerm.github.io/webStuff/maze.html
One way to think of using Kruskal's algorithm to generate mazes is:
Assign a random weight to every possible wall
Remove every wall if the cells on either side of it are not connected by removing all walls of lesser weight.
If you divide the world into tiles, then you can turn this into an algorithm that you can evaluate locally as follows:
Assign a random weight to every possible wall. Use a different random number generator for the walls in each tile, and seed it with the tile's coordinates.
Remove every wall if the cells on either side of it are not connected by removing all walls of lesser weight in its tile and adjacent tiles.
This way, to generate any tile of the maze, you only need to consider the weights that would be assigned to walls in the 8 adjacent tiles. The procedure is just like doing the normal Kruskal's to make a 3 tile X 3 tile maze, and then cutting out the middle tile.
When you generate mazes this way, it's guaranteed that there will be a path from every place in the maze to every other place. Unlike "perfect" mazes, however, there may be more than one path between two places that are more than a tile apart. As long as your tiles are sufficiently large, though, there won't be any visible artifacts of the tiling, and it will still be difficult to find your way from one place to another.
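A minimal Python sketch of the tiled variant might look like the following. Several details are assumptions made for the example: the tile size TILE, the convention that each cell owns the wall to its east ("E") and south ("S") so that every wall belongs to exactly one tile, and the use of a string built from the tile coordinates as the seed. The function open_walls returns the walls owned by one tile that get removed, by running Kruskal's algorithm over that tile and its 8 neighbours.

```python
import random

TILE = 4  # cells per tile side (arbitrary choice for this sketch)

def tile_weights(tx, ty):
    """Random weights for the walls owned by tile (tx, ty), seeded by its coordinates."""
    rng = random.Random(f"{tx},{ty}")
    weights = {}
    for y in range(ty * TILE, (ty + 1) * TILE):
        for x in range(tx * TILE, (tx + 1) * TILE):
            for d in "ES":  # wall to the east / south of cell (x, y)
                weights[(x, y, d)] = rng.random()
    return weights

def find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def open_walls(tx, ty):
    """Walls owned by tile (tx, ty) that are removed from the maze."""
    weights = {}
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            weights.update(tile_weights(tx + dx, ty + dy))
    xs = range((tx - 1) * TILE, (tx + 2) * TILE)
    ys = range((ty - 1) * TILE, (ty + 2) * TILE)
    parent = {(x, y): (x, y) for x in xs for y in ys}
    removed = set()
    for x, y, d in sorted(weights, key=weights.get):   # lightest wall first
        other = (x + 1, y) if d == "E" else (x, y + 1)
        if other not in parent:
            continue                                   # far side is outside the 3x3 block
        a, b = find(parent, (x, y)), find(parent, other)
        if a != b:                                     # not yet connected: remove the wall
            parent[a] = b
            if x // TILE == tx and y // TILE == ty:
                removed.add((x, y, d))
    return removed
```

Because the result depends only on the tile coordinates, open_walls(tx, ty) gives the same answer regardless of the order in which tiles are visited.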
There are many, of varying complexities. A good place to start is the Wikipedia page: https://en.wikipedia.org/wiki/Maze_generation_algorithm.
Often, it'll be easier to generate the whole maze in advance and reveal it bit by bit as it is explored than to incrementally generate the maze during exploration, but take a look at the link and decide what you think.
Also, if the maze doesn't completely fill space, you might look at the way old games like hack, nethack, or rogue generate room layouts on levels. I'm sorry I don't have a reference for how that was done.
Maybe you can use Wilson's algorithm to solve your problem. This algorithm grows the "maze" (spanning tree) by adding a loop-erased "random walk" from some point outside of the existing maze until the random walk meets the existing maze. So if you want to grow your existing maze, you could add some empty region (only nodes, no edges) to your existing graph, select one vertex from that region and run a random walk until the existing maze is met which will grow your maze.
Here's a suggestion of something that doesn't entirely meet your objective, but it comes close. Where it's lacking is that you also have to compute a guaranteed solution, which will override what follows. So what follows is deterministic for the rest of the maze.
The type of maze being considered uses a square grid. Each cell is a square. There are 16 types of cells. Think of each wall as a binary digit. If the wall is missing, it's a 1. If it's present, it's a 0. Using a seed calculated from the coordinates of the cell, compute a random number from 0 to 15 and assign it to the cell. Each cell now has deterministically been assigned a random number.
Neighboring cells are possibly incompatible, though. So simply use a rule to tweak the values. For every pair of adjoining cells, take the one with the lower coordinate as the truth (the other coordinate is the same, since we're talking about a square grid). Adjust the wall value of the other one to match.
That creates a maze without a guaranteed path. So simply modify it with the path from the calculated solution. Let's number the edges clockwise from the top, assigning the top to the most significant bit. I'll fill in the space with a perpendicular line to show where a path would be, rather than leaving it blank as in the question.
Example cells:
* - *
| | 0 cell
* - *
* - *
- | 1 cell
* - *
* - *
| | 2 cell
* | *
* - *
- | 3 cell
* | *
etc.
So now, suppose the cell at (5,4) is a type 6 and its neighbor to the right (6,4) is a type 12:
* - * * | *
| - | -
* | * * - *
These conflict with each other. The one on the left wants the path. The one on the right wants a wall. This is resolved by looking at the coordinates. (5,4) < (6,4) so the left one is given precedence. It wants a path, so the other cell is modified by applying a bitwise or with 1:
* - * * | *
| - - -
* | * * - *
So the new configuration of the cell to the right is type 13 (12 OR 1).
The two cells are then compared to the precomputed solution path and tweaked, if needed. If the precomputed solution does not pass through either of these cells, we are finished.
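The reconciliation rule can be sketched in Python as below. The bit values follow the numbering described above (top is the most significant bit, then clockwise). The seeding scheme in raw_type, the assumption that y grows downward (so the cell above has the lower coordinate), and the helper names are choices made for this illustration. The final step of overriding cells with the precomputed solution path is not included.

```python
import random

TOP, RIGHT, BOTTOM, LEFT = 8, 4, 2, 1  # clockwise from the top, top is the MSB

def raw_type(x, y):
    """Deterministic pseudo-random cell type in 0..15, seeded by the coordinates."""
    return random.Random(f"{x},{y}").randrange(16)

def set_bit(value, bit, on):
    """Return value with the given wall bit forced on (path) or off (wall)."""
    return value | bit if on else value & ~bit

def cell_type(x, y, raw=raw_type):
    """Cell type after the lower-coordinate neighbours override the shared walls."""
    t = raw(x, y)
    # The left neighbour has the lower x, so its RIGHT bit decides our LEFT bit.
    t = set_bit(t, LEFT, raw(x - 1, y) & RIGHT)
    # The upper neighbour has the lower y (assumed), so its BOTTOM bit decides our TOP bit.
    t = set_bit(t, TOP, raw(x, y - 1) & BOTTOM)
    return t
```

A cell's RIGHT and BOTTOM bits are never overridden, which is why the neighbours' raw values can be used directly and no cell needs to know about anything beyond its two lower-coordinate neighbours.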

Map some points to polygons

I have the following map in my game,
x-------------
y +3 +5 +2 -1 -2
| -1 +4 +2 +1 -1
| -2 +1 -1 -1 -1
| -1 -1 -2 -1 +2
| +2 +2 +2 +2 +4
I want to draw positive points as polygons, like the following:
How can I do that? What algorithm or approach should I use?
You haven't told us what you need the polygons for, so I'll just assume that the polygons should visualise the islands and nothing else.
Your values seem to be a height map, where heights below zero are below sea level. You can then use this information to draw a piecewise height line for the sea level: Subdivide your map into squares where the centers of your map tiles are the corner points. For any edge of these squares where one end point is above sea level and the other below, you find the point where the height is zero with linear interpolation. You then have either no, two or four such points.
If you have two points, connect them with a line. If you have four points, find out which pairs belong together by looking at the adjacent tiles and connect them. You should end up with a contour plot of your islands:
This is your example with an artificial border of height -1 added. Because the height value isn't just taken as island/sea criterion but as physical height value, an uneven look of the shore is created. If you want your islands to look more regular, you can treat all negative heights as -1 and all positive ones as 1. You will then get a more regular look, where all points are in the middle of the squares' edges:
Note that this method does not create polygons (so my answer might miss the point entirely), because the lines are not connected. You can improve the algorithm by finding intersections for all vertical and horizontal sections of the auxiliary grid and then wander along the squares to build closed polygons, which you then may simplify as you wish.
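To make the interpolation step concrete, here is a small Python sketch for a single square of the auxiliary grid. The function names and the representation of a corner as ((x, y), height) are assumptions for this example. It only finds the zero-height points on the square's edges; pairing them up into line segments is left out.

```python
def zero_crossing(p0, h0, p1, h1):
    """Point on the segment p0-p1 where the linearly interpolated height is zero."""
    t = h0 / (h0 - h1)  # fraction of the way from p0 to p1
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))

def square_crossings(corners):
    """corners: four ((x, y), height) pairs listed in order around the square."""
    points = []
    for i in range(4):
        (p0, h0), (p1, h1) = corners[i], corners[(i + 1) % 4]
        if (h0 > 0) != (h1 > 0):  # one end above sea level, the other not
            points.append(zero_crossing(p0, h0, p1, h1))
    return points
```

For the top-left square of the example map (heights +3, +5, +4 and -1), this yields two points, one on the bottom edge and one on the left edge, which would then be joined by a line.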
Your example is incorrect.
An algorithm can't produce different results using independent decisions on every step.
But I understand your goal, and a possible solution is:
1. Save groups of positive cells in different arrays using BFS (or flood fill; there seems to be a fashion for calling the same thing by different names). You can use negative values as the border.
N, M - size of your map
board[][] - your map
visited[][] - bool array; bfs() marks cells as visited after adding them to a group
groups[] - dynamic array of structures (maybe other arrays) that hold the groups of cells
groupsCount = 0;
for row = 1..N do
    for col = 1..M do
        if( board[row][col] > 0 && !visited[row][col] ) {
            bfs(row,col,board,visited,groups[groupsCount]);
            ++groupsCount;
        }
2.1. Use an O(N^2) or O(N log N) convex hull algorithm to find the envelope with minimal perimeter for every group (as if you had stretched an elastic band around several nails).
2.2. The minimal envelope may not solve your task properly, for example
when you want to see something like this
In this case a possible solution is to mark the positive cells adjacent to negative ones and run DFS with some kind of special priority, for example "choose the closest unvisited cell". It will produce the result shown above (but I'm not sure about all cases).
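Steps 1 and 2.1 could be sketched in Python roughly as follows. This uses 0-based (row, col) indices instead of 1..N, treats cell centres as the points fed to the hull, and uses Andrew's monotone chain as one possible O(N log N) convex hull; those choices and the function names are assumptions of this example. Step 2.2 is not covered.

```python
from collections import deque

def positive_groups(board):
    """Group 4-connected cells with positive values; returns lists of (row, col)."""
    rows, cols = len(board), len(board[0])
    visited = [[False] * cols for _ in range(rows)]
    groups = []
    for row in range(rows):
        for col in range(cols):
            if board[row][col] > 0 and not visited[row][col]:
                visited[row][col] = True
                group, queue = [], deque([(row, col)])
                while queue:
                    r, c = queue.popleft()
                    group.append((r, c))
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and board[nr][nc] > 0 and not visited[nr][nc]):
                            visited[nr][nc] = True
                            queue.append((nr, nc))
                groups.append(group)
    return groups

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would make a non-left turn (this also drops collinear ones).
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))
```

Calling convex_hull on each group returned by positive_groups gives one envelope per island.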
This is an instance of the contour tracing problem. http://www.imageprocessingplace.com/downloads_V3/root_downloads/tutorials/contour_tracing_Abeer_George_Ghuneim/alg.html
You should scan your map row by row until you find a positive value. Then use the contour tracing algorithm, joining the outline vertices, and mark them as having been visited. Then continue the scan until you find a non-positive value (to exit the current island), and then an unmarked positive value (the next island). And so on...

How can I find holes in a 2D matrix?

I know the title seems kind of ambiguous, so I've attached an image to make the problem clear. I need to find holes inside the white region. A hole is defined as one or more cells with value '0' inside the white region, meaning it has to be fully enclosed by cells with value '1' (e.g. here we can see three holes marked as 1, 2 and 3). I've come up with a pretty naive solution:
1. Search the whole matrix for cells with value '0'
2. Run a DFS(Flood-Fill) when such a cell (black one) is encountered and check whether we can touch the boundary of the main rectangular region
3. If we can touch boundary during DFS then it's not a hole and if we can't reach boundary then it'll be considered as a hole
Now, this solution works but I was wondering if there's any other efficient/fast solution for this problem.
Please let me know your thoughts. Thanks.
With floodfill, which you already have: run along the BORDER of your matrix and floodfill it, i.e.,
change all zeroes (black) to 2 (filled black) and ones to 3 (filled white); ignore 2 and 3's that come from an earlier floodfill.
For example with your matrix, you start from the upper left, and floodfill black a zone with area 11. Then you move right, and find a black cell that you just filled. Move right again and find a white area, very large (actually all the white in your matrix). Floodfill it. Then you move right again, another fresh black area that runs along the whole upper and right borders. Moving around, you now find two white cells that you filled earlier and skip them. And finally you find the black area along the bottom border.
Counting the number of colours you found and set might already supply the information on whether there are holes in the matrix.
Otherwise, or to find where they are, scan the matrix: all areas you find that are still of color 0 are holes in the black. You might also have holes in the white.
Another method, sort of "arrested flood fill"
Run all around the border of the first matrix. Where you find "0", you set it to "2". Where you find "1", you set it to "3".
Now run around the new inner border (those cells that touch the border you have just scanned).
Zero cells touching 2's become 2, 1 cells touching 3 become 3.
You will have to scan twice, once clockwise, once counterclockwise, checking the cells "outwards" and "before" the current cell. That is because you might find something like this:
22222222222333333
2AB11111111C
31
Cell A is actually 1. You examine its neighbours and you find 1 (but it's useless to check that since you haven't processed it yet, so you can't know if it's a 1 or should be a 3 - which is the case, by the way), 2 and 2. A 2 can't change a 1, so cell A remains 1. The same goes with cell B which is again a 1, and so on. When you arrive at cell C, you discover that it is a 1, and has a 3 neighbour, so it toggles to 3... but all the cells from A to C should now toggle.
The simplest, albeit not most efficient, way to deal with this is to scan the cells clockwise, which gives you the wrong answer (C and D are 1's, by the way)
22222222222333333
211111111DC333333
33
and then scan them again counterclockwise. Now when you arrive to cell C, it has a 3-neighbour and toggles to 3. Next you inspect cell D, whose previous-neighbour is C, which is now 3, so D toggles to 3 again. In the end you get the correct answer
22222222222333333
23333333333333333
33
and for each cell you examined two neighbours going clockwise, one going counterclockwise. Moreover, one of the neighbours is actually the cell you checked just before, so you can keep it in a ready variable and save one matrix access.
If you find that you scanned a whole border without even once toggling a single cell, you can halt the procedure. Checking this will cost you 2(W*H) operations, so it is only really worthwhile if there are lots of holes.
In at most W*H*2 steps, you should be done.
You might also want to check the Percolation Algorithm and try to adapt that one.
Make some sort of "LinkedCells" class that stores cells that are linked with each other. Then check the cells one by one, left to right and top to bottom, making the following check for each black cell: if a neighbouring cell is black, add this cell to that cell's group; otherwise create a new group for this cell. You only need to check the top and left neighbours.
UPD: Sorry, I forgot about merging groups: if both neighbouring cells are black and belong to different groups, you should merge the groups into one.
Your "LinkedCells" class should have a flag that says whether it is connected to the edge. It is false by default and is set to true when you add an edge cell to the group. When merging two groups, set the new flag to the logical OR (||) of the previous flags.
In the end you will have a set of groups, and each group whose connection flag is false is a "hole".
This algorithm will be O(x*y).
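A possible Python sketch of this grouping idea is shown below. Instead of a "LinkedCells" class it uses a union-find forest stored in dictionaries, which is a substitution made for brevity; it also assumes black cells hold the value 0 and that the matrix is a list of rows. The function name find_holes is made up for the example.

```python
def find_holes(matrix):
    """Return one list of (row, col) cells per group of 0s that never touches the edge."""
    rows, cols = len(matrix), len(matrix[0])
    parent = {}   # union-find forest over the 0-cells seen so far
    on_edge = {}  # group root -> True if the group contains an edge cell

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 0:
                continue
            cell = (r, c)
            parent[cell] = cell
            on_edge[cell] = r in (0, rows - 1) or c in (0, cols - 1)
            for nb in ((r - 1, c), (r, c - 1)):   # top and left neighbours only
                if nb in parent:                  # neighbour is an already-seen 0-cell
                    a, b = find(nb), find(cell)
                    if a != b:                    # merge groups, OR-ing the edge flags
                        parent[b] = a
                        on_edge[a] = on_edge[a] or on_edge[b]

    holes = {}
    for cell in parent:
        root = find(cell)
        if not on_edge[root]:
            holes.setdefault(root, []).append(cell)
    return list(holes.values())
```

Each returned list is one hole; an empty result means every group of 0s reaches the edge.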
You can represent the grid as a graph with individual cells as vertexes and edges occurring between adjacent vertexes. Then you can use Breadth First Search or Depth First Search to start at each of the cells, on the sides. As you will only find the components connected to the sides, the black cells which have not been visited are the holes. You can use the search algorithm again to divide the holes into distinct components.
EDIT: Worst case complexity must be linear to the number of cells, otherwise, give some input to the algorithm, check which cells (as you're sublinear, there will be big unvisited spots) the algorithm hasn't looked into and put a hole in there. Now you've got an input for which the algorithm doesn't find one of the holes.
Your algorithm is globally Ok. It's just a matter of optimizing it by merging the flood fill exploration with the cell scanning. This will just minimize tests.
The general idea is to perform the flood fill exploration line by line while scanning the table. So you'll have multiple parallel flood fill that you have to keep track of.
The table is then processed row by row from top to bottom, and each row processed from right to left. The order is arbitrary, could be reverse if you prefer.
Let segments identify a sequence of consecutive cells with value 0 in a row. You only need the index of the first and last cell with value 0 to define a segment.
As you may guess a segment is also a flood fill in progress. So we'll add an identification number to the segments to distinguish between the different flood fills.
The nice thing of this algorithm is that you only need to keep track of segments and their identification number in row i and i-1. So that when you process row i, you have the list of segments found in the row i-1 and their associated identification number.
You then have to process segment connection in row i and row i-1. I'll explain below how this can be made efficient.
For now you have to consider three cases:
1. Found a segment in row i not connected to a segment in row i-1. Assign it a new hole identification (incremented integer). If it's connected to the border of the table, make this number negative.
2. Found a segment in row i-1 not connected to a segment in row i. You found the lowest segment of a hole. If it has a negative identification number it is connected to the border and you can ignore it. Otherwise, congratulations, you found a hole.
3. Found a segment in row i connected to one or more segments in row i-1. Set the identification number of all these connected segments to the smallest identification number. See the following possible use case.
row i-1: 2 333 444 111
row i : **** *** ***
The segments in row i should all get the value 1 identifying the same flood fill.
Matching segments in rows i and row i-1 can be done efficiently by keeping them in order from left to right and comparing segments indexes.
Process segments by lowest start index first. Then check whether it's connected to the segment with the lowest start index of the other row. If not, process case 1 or 2. Otherwise continue identifying connected segments, keeping track of the smallest identification number. When no more connected segments are found, set the identification number of all connected segments found in row i to the smallest identification value.
The index comparison for the connectivity test can be optimized by storing (first-1,last) as the segment definition, since segments may be connected by their corners. You can then compare the bare index values directly and detect overlapping segments.
The rule to pick the smallest identification number ensures that you automatically get the negative number for connected segments and at least one connected to the border. It propagates to other segments and flood fills.
This is a nice exercise to program. You didn't specify the exact output you need. So this is also left as exercise.
The brute force algorithm as described here is as follows.
We now assume we can write in cells a value different from 0 or 1.
You need a flood fill function that receives the coordinates of a cell to start from and an integer value to write into all connected cells holding the value 0.
Since you only need to consider holes (cells with value 0 surrounded by cells with value 1), you have to use two passes.
The first pass visits only cells touching the border. For every cell containing the value 0, you do a flood fill with the value -1. This tells you that this cell has a value different from 1 and has a connection to the border. After this scan, all cells with a value 0 belong to one or more holes.
To distinguish between different holes, you need the second scan. You then scan the remaining cells in the rectangle (1,1)x(n-2,n-2) that you didn't scan yet. Whenever your scan hits a cell with value 0, you have discovered a new hole. You then flood fill this hole with an integer of your choice to distinguish it from the others. After that you proceed with the scan until all cells have been visited.
When done, you may replace the values -1 with 0 because there shouldn't be any 0 left.
This algorithm works, but is not as efficient as the other algorithm I propose. Its advantage is that it's simple and doesn't need an extra data storage to hold the segments, hole identification and eventual segment chaining reference.
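The two-pass brute force version could look like this in Python. It is only a sketch: it assumes 4-connectivity, works on a copy of the matrix, and numbers the holes starting at 2 so the ids cannot be confused with the original 0 and 1 values. The names flood_fill and label_holes are made up for the example.

```python
from collections import deque

def flood_fill(grid, r, c, value):
    """Write value into every 0-cell that is 4-connected to (r, c)."""
    rows, cols = len(grid), len(grid[0])
    grid[r][c] = value
    queue = deque([(r, c)])
    while queue:
        cr, cc = queue.popleft()
        for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                grid[nr][nc] = value
                queue.append((nr, nc))

def label_holes(matrix):
    """Return (labelled copy of matrix, number of holes); holes get ids 2, 3, ..."""
    grid = [row[:] for row in matrix]
    rows, cols = len(grid), len(grid[0])
    # Pass 1: every 0 reachable from the border is marked -1 (not a hole).
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and grid[r][c] == 0:
                flood_fill(grid, r, c, -1)
    # Pass 2: any 0 still left in the interior starts a new hole.
    next_id = 2
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if grid[r][c] == 0:
                flood_fill(grid, r, c, next_id)
                next_id += 1
    # Restore the border-connected cells to 0.
    for row in grid:
        for c, value in enumerate(row):
            if value == -1:
                row[c] = 0
    return grid, next_id - 2
```

After the call, every cell with a value of 2 or more belongs to the hole with that id.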

Data structure to query points which lie inside a triangle

I have some 2D data which contains edges which were rasterized into pixels. I want to implement an efficient data structure which returns all edge pixels which lie in a non-axis-aligned 2D triangle.
The image shows a visualization of the problem where white denotes the rasterized edges, and red visualizes the query triangle. The result would be all white pixels which lie on the boundary or inside the red triangle.
When further looking at the image, one notices that we have sparse boolean data, meaning that if we denote black pixels with a 0 and white pixels with a 1, the number of 1s in the data is much lower than the number of 0s. Therefore, rasterizing the red triangle and checking for each point in its interior whether it is white or black is not the most efficient approach.
Besides the sparseness of the data: since the white pixels originate from edges, it is in their nature to be connected together. However, at junctions with other lines, they have more than two neighbors. Pixels which are at a junction should only be returned once.
The data must be processed in realtime, but with no GPU assistance. There will be multiple queries for different triangle contents, and after each one, points may be removed from the data structure. However, new points won't be inserted anymore after the initial filling of the data structure.
The query triangles are already known when the rasterized edges arrive.
There are more query triangles than data edges.
There are many spatial data structures available. However, I'm wondering which one is best for my problem. I'm willing to implement a highly optimized data structure to solve this problem, as it will be a core element of the project. Therefore, mixes or variations of data structures are also welcome!
R-trees seem to be the best data structure I have found for this problem so far, as they support rectangle-based queries. I would query all white pixels within the AABB of the query triangle, then check for each returned pixel whether it lies within the query triangle.
However, I'm not sure how well R-trees will behave, since edge-based data will not be easily groupable into rectangles, as the points are clumped together on narrow lines and not spread out.
I'm also not sure whether it would make sense to pre-build the structure of the R-tree using information about the query triangles that will be issued as soon as the structure is filled (as mentioned before, the query triangles are already known when the data arrives).
Reversing the problem also seems to be a valid solution: I use a 2-dimensional interval tree to get, for each white pixel, a list of all triangles that contain it. The pixel can then be stored in all those result sets and returned instantly when the query arrives. However, I'm not sure how this performs, as the number of triangles is higher than the number of edges but still lower than the number of white pixels (an edge is mostly split up into ~20-50 pixels).
A data structure which would exploit that white pixels have most often white pixels as neighbors would seem to be most efficient. However, I could not find anything about such a thing until now.
Decompose the query triangle(s) into n*3 lines. For every point under test you can estimate at which side of every line it is. The rest is boolean logic.
EDIT: since your points are rasterised, you could precompute the points on the scanlines where the scanline enters or leaves a particular query triangle (=crosses one of the 3n lines above && is on the "inside" of the other two lines that participate in that particular triangle)
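The "which side of every line" test for a single triangle can be sketched in Python as follows. The function names are made up for the example, and points on the boundary are counted as inside, matching the question's requirement. The test works for either winding order of the triangle's vertices.

```python
def side(a, b, p):
    """Positive if p lies left of the directed line a->b, negative if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def in_triangle(p, tri):
    """True if p is inside or on the boundary of the triangle tri = (a, b, c)."""
    a, b, c = tri
    d = (side(a, b, p), side(b, c, p), side(c, a, p))
    # Inside means the point is never strictly on opposite sides of two edges.
    return not (min(d) < 0 and max(d) > 0)

def pixels_in_triangle(pixels, tri):
    """Filter an iterable of (x, y) pixels down to those covered by the triangle."""
    return [p for p in pixels if in_triangle(p, tri)]
```

In practice the pixel list passed to pixels_in_triangle would come from a coarser prefilter, for example the pixels inside the triangle's bounding box.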
UPDATE: Triggered by another topic ( How can I find out if point is within a triangle in 3D? ) I'll add code to prove that a non-convex case can be expressed in terms of "which side of every line a point is on". Since I am lazy, I'll use an L-shaped form. IMHO other non-convex shapes can be processed similarly. The lines are parallel to the X- and Y-axes, but that again is laziness.
/*
Y
| +-+
| | |
| | +-+
| | |
| +---+
|
0------ X
the line pieces:
Horizontal:
(x0,y0) - (x2,y0)
(x1,y1) - (x2,y1)
(x0,y2) - (x1,y2)
Vertical:
(x0,y0) - (x0,y2)
(x1,y1) - (x1,y2)
(x2,y0) - (x2,y1)
The lines:
(x==x0)
(x==x1)
(x==x2)
(y==y0)
(y==y1)
(y==y2)
Combine them:
**/
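#include <stdio.h>

/* Coordinates of the lines that bound the L-shape. */
#define x0 2
#define x1 4
#define x2 6
#define y0 2
#define y1 4
#define y2 6

int inside(int x, int y)
{
    /* One bit per line: the bit is set when the point is on or beyond that line. */
    switch ( (x < x0 ? 0 : 1)
           + (x < x1 ? 0 : 2)
           + (x < x2 ? 0 : 4)
           + (y < y0 ? 0 : 8)
           + (y < y1 ? 0 : 16)
           + (y < y2 ? 0 : 32) ) {
    case 1+8:       /* x0 <= x < x1, y0 <= y < y1 */
    case 1+2+8:     /* x1 <= x < x2, y0 <= y < y1 */
    case 1+8+16:    /* x0 <= x < x1, y1 <= y < y2 */
        return 1;
    default:
        return 0;
    }
}

int main(void)
{
    int xx, yy, res;

    /* Read coordinate pairs until the input ends or cannot be parsed. */
    while (scanf("%d %d", &xx, &yy) == 2) {
        res = inside(xx, yy);
        printf("(%d,%d) := %d\n", xx, yy, res);
    }
    return 0;
}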
There are a couple computational-geometric algorithms that I think in tandem would give good results.
Compute a planar subdivision that contains all of the triangle edges. (This is a little more complicated than computing all intersections of triangle edges.) For each face, make a list of the triangles that contain that face. This is admittedly worst-case cubic, but that's only when the triangles overlap a lot (and I can't help but think that there's a way to compress it to quadratic).
Locate each pixel in the subdivision (i.e., figure out which face it belongs to). The first one in each edge will cost O(log n), but if you have locality thereafter, there may be a way to shortcut the computation to something like O(1) on average. (For example, if you use the trapezoid method and if you store the list of trapezoids that contained the last point, you can traverse up the list until you find a trapezoid that contains the current point and work back down. Compare giving hints to C++ STL set insertion by passing an iterator near the insertion point.)

How to draw a tree structure? (A two-dimensional space allocation tree recursion algorithm?)

I have an arbitrary tree structure of nodes. I want to draw this tree to provide users a visual representation. I need to recurse over the tree and for each node add a graphic item to a list, and then just draw the list of items once tree recursion has finished. The recursion and drawing of items is of course trivial - what's a bit more complicated is how to position the graphic nodes so they do not overlap with other branches.
I'm using Android but that is not important - I'm looking for an approach, possibly an algorithm that can maintain a picture of 2D space as it passes over the tree so it just allocates the most appropriate coordinates for each node as it makes the pass.
Any ideas?
Update
This is the article with the best and most complete algorithm.
I would try the Walker algorithm. Here's an academic paper on the algorithm. If you want code to look at, look at the NodeLinkTreeLayout in Prefuse. Prefuse is open source so there shouldn't be any problems adapting the code to your situation as long as you follow the terms of the license.
I suggest drawing the tree linewise. You do this by using some kind of moving "drawing cursor".
You could store an attribute width for each node which is calculated as follows:
the width of a leaf is 1
the width of an inner node is the sum of all its children's widths
Then, you draw the root "in the first line" in the middle, which means at half of the root's width.
Then, you generate a grid over the image such that each gridline corresponds to one line resp. one step from left to right and each intersection of grid lines can contain a node and each node has enough space.
Then, you iterate through the children and, while iterating, you accumulate the children's widths and draw the children "in the next line". To draw currentChild, you move your drawing cursor currentWidth/2 to the right, draw currentChild, and move the drawing cursor the remaining currentWidth/2 to the right.
In order to get the nodes in a good order, you might consider a breadth first search.
I hope my explanation is clear, but I think it will be better, if I draw a little picture.
This is our tree (x are nodes, everything else edges)
+-------x--+-------+
| | |
+-x-+ +-+x+-+ +-x-+
| | | | | | | | |
x x x x x x x x x
So, you calculate the leaves' widths:
+-------x--+-------+
| | |
+-x-+ +-+x+-+ +-x-+
| | | | | | | | |
1 1 1 1 1 1 1 1 1
Then, bottom up, the widths as sums of the children's widths:
+-------9--+-------+
| | |
+-2-+ +-+4+-+ +-3-+
| | | | | | | | |
1 1 1 1 1 1 1 1 1
So, you start at the root (width 9) and go 4.5 steps to the right in the first line.
Then, you move your "drawing cursor" to the second line, "column 0" (go to left).
The first child has width 2, so we go 2/2=1 grid lines to the right and draw the node and move the drawing cursor the remaining 1 grid lines to the right in order to finish the node. So, the next node has width 4, which means, that we go right 4/2=2 grid lines, draw, go the remaining 2 steps, and so on.
And so on with the next line. At the end (or in intermediate steps), connect the nodes.
This procedure ensures that there are no overlapping nodes (if grid lines are far enough from each other), but it might lead to quite large tree diagrams that could use the space more efficiently.
In order to detect unused space, one might just scan the lines after the above process and look if there are unused grid line intersections and then possibly realign some nodes in order to fill space.
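The width-based placement can be sketched in Python as follows. The tree representation (a node is a list of its children, a leaf is an empty list) and the function name layout are assumptions made for this example. It walks the tree depth-first rather than line by line, which yields the same grid positions; the step that reclaims unused space is not included.

```python
def layout(tree):
    """Return (x, depth) grid positions for every node, children before parents."""
    positions = []

    def place(node, left, depth):
        """Place node's subtree starting at grid column `left`; return its width."""
        cursor = left
        for child in node:
            cursor += place(child, cursor, depth + 1)  # advance by the child's width
        width = max(cursor - left, 1)                  # a leaf has width 1
        positions.append((left + width / 2, depth))    # centre the node over its span
        return width

    place(tree, 0, 0)
    return positions
```

For the example tree above, the root ends up at x = 4.5 and its three children at x = 1, 4 and 7.5, matching the walkthrough.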
Take a look at Dot. You can convert your tree to the dot representation and then use Graphviz to visualize it in any format you like. For example, Doxygen uses it to represent the structure of a program.
Graphviz and mfgraph are powerful, but they're for general graphs and are probably overkill for trees.
Try googling on tree+layout+algorithm or see Graphic Javascript Tree with Layout.
The latter is old but it uses HTML canvas and javascript, and it explains the code, so both the code and the approach should be portable.
Depending on the nature of your data, a TreeMap may be more appropriate than a tinkertoy representation.

Resources