Creating a polygon shape from a 2d tile array - algorithm

I have a 2D array which just contains boolean values showing if there is a tile at that point in the array or not. This works as follows, say if array[5,6] is true then there is a tile at the co-ordinate (5,6). The shape described by the array is a connected polygon possibly with holes inside of it.
Essentially all i need is a list of vertex's and faces which describe the shape in the array.
I have looked for a while and could not find a solution to this problem, any help would be appreciated.
Edit: This is all done so that i can then take the shapes and collide them together.
This project is just something i am doing to help advance my programming skills / physics etc.
Edit2: Thanks for all the help. Basicly my problem was very similar to converting a bitmap image to a vector image. http://cardhouse.com/computer/vector.htm is useful if someone else in the future encounters the same problems i have done.

Don't focus too much on the individual pixels. Focus on the corners of the pixels - the points where four pixels meet. The co-ordinates of these corners will work a lot like half-open co-ordinates of the pixels. Half-open bounds are inclusive at the lower bound, but exclusive at the upper bound, so the half-open range from one to three is {1, 2}.
Define a set of edges - single-pixel-long lines (vertical or horizontal) between two pixels. Then form an adjacency graph - two edges are adjacent if they share a point.
Next, identify connected sets of edges - the subgraphs where every point is connected, directly or indirectly, to every other. Logically, most connected subgraphs should form closed loops, and you should ensure that ALL loops are considered simple, closed loops.
One issue is the edge of your bitmap. It may simplify things if you imagine that your bitmap is a small part of an infinite bitmap, where every out-of-bounds pixel has value 0. Include pixel-edges on the edge of the bitmap based on that rule. That should ensure all loops are closed.
Also, consider those pixel-corners where you have four boundary edges - ie where the pixel pattern is one of...
1 0 0 1
0 1 1 0
The these cases, the '1' pixels should be considered part of separate polygons (otherwise you'll end up with winding-rule complications). Tweak the rules for adjacency in these cases so that you basically get two connected (right-angle) edges that happen to touch at a point, but aren't considered adjacent. They could still be connected, but only indirectly, via other edges.
Also, use additional arrays of flags to identify pixel-edges that have already been used in loops - perhaps one for horizontal edges, one for vertical. This should make it easier to find all loops without repeatedly evaluating any - you scan your main bitmap and when you spot a relevant edge, you check these arrays before scanning around to find the whole loop. As you scan around a loop, you set the relevant flags (or loop ID numbers) in these arrays.
BTW - don't think of these steps as all being literal build-this-data-structure steps, but more as layers of abstraction. The key point is to realise that you are doing graph operations. You might even use an adaptor that references your bitmap, but provides an interface suitable for some graph algorithm to use directly, as if it were using a specialised graph data structure.
To test whether a particular loop is a hole or not, pick (for example) a leftmost vertical pixel edge (on the loop). If the pixel to the right is filled, the loop is a polygon boundary. If the pixel to the left is filled, the loop is a hole. Note - a good test pixel is probably the one you found when you found the first pixel-edge, prior to tracing around the loop. This may not be the leftmost such edge, but it will be the leftmost/uppermost in that scan line.
The final issue is simplifying - spotting where pixel-edges run together into straight lines. That can probably be built into the scan that identifies a loop in the first place, but it is probably best to find a corner before starting the scan proper. That done, each line is identified using a while-I-can-continue-tracing-the-same-direction loop - but watch for those two-polygons-touching-at-a-corner issues.
Trying to combine all this stuff may sound like a complicated mess, but the trick is to separate issues into different classes/functions, e.g. using classes that provide abstracted views of underlying layers such as your bitmap. Don't worry too much about call overheads or optimisation - inline and other compiler optimisations should handle that.
What would be really interesting is the more intelligent tracing that some vector graphics programs have been doing for a while now, where you can get diagonals, curves etc.

You will have to clarify your desired result. Something like the following
██ ██
██
██ ██
given
1 0 1
0 1 0
1 0 1
as the input? If yes, this seems quite trivial - why don't you just generate one quadrilateral per array entry if there is a one, otherwise nothing?
You can solve this problem using a simple state machine. The current state consist of the current position (x,y) and the current direction - either left (L), right (R), up (U), or down (D). The 'X' is the current position and there are eight neighbors that may control the state change.
0 1 2
7 X 3
6 5 4
Then just follow the following rules - in state X,Y,D check the two given fields and change the state accordingly.
X,Y,R,23=00 => X+1,Y,D
X,Y,R,23=01 => X+1,Y,R
X,Y,R,23=10 => X+1,Y,U
X,Y,R,23=11 => X+1,Y,U
X,Y,D,56=00 => X,Y+1,L
X,Y,D,56=01 => X,Y+1,D
X,Y,D,56=10 => X,Y+1,R
X,Y,D,56=11 => X,Y+1,R
...

Related

what algorithm or approach for placing rectangles without overlapp

I have a big rectangle of size 12*12. Now I have 6 rectangles already placed on the floor of that rectangle. I know the center coordinate of that pre-placed module. Now I have few another 14 rectangles to place upon that floor of that rectangle. How to do so?
here all my pre placed block those having center coordinate as say (2,5),(5,7),(9,2),(7,8),(11,9),(3,11).
Now how could I place 14 another rectangle in this floor so that it would not over lap with any preplaced block.
I would like to code in MATLAB..but what approach should I follow?
If a nice even placement is important, I suggest you look up simulated force-based graph layout. In this problem, you'll use simulated forces pushing the rectangles apart and also away from the border rectangle according to Coulomb's law. The initial configuration is randomly selected. You'll want to give the rectangles mass proportional to their area, I think. You don't have any spring forces due to edges, which makes it easier. The iteration to solve the differential equations of motion will be easy in Matlab. Or there may well be a toolkit to do it for you. Animations of these algorithms are fun.
Unfortunately with constrained problems like this, the fixed rectangles can form barriers that prevent the moving rectangles from getting to a non-overlapping solution. (Think of the case where the fixed rectangles are in a line down the middle and all the moving ones get "trapped" on one side or the other. The same thing happens in graph layout if some nodes have fixed locations.) There are various strategies for overcoming these bad cases. One is to start with no fixed objects at all, let the moving rectangles come to an equilibrium, then add the fixed ones one at a time, largest first, allowing the system regain equilibrium each time. Another, simpler one is just to start from different random initial conditions until you find one that works. There are also approaches related to simulated annealing, which is too big a topic to discuss here.
Here is a function to check overlap for two rectangles. you could loop it to check for more number of rectangles based on #Dov's idea.
For two rectangles Ri, i = 1,2, with centers (xi,yi) and half-lengths of their sides ai,bi > 0 (assuming that the sides are aligned with the coordinate axes).
Here is my implementation based on above equation:
In my code i've taken xcPosition and ycPosition as the center position of the rectangle.
Also length and breadth are the magnitude of sides of the rectangle.
function [ overLap, pivalue ] = checkOverlap( xcPosition1,ycPosition1,xcPosition2,ycPosition2,length1,breadth1,length2,breadth2 )
pix = max((xcPosition2 - xcPosition1 -(length1/2)-(length2/2)),(xcPosition1 -xcPosition2 -(length2/2)-(length1/2)));
piy = max((ycPosition2 - ycPosition1 -(breadth1/2)-(breadth2/2)),(ycPosition1 -ycPosition2 -(breadth2/2)-(breadth1/2)));
pivalue = max(pix, piy);
if (pivalue < 0)
overLap = 1; %// Overlap exists
else
overLap = 0; %// No overlap
end
end
You could also use the pivalue to know the degree of overlap or Non-overlap
The Pseudo-code for looping would be something like this:
for i = 1 : 14
for j = 1 : i-1 + 6 already placed parts
%// check for overlap using the above function here
%// place the part if there is no overlap
end
end
With such a small number, put each rectangle in a list. Each time you add a new rectangle, make sure the new one does not overlap with any of the existing ones.
This is O(n^2), so if you plan to increase to 10^3 or more rectangles you will need a better algorithm, but otherwise you're fine.
Now if your problem specifies that you might not be able to fit them all, then you will have to backtrack and keep trying different places. That is an N! problem, but if you have a lot of open space, many solutions will be possible.

Finding Contiguous Areas of Bits in 2D Bit Array

The Problem
I have a bit array which represents a 2-dimensional "map" of "tiles". This image provides a graphical example of the bits in the bit array:
I need to determine how many contiguous "areas" of bits exist in the array. In the example above, there are two such contiguous "areas", as illustrated here:
Tiles must be located directly N, S, E or W of a tile to be considered "contiguous". Diagonally-touching tiles do not count.
What I've Got So Far
Because these bit arrays can become relatively large (several MB in size), I have intentionally avoided using any sort of recursion in my algorithm.
The pseudo-code is as follows:
LET S BE SOURCE DATA ARRAY
LET C BE ARRAY OF IDENTICAL LENGTH TO SOURCE DATA USED TO TRACK "CHECKED" BITS
FOREACH INDEX I IN S
IF C[I] THEN
CONTINUE
ELSE
SET C[I]
IF S[I] THEN
EXTRACT_AREA(S, C, I)
EXTRACT_AREA(S, C, I):
LET T BE TARGET DATA ARRAY FOR STORING BITS OF THE AREA WE'RE EXTRACTING
LET F BE STACK OF TILES TO SEARCH NEXT
PUSH I UNTO F
SET T[I]
WHILE F IS NOT EMPTY
LET X = POP FROM F
IF C[X] THEN
CONTINUE
ELSE
SET C[X]
IF S[X] THEN
PUSH TILE NORTH OF X TO F
PUSH TILE SOUTH OF X TO F
PUSH TILE WEST OF X TO F
PUSH TILE EAST OF X TO F
SET T[X]
RETURN T
What I Don't Like About My Solution
Just to run, it requires two times the memory of the bitmap array it's processing.
While extracting an "area", it uses three times the memory of the bitmap array.
Duplicates exist in the "tiles to check" stack - which seems ugly, but not worth avoiding the way I have things now.
What I'd Like To See
Better memory profile
Faster handling of large areas
Solution / Follow-Up
I re-wrote the solution to explore the edges only (per #hatchet 's suggestion).
This was very simple to implement - and eliminated the need to keep track of "visited tiles" completely.
Based on three simple rules, I can traverse the edges, track min/max x & y values, and complete when I've arrived at the start again.
Here's the demo with the three rules I used:
One approach would be a perimeter walk.
Given a starting point anywhere along the edge of the shape, remember that point.
Start the bounding box as just that point.
Walk the perimeter using a clockwise rule set - if the point used to get to the current point was above, then first look right, then down, then left to find the next point on the shape perimeter. This is kind of like the simple strategy of solving a maze where you continuously follow a wall and always bear to the right.
Each time you visit a new perimeter point, expand the bounding box if the new point is outside it (i.e. keep track of the min and max x and y.
Continue until the starting point is reached.
Cons: if the shape has lots of single pixel 'filaments', you'll be revisiting them as the walk comes back.
Pros: if the shape has large expanses of internal occupied space, you never have to visit them or record them like you would if you were recording visited pixels in a flood fill.
So, conserves space, but in some cases at the expense of time.
Edit
As is so often the case, this problem is known, named, and has multiple algorithmic solutions. The problem you described is called Minimum Bounding Rectangle. One way to solve this is using Contour Tracing. The method I described above is in that class, and is called Moore-Neighbor Tracing or Radial Sweep. The links I've included for them discuss them in detail and point out a problem I hadn't caught. Sometimes you'll revisit the start point before you have traversed the entire perimeter. If your start point is for instance somewhere along a single pixel 'filament', you will revisit it before you're done, and unless you consider this possibility, you'll stop prematurely. The website I linked to talks about ways to address this stopping problem. Other pages at that website also talk about two other algorithms: Square Tracing, and Theo Pavlidis's Algorithm. One thing to note is that these consider diagonals as contiguous, whereas you don't, but that should just something that can be handled with minor modifications to the basic algorithms.
An alternative approach to your problem is Connected-component labeling. For your needs though, this may be a more time expensive solution than you require.
Additional resource:
Moore Neighbor Contour Tracing Algorithm in C++
I actually got a question like this in an interview once.
You can pretend the array is a graph and the connected nodes are the adjacent ones. My algo would involves going 1 to the right until you find a marked node. When you find one do a breadth first search which runs in O(n) and avoids recursion. When the BFS returns keep searching from where you left off and if the node has already been marked by one of the previous BFS's you obviously don't need to search. I wasn't sure if you wanted to actually return the number of objects found, but it's easy to keep track by just incrementing a counter when you hit the first marked square.
Generally when you do a flood fill type algorithm you are placed in a spot and asked to fill. Since this is finding all the filled regions one way you would want to optimize it is to avoid rechecking the already marked nodes from previous BFS's, unfortunately at the moment I cannot think of a way to do that.
One hacky way to reduce memory consumption would be too store a short[][] instead of a boolean. Then use this scheme to avoid making a whole second 2d-array
unmarked = 0, marked = 1, checked and unmarked = 3, checked and marked = 3
This way you can check the status of an entry by its value and avoid making a second array.

How can I find hole in a 2D matrix?

I know the title seems kind of ambiguous and for this reason I've attached an image which will be helpful to understand the problem clearly. I need to find holes inside the white region. A hole is defined as one or many cells with value '0' inside the white region I mean it'll have to be fully enclosed by cell's with value '1' (e.g. here we can see three holes marked as 1, 2 and 3). I've come up with a pretty naive solution:
1. Search the whole matrix for cells with value '0'
2. Run a DFS(Flood-Fill) when such a cell (black one) is encountered and check whether we can touch the boundary of the main rectangular region
3. If we can touch boundary during DFS then it's not a hole and if we can't reach boundary then it'll be considered as a hole
Now, this solution works but I was wondering if there's any other efficient/fast solution for this problem.
Please let me know your thoughts. Thanks.
With floodfill, which you already have: run along the BORDER of your matrix and floodfill it, i.e.,
change all zeroes (black) to 2 (filled black) and ones to 3 (filled white); ignore 2 and 3's that come from an earlier floodfill.
For example with your matrix, you start from the upper left, and floodfill black a zone with area 11. Then you move right, and find a black cell that you just filled. Move right again and find a white area, very large (actually all the white in your matrix). Floodfill it. Then you move right again, another fresh black area that runs along the whole upper and right borders. Moving around, you now find two white cells that you filled earlier and skip them. And finally you find the black area along the bottom border.
Counting the number of colours you found and set might already supply the information on whethere there are holes in the matrix.
Otherwise, or to find where they are, scan the matrix: all areas you find that are still of color 0 are holes in the black. You might also have holes in the white.
Another method, sort of "arrested flood fill"
Run all around the border of the first matrix. Where you find "0", you set
to "2". Where you find "1", you set to "3".
Now run around the new inner border (those cells that touch the border you have just scanned).
Zero cells touching 2's become 2, 1 cells touching 3 become 3.
You will have to scan twice, once clockwise, once counterclockwise, checking the cells "outwards" and "before" the current cell. That is because you might find something like this:
22222222222333333
2AB11111111C
31
Cell A is actually 1. You examine its neighbours and you find 1 (but it's useless to check that since you haven't processed it yet, so you can't know if it's a 1 or should be a 3 - which is the case, by the way), 2 and 2. A 2 can't change a 1, so cell A remains 1. The same goes with cell B which is again a 1, and so on. When you arrive at cell C, you discover that it is a 1, and has a 3 neighbour, so it toggles to 3... but all the cells from A to C should now toggle.
The simplest, albeit not most efficient, way to deal with this is to scan the cells clockwise, which gives you the wrong answer (C and D are 1's, by the way)
22222222222333333
211111111DC333333
33
and then scan them again counterclockwise. Now when you arrive to cell C, it has a 3-neighbour and toggles to 3. Next you inspect cell D, whose previous-neighbour is C, which is now 3, so D toggles to 3 again. In the end you get the correct answer
22222222222333333
23333333333333333
33
and for each cell you examined two neighbours going clockwise, one going counterclockwise. Moreover, one of the neighbours is actually the cell you checked just before, so you can keep it in a ready variable and save one matrix access.
If you find that you scanned a whole border without even once toggling a single cell, you can halt the procedure. Checking this will cost you 2(W*H) operations, so it is only really worthwhile if there are lots of holes.
In at most W*H*2 steps, you should be done.
You might also want to check the Percolation Algorithm and try to adapt that one.
Make some sort of a "LinkedCells" class that will store cells that are linked with each other. Then check cells on-by-one in a from-left-to-right-from-top-to-bottom order, making the following check for each cell: if it's neighbouring cell is black - add this cell to that cell's group. Else you should create new group for this cell. You should only check for top and left neighbour.
UPD: Sorry, I forgot about merging groups: if both neighbouring cells are black and are from different groups - you should merege tha groups in one.
Your "LinkedCells" class should have a flag if it is connected to the edge. It is false by default and can be changed to true if you add edge cell to this group. In case of merging two groups you should set new flag as a || of previous flags.
In the end you will have a set of groups and each group having false connection flag will be "hole".
This algorithm will be O(x*y).
You can represent the grid as a graph with individual cells as vertexes and edges occurring between adjacent vertexes. Then you can use Breadth First Search or Depth First Search to start at each of the cells, on the sides. As you will only find the components connected to the sides, the black cells which have not been visited are the holes. You can use the search algorithm again to divide the holes into distinct components.
EDIT: Worst case complexity must be linear to the number of cells, otherwise, give some input to the algorithm, check which cells (as you're sublinear, there will be big unvisited spots) the algorithm hasn't looked into and put a hole in there. Now you've got an input for which the algorithm doesn't find one of the holes.
Your algorithm is globally Ok. It's just a matter of optimizing it by merging the flood fill exploration with the cell scanning. This will just minimize tests.
The general idea is to perform the flood fill exploration line by line while scanning the table. So you'll have multiple parallel flood fill that you have to keep track of.
The table is then processed row by row from top to bottom, and each row processed from right to left. The order is arbitrary, could be reverse if you prefer.
Let segments identify a sequence of consecutive cells with value 0 in a row. You only need the index of the first and last cell with value 0 to define a segment.
As you may guess a segment is also a flood fill in progress. So we'll add an identification number to the segments to distinguish between the different flood fills.
The nice thing of this algorithm is that you only need to keep track of segments and their identification number in row i and i-1. So that when you process row i, you have the list of segments found in the row i-1 and their associated identification number.
You then have to process segment connection in row i and row i-1. I'll explain below how this can be made efficient.
For now you have to consider three cases:
found a segment in row i not connected to a segment in row i-1. Assign it a new hole identification (incremented integer). If it's connected to the border of the table, make this number negative.
found a segment in row i-1 not connected to a segment in row i-1. You found the lowest segment of a hole. If it has a negative identification number it is connected to the border and you can ignore it. Otherwise, congratulation, you found a hole.
found a segment in row i connected to one or more segments in row i-1. Set the identification number of all these connected segments to the smallest identification number. See the following possible use case.
row i-1: 2 333 444 111
row i : **** *** ***
The segments in row i should all get the value 1 identifying the same flood fill.
Matching segments in rows i and row i-1 can be done efficiently by keeping them in order from left to right and comparing segments indexes.
Process segments by lowest start index first. Then check if it's connected to the segment with lowest start index of the other row. If no, process case 1 or 2. Otherwise continue identifying connected segments, keeping track of the smallest identification number. When no more connected segments is found, set the identification number of all connected segments found in row i to the smallest identification value.
Index comparison for connectivity test can by optimized by storing (first-1,last) as segment definition since segments may be connected by their corners. You then can directly compare indexes bare value and detect overlapping segments.
The rule to pick the smallest identification number ensures that you automatically get the negative number for connected segments and at least one connected to the border. It propagates to other segments and flood fills.
This is a nice exercise to program. You didn't specify the exact output you need. So this is also left as exercise.
The brute force algorithm as described here is as follow.
We now assume we can write in cells a value different from 0 or 1.
You need a flood fill functions receiving the coordinates of a cell to start from and an integer value to write into all connected cells holding the value 0.
Since you need to only consider holes (cells with value 0 surrounded by cells with value 1), you have to use two pass.
A first pass visit only cells touching the border. For every cell containing the value 0, you do a flood fill with the value -1. This tells you that this cell has a value different of 1 and has a connection to the border. After this scan, all cells with a value 0 belong to one or more holes.
To distinguish between different holes, you need the second scan. You then scan the remaining cells in the rectangle (1,1)x(n-2,n-2) you didn't scan yet. Whenever your scan hit a cell with value 0, you discovered a new hole. You then flood fill this hole with the integer of your choice to distinguish it from the others. After that you proceed with the scan until all cells have been visited.
When done, you may replace the values -1 with 0 because there shouldn't be any 0 left.
This algorithm works, but is not as efficient as the other algorithm I propose. Its advantage is that it's simple and doesn't need an extra data storage to hold the segments, hole identification and eventual segment chaining reference.

Merging and splitting overlapping rectangles to produce non-overlapping ones

I am looking for an algorithm as follows:
Given a set of possibly overlapping rectangles (All of which are "not rotated", can be uniformly represented as (left,top,right,bottom) tuplets, etc...), it returns a minimal set of (non-rotated) non-overlapping rectangles, that occupy the same area.
It seems simple enough at first glance, but prooves to be tricky (at least to be done efficiently).
Are there some known methods for this/ideas/pointers?
Methods for not necessarily minimal, but heuristicly small, sets, are interesting as well, so are methods that produce any valid output set at all.
Something based on a line-sweep algorithm would work, I think:
Sort all of your rectangles' min and max x coordinates into an array, as "start-rectangle" and "end-rectangle" events
Step through the array, adding each new rectangle encountered (start-event) into a current set
Simultaneously, maintain a set of "non-overlapping rectangles" that will be your output set
Any time you encounter a new rectangle you can check whether it's completely contained already in the current / output set (simple comparisons of y-coordinates will suffice)
If it isn't, add a new rectangle to your output set, with y-coordinates set to the part of the new rectangle that isn't already covered.
Any time you hit a rectangle end-event, stop any rectangles in your output set that aren't covering anything anymore.
I'm not completely sure this covers everything, but I think with some tweaking it should get the job done. Or at least give you some ideas... :)
So, if I were trying to do this, the first thing I'd do is come up with a unified grid space. Find all unique x and y coordinates, and create a mapping to an index space. So if you have x values { -1, 1.5, 3.1 } then map those to { 0, 1, 2 }, and likewise for y. Then every rectangle can be exactly represented with these packed integer coordinates.
Then I'd allocate a bitvector or something that covers the entire grid, and rasterize your rectangles in the grid. The nice thing about having a grid is that it's really easy to work with, and by limiting it to unique x and y coordinates it's minimal and exact.
One way to come up with a pretty quick solution is just dump every 'pixel' of your grid.. run them back through your mapping, and you're done. If you're looking for a more optimal number of rectangles, then you've got some sort of search problem on your hands.
Let's look at 4 neighboring pixels, a little 2x2 square. When I write algorithms like these, typically I think in terms of verts, edges, and faces. So, these are the faces around a vert. If 3 of them are on and 1 is off, then you've got a concave corner. This is the only invalid case. For example, if I don't have any concave corners, I just grab the extents and dump the whole thing as a single rectangle. For each concave corner, you need to decide whether to split horizontally, vertically, or both. I think of the splitting as marking edges not to cross when finding extents. You could also do it as coloring into sets, whatever is easier for you.
The concave corners and their split directions are your search space.. you can use whatever optimization algorithm you'd like. Branch/bound might work well. I bet you could find a simple heuristic that performs well (for example, if there's another concave corner directly across from the one you're considering, always split in that direction. Otherwise, split in the shorter direction). You could just go greedy. Or you could just split every concave vert in both directions, which would generally give you fewer rectangles than outputting every 'pixel' as a rect, and would be pretty simple.
Reading over this I realize that there may be areas that are unclear. Let me know if you want me to clarify anything.

What is the efficient Algorithm for Solving Jigsaw Puzzle?

Yesterday I was just playing Jigsaw Puzzle and somehow wondered what would be algorithm for solving it.
As human, steps which I followed where:
Separate all pieces in 3 parts, single flat edge, double flat edge and no edge at all.
Separate flat edge pieces as they would be corners of image
Separate single edge pieces as they would form 4 end edges of images
Lastly, pieces with no edges would form internal of the image.
Match the color and image pieces to put pieces together.
I was wondering what would be the efficient algorithm to solve this puzzle efficiently and what datastructure would provide optimum efficient solution.
Thanks.
Solving problems like this can be deceptively complicated, especially if no constraints are placed on the size and complexity of the puzzle.
Here's my thoughts on an approach to writing a program to solve such a puzzle.
There are four key pieces of information that you can use individually and together as clues to solving a jigsaw puzzle:
The shape information of each of the pieces (how their edges appear)
The color information of each of the pieces (adjacent pieces will generally have smooth transitions)
The orientation information of each piece (where flat and corner edges may lie)
The overall size and number of pieces provide the general dimensions of the puzzle
So what kind of information will the program will be supplied - let's assume that each puzzle piece is an small rectangular image with transparency information used to identify the portion of the puzzle piece that represent non-rectangular edges.
From this, it is relatively easy to identify the four corner pieces (in a typical jigsaw). These would have exactly two edges that have flat contours (see contour map below).
Next, I would build information about the shape of each of the four edges of a puzzle piece. This information can be used to build an adjacency matrix indicating which pieces fit together.
Now we can prune this adjacency matrix to identify just those pieces that have smooth color transitions in their adjacent configuration. This is somewhat tricky because it requires a level of fuzzy matching - since not every pixel-to-pixel boundary will necessarily have a smooth color transition.
Using the four corner pieces originally identified, we should now be able to reconstruct the dimensions and positions of all of the pieces in the puzzle.
A convenient data structure for representing edge shapes may be a contour map - essentially a set of integers representing the incremental deltas in distance from the opposing side of the image to the last non-transparent pixel in each of the four sides of the puzzle piece. Pieces that match should have mirror-image contour maps.
Make sure to scan for male/female portions of a piece--this will cut the search in half.
Assuming you're not going to get into any computer vision stuff, it would be very small variations on a search of the entire problem space, i.e. trying every piece until one fits, and repeating. The major optimization would be not trying the same piece in the same place if you know it doesn't fit. Side/corner pieces make up relatively few of the pieces and probably couldn't be considered in any major optimization.
The data structure would probably be something like a hash matrix, where you could quickly check if you're already tried a piece in a position.
An easy optimization that includes computer vision would be to try pieces at each position after sorting pieces by how close their average color matches adjacent positions.
This just off the top of my head of course.
I don't think that the human way would be that helpful for an implementation - a computer can look at all pieces many times a second and I see no (big) win by categorizing the pieces into corner, edge, and inner pieces, especially because there are only three categories and they have very different sizes.
Given a set of images of all pieces I would try to derive a simple descriptor for every piece or edge. The descriptor must contain information about the rough shape and the color of the piece respectively the four edges. Given a puzzle with 1000 pieces, there are 4000 edges and always two must be equal (ignoring the border of the puzzle). In consequence the descriptor must be able to distinguish 2000 edges requiring at least 11 bits.
Dividing one piece into a 3 x 3 check board pattern with nine fields will give three colors per edge - with eight bits per channel we already have 72 bits. I first tended to suggest to reduce the color resolution, but this seems not to be a good idea - for example a blue sky might really benefit from a high color resolution. Note that calculating the colors probably requires separating the piece from the background and trying to align the edges to the horizontal and vertical axises.
In very uniform areas like blue skies the color information will probably not be enough to find good matches and additional geometric information will be required. I would try to describe the shape of the edge by its curvature or a derived measure. Maybe sampled at ten to twenty points per edge. This probably again relies on background separation and edge alignment.
Finally the computer can do the easy part - compare all pairs of edge descriptors and find the best matches. This process should probably be controlled to become more local instead of simple best match first because when ever you have found a corner (Correct English word? I mean three pieces in a L-shape.) you have two edges constraining the piece to find and one can track back early if no good match can be found (indicating an error made before or a hard puzzle).
Passing over this I thought of an interesting solution which solves it at increasing costs over a series of steps.
Separate all puzzle pieces into sets of two. Test to see if they fit together. If not, try a different piece it hasn't seen before. If it does, put the set into a correct pile. Repeat until all sets of two has found a match.
From the correct pile combine the set of twos to make a set with sets of twos i.e {{1,2},{5,6}}. See if at least one puzzle piece from one set of two fits with at least another puzzle piece from the other set of two. If not, try a different set of two it hasn't seen before. If it does, combine the two sets into one set of four in the correct orientation with the piece you found to fit together and put the combined set into a correct pile. Repeat until all sets of four has been found.
Repeat the steps until the final problem where set n/2 is combined with set n/2.
Not positive what the computation time for this would be.

Resources