How would you represent a Rubik's Cube in code? - data-structures

If you were developing software to solve a Rubik's Cube, how would you represent the cube?

This ACM paper describes several alternative ways to represent a Rubik's Cube and compares them against each other. Sadly, I don't have an account to get the full text, but the description states:
Seven alternative representations of Rubik's Cube are presented and compared: a 3-by-3-by-3 array of 3-digit integers; a 6-by-3-by-3 array of literals; a 5-by-12 literal matrix; an 11-by-11 sparse literal matrix; a 54-element vector; a 4-dimension array; and a 3-by-3-by-3 nested array. APL functions are given for orientation moves and quarter-turns plus several useful tools for solving the cube.
Also, this RubiksCube.java file contains a pretty clean representation along with the relevant code for rotating the sections (if you are looking for actual code). It uses a cell and faces array.

The short answer is that it depends on how you're going to solve the cube. If your solver is going to use a human method like the layer-by-layer approach or the Fridrich method then the underlying data structure won't make much of a difference. A computer can solve a cube using a human method in negligible time (well under a second) even in the slowest of programming languages. But if you are going to solve the cube using a more computationally intensive method such as Thistlethwaite's 52-move algorithm, Reid's 29-move algorithm, or Korf's 20-move algorithm, then the data structure and programming language are of utmost importance.
I implemented a Rubik's Cube program that renders the cube using OpenGL, and it has two different types of solvers built in (Thistlethwaite and Korf). The solver has to generate billions of moves and compare each cube state billions of times, so the underlying structure has to be fast. I tried the following structures:
1. A three-dimensional array of chars, 6x3x3. The color of a face is indexed like cube[SIDE][ROW][COL]. This was intuitive, but slow.
2. A single array of 54 chars. This is faster than (1), and the row and stride are calculated manually (trivial).
3. Six 64-bit integers. This method is essentially a bitboard, and is significantly faster than methods (1) and (2). Twisting can be done using bit-wise operations, and face comparisons can be done using masks and 64-bit integer comparison.
4. An array of corner cubies and a separate array of edge cubies. The elements of each array contain a cubie index (0-11 for edges; 0-7 for corners) and an orientation (0 or 1 for edges; 0, 1, or 2 for corners). This is ideal when your solver involves pattern databases.
Expanding on method (3) above, each face of the cube is made up of 9 stickers, but the center is stationary so only 8 need to be stored. And there are 6 colors, so each color fits in a byte. Given these color definitions:
enum class COLOR : uchar {WHITE, GREEN, RED, BLUE, ORANGE, YELLOW};
A face might look like this, stored in a single 64-bit integer:
00000000 00000001 00000010 00000011 00000100 00000101 00000000 00000001
Which is decoded as:
WGR
G B
WYO
An advantage of using this structure is that bit rotation (the x86 rolq and rorq instructions) can be used to turn a face. Rotating by 16 bits effects a 90-degree turn; rotating by 32 bits gives a 180-degree turn. The adjacent pieces need to be updated manually, i.e. after rotating the top face, the top layer of the front, left, back, and right faces needs to be moved, too. Turning faces in this manner is really fast. For example, rolling
00000000 00000001 00000010 00000011 00000100 00000101 00000000 00000001
by 16 bits yields
00000000 00000001 00000000 00000001 00000010 00000011 00000100 00000101
Decoded, that looks like this:
WGW
Y G
OBR
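As a rough illustration of the face rotation just described, here is a minimal Python sketch. It assumes the byte order shown above (eight stickers, clockwise from the top-left, most significant byte first) and a right rotation, which is what the example bit patterns show. The function names are made up for this sketch and are not taken from the linked repository.

```python
MASK64 = (1 << 64) - 1
COLORS = "WGRBOY"  # same order as the COLOR enum above


def pack_face(stickers):
    """Pack 8 color codes (clockwise from the top-left sticker) into one 64-bit int."""
    value = 0
    for code in stickers:
        value = (value << 8) | code
    return value


def unpack_face(face):
    """Return the 8 color codes, most significant byte first."""
    return [(face >> shift) & 0xFF for shift in range(56, -8, -8)]


def ror64(value, bits):
    """Rotate a 64-bit value right, emulating what rorq does in hardware."""
    return ((value >> bits) | (value << (64 - bits))) & MASK64


def rotate_face_cw(face):
    """Quarter turn: every sticker advances two places around the ring."""
    return ror64(face, 16)


def rotate_face_180(face):
    """Half turn: every sticker advances four places around the ring."""
    return ror64(face, 32)


def decode(face):
    """Render the face as three text rows, leaving the fixed center blank."""
    s = [COLORS[c] for c in unpack_face(face)]
    return [s[0] + s[1] + s[2], s[7] + " " + s[3], s[6] + s[5] + s[4]]
```

Python integers are unbounded, so the mask stands in for the fixed 64-bit register width; in C or C++ the rotation would be a single instruction. This only turns the face's own stickers; the neighbouring strips still have to be moved separately.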
Another advantage is that comparing cube states can in some instances be done using some clever bit masks and standard integer comparisons. That can be a pretty big speed-up for a solver.
Anyway, my implementation is on github: https://github.com/benbotto/rubiks-cube-cracker/tree/2.2.0 See Model/RubiksCubeModel.{h,cpp}.
Expanding on method (4) above, some of the algorithms for programmatically solving the Rubik's Cube use iterative deepening A* (IDA*), a depth-first search that uses pattern databases as its heuristic. For example, Korf's algorithm utilizes three pattern databases: one stores the index and orientation of the 8 corner cubies; one stores the index and orientation of 6 of the 12 edge pieces; the last stores the index and orientation of the other 6 edges. When using pattern databases, a fast approach is to store the cube as a set of indexes and orientations.
Arbitrarily defining a convention, the edge cubies could be indexed as follows.
Index:     0   1   2   3   4   5   6   7   8   9   10  11
Position:  UB  UR  UF  UL  FR  FL  BL  BR  DF  DL  DB  DR   (up-back, ..., down-right)
Colors:    RY  RG  RW  RB  WG  WB  YB  YG  OW  OB  OY  OG   (red-yellow, ..., orange-green)
So the red-yellow edge cubie is at index 0, and the white-green edge cubie is at index 4. Likewise, the corner cubies might be indexed like so:
Index:     0    1    2    3    4    5    6    7
Position:  ULB  URB  URF  ULF  DLF  DLB  DRB  DRF
Colors:    RBY  RGY  RGW  RBW  OBW  OBY  OGY  OGW
So the red-blue-yellow corner cubie is at index 0, and the orange-green-yellow corner cubie is at index 6.
The orientation of each cubie needs to be kept as well. An edge piece can be in one of two orientations (oriented or flipped), while a corner piece can be in three different orientations (oriented, rotated once, or rotated twice). More details about the orientation of pieces can be found here: http://cube.rider.biz/zz.php?p=eoline#eo_detection With this model, rotating a face means updating indexes and orientations. This representation is the most difficult because it's hard for a human (for me at least) to look at a big blob of index and orientation numbers and verify their correctness. That being said, this model is significantly faster than dynamically calculating indexes and orientations using one of the other models described above, and so it's the best choice when using pattern databases. You can see an implementation of this model here: https://github.com/benbotto/rubiks-cube-cracker/tree/4.0.0/Model (see RubiksCubeIndexModel.{h,cpp}).
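To make "rotating a face means updating indexes and orientations" a bit more concrete, here is a small Python sketch of a single clockwise turn of the up face, using the position numbering from the tables above. It is not the code from the linked repository. It assumes an orientation convention under which a turn of the up face leaves every orientation value unchanged (a common choice, but a convention nonetheless), and the dictionary keys and function names are invented for this example.

```python
# Edge positions:   0 UB, 1 UR, 2 UF, 3 UL, ... (see the edge table above)
# Corner positions: 0 ULB, 1 URB, 2 URF, 3 ULF, ... (see the corner table above)


def solved():
    """Every cubie sits in its home position with orientation 0."""
    return {
        "edge_cubie": list(range(12)),
        "edge_orient": [0] * 12,
        "corner_cubie": list(range(8)),
        "corner_orient": [0] * 8,
    }


def cycle4(values, a, b, c, d):
    """Move the entry at a to b, b to c, c to d, and d to a."""
    values[b], values[c], values[d], values[a] = (
        values[a], values[b], values[c], values[d])


def turn_up_clockwise(cube):
    """Clockwise quarter turn of the up face, seen from above.

    Assumed convention: up-face turns do not change any orientation value,
    so the orientation entries simply travel along with their cubies.
    """
    for key in ("edge_cubie", "edge_orient"):
        cycle4(cube[key], 0, 1, 2, 3)      # UB -> UR -> UF -> UL -> UB
    for key in ("corner_cubie", "corner_orient"):
        cycle4(cube[key], 0, 1, 2, 3)      # ULB -> URB -> URF -> ULF -> ULB
    return cube
```

Turns of the other faces follow the same pattern with different position cycles, and under most conventions some of them also add an offset to the orientation values (modulo 2 for edges, modulo 3 for corners).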
As mentioned, the program also renders the cube. I used a different structure for that part. I defined a "cubie" class, which is six squares with 1, 2, or 3 colored faces for center, edge, and corner pieces, respectively. The Rubik's Cube is then composed of 26 cubies. The faces are rotated using quaternions. The code for the cubies and cube is here: https://github.com/benbotto/rubiks-cube-cracker/tree/4.0.0/Model/WorldObject
If you're interested in my Rubik's Cube solver program, there's a high-level overview video on YouTube: https://www.youtube.com/watch?v=ZtlMkzix7Bw&feature=youtu.be I also have a more extensive write-up on solving the Rubik's Cube programmatically on Medium.

One way would be to focus on the visual appearance.
A cube has six faces and each face is a three-by-three array of squares. So
Color[][][] rubik = new Color[6][3][3];
Then each move is a method that permutes a specific set of colored squares.
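Here is roughly what one such move could look like, sketched in Python rather than Java. The face order (U, L, F, R, B, D) and the convention that row 0 of every side face touches the up face are assumptions made for this example, and the function names are made up.

```python
import copy

U, L, F, R, B, D = range(6)  # assumed face order


def solved_cube():
    """6 x 3 x 3 nested lists; each face is filled with its own face number."""
    return [[[face] * 3 for _ in range(3)] for face in range(6)]


def rotate_grid_cw(grid):
    """Rotate a 3 x 3 grid a quarter turn clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def turn_up(cube):
    """Clockwise turn of the up face: rotate its grid, then cycle the four
    adjacent top rows (front -> left -> back -> right -> front)."""
    new = copy.deepcopy(cube)
    new[U] = rotate_grid_cw(cube[U])
    new[L][0] = cube[F][0][:]
    new[B][0] = cube[L][0][:]
    new[R][0] = cube[B][0][:]
    new[F][0] = cube[R][0][:]
    return new
```

The other five moves permute different rows and columns, and some of the strips have to be reversed depending on how the faces are laid out, which is where most bugs in this representation tend to come from.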

Eschew optimisation; make it object-oriented. A pseudocode class outline I've used is:
class Square
+ name : string
+ acronym : string
class Row
+ left_square : square
+ center_square : square
+ right_square : square
class Face
+ top_row : list of 3 square
+ center_row : list of 3 square
+ bottom_row : list of 3 square
+ rotate(counter_clockwise : boolean) : nothing
class Cube
+ back_face : face
+ left_face : face
+ top_face : face
+ right_face : face
+ front_face : face
+ bottom_face : face
- rotate_face(cube_face : face, counter_clockwise : boolean) : nothing
The amount of memory used is so small and processing so minimal that optimisation is totally unnecessary, especially when you sacrifice code usability.

An interesting method to represent the cube is used by the software "Cube Explorer". Using a lot of clever maths that method can represent the cube using only 5 integers. The author explains the maths behind his program on his website. According to the author the representation is suited to implement fast solvers.

There are many ways to do this. Some ways make more efficient use of memory than others.
I have seen people use a 3 x 3 x 3 array of cuboid objects, where the cuboid object needs to store color information (and yes, that center object is never used). I have seen people use 6 arrays, each of which is a 3 x 3 array of cuboids. I have seen a 3 x 18 array of cuboids. There are many possibilities.
Probably a bigger concern is how to represent the various transforms. Rotating a single face of a physical cube (all cube moves are essentially rotations of a single face) would have to be represented by swapping around a lot of cuboid objects.
Your choice should be one that makes sense for whatever application you are writing. It may be that you are only rendering the cube. It may be that there is no UI. You may be solving the cube.
I would choose the 3 x 18 array.

There are 20 cubies that matter. So one way to do it is as an array of 20 strings. The strings would hold 2 or 3 characters indicating the colors. Any single face turn moves 8 of those cubies (4 corners and 4 edges). So you just need a remapper for each of the six sides.
Note: This solution doesn't manage to remember the orientation of the logo sticker that's on the white center.
By the way, I helped someone do a software Rubik's cube once, maybe 15 years ago, but I can't remember how we represented it.

You could imagine the cube as three vertical circular linked lists, which intersect three horizontal linked lists.
Whenever a certain row of the cube is rotated you would just rotate the corresponding pointers.
It would look like this:
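enum color { WHITE, GREEN, RED, BLUE, ORANGE, YELLOW }; /* color names are illustrative */
struct cubeLinkedListNode {
    struct cubeLinkedListNode* nextVertical;
    struct cubeLinkedListNode* lastVertical;
    struct cubeLinkedListNode* nextHorizontal;
    struct cubeLinkedListNode* lastHorizontal;
    enum color color;
};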
You might not actually need the 2 'last'-pointers.
[ I did this with C, but it could be done in Java or C# just using a simple class for cubeLinkedListNode, with each class holding references to other nodes. ]
Remember there are six interlocking circular linked lists. 3 vertical 3 horizontal.
For each rotation you would just loop through the corresponding circular linked list sequentially shifting the links of the rotating circle, as well as the connecting circles.
Something like that, at least...

The shortest representation is something like this one: codepen.io/Omelyan/pen/BKmedK
The cube is unwrapped into a 1D array (a vector of 54 elements). A rotation function of a few lines swaps stickers, relying on the cube's symmetry. Here's a complete working model in C that I wrote in 2007 as a student:
typedef unsigned char byte; // assumed definition; the original snippet does not define byte
const byte // symmetry
M[] = {2,4,3,5},
I[] = {2,0,4,6};
byte cube[55]; // 0,0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1, ... need to be filled first
#define m9(f, m) (m6(f, m)*9)
byte m6(byte f, byte m) {return ((f&~1)+M[m+(f&1)*(3-2*m)])%6;}
void swap(byte a, byte b, byte n) {
while (n--) {byte t=cube[a+n]; cube[a+n]=cube[b+n]; cube[b+n]=t;}
}
void rotate(byte f, byte a) { // where f is face, and a is number of 90 degree turns
int c=m9(f, 3), i;
swap(c, c+8, 1);
while (a--%4) for (i=2; i>=0; --i)
swap(m9(f, i) + I[i], m9(f, i+1) + I[i+1], 3),
swap(f*9+i*2, f*9+i*2+2, 2);
swap(c, c+8, 1);
}

A Rubik's Cube has:
8 corners each containing a unique corner cubelet.
12 edges each containing a unique edge cubelet.
6 centres each containing a unique centre cubelet.
Each corner cubelet can be in one of 3 orientations:
not rotated;
rotated clockwise 120°; or
rotated anti-clockwise 120°.
Each edge cubelet can be in one of 2 orientations:
not flipped; or
flipped 180°.
The centre cubelets are fixed relative to each other; however there are 24 possible orientations (ignoring rotations of individual centres, which is only relevant if you are solving a picture cube) as there are 6 ways to pick the centre cubelet that is on the "up" face of the cube and then 4 ways to pick the centre cubelet that would be on the "front" face.
You can store this as:
an array of eight 3-bit integers each representing the corner cubelet in a corner position.
an array of eight 2-bit integers each representing the orientation of the corner cubelet in a corner position.
an array of twelve 4-bit integers each representing the edge cubelet in an edge position.
an array of twelve 1-bit integers each representing the orientation of the edge cubelet in an edge position.
a 5-bit integer representing an enumeration of all 24 possible orientations of the centre cubelets.
This gives a total of 105 bits (14 bytes).
Space optimisations:
Since the centres are always fixed relative to each other, you can assume that they never move and do not need to be stored. With this, if you want to do an E move then do an equivalent U D' pair of moves instead.
This would reduce the size to 100 bits (13 bytes).
If you restrict the representation to solvable cubes then it is possible to store the cube in a smaller space as:
Once you know 7 corner cubelets you can work out what the 8th is.
The orientation of the corner cubelets has a fixed parity so once you know the orientation of 7 corners you can derive the 8th.
Similar for the edges, you only need to store 11-of-12 edge cubelets and edge orientations and can calculate the remaining one.
This saves a further 10 bits for a total of 90 bits (12 bytes). However, the calculations required to work out the missing information may mean that this space optimisation is not worth the performance penalty.
More Space Optimisations:
If you really want to optimise the space for the cube then:
the 8 corner cubelets can be arranged in 8! = 40320 permutations and 40320 can be represented in 16 bits.
7 ternary (base-3) digits can represent the orientation of the corners (deriving the orientation of the 8th) and 3^7 = 2187 can be represented in 12 bits.
the 12 edge cubelets can be arranged in 12! = 479001600 permutations and 479001600 can be represented in 29 bits.
11 binary digits can represent the orientation of the edges (deriving the orientation of the 12th) which would be 11 bits.
This gives a total of 68 bits (9 bytes).
The number of states of a solvable Rubik's Cube is (8!*3^8*12!*2^12)/12 = 43,252,003,274,489,856,000 ~= 4.3*10^19, which can be stored in 66 bits (9 bytes). While it's possible to enumerate all the possible states, it is not worth it to save those last 2 bits.
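The bit counts above are easy to double-check. The following Python sketch just redoes the arithmetic; the helper name bits_needed is made up for this example.

```python
from math import factorial


def bits_needed(count):
    """Smallest number of bits that can distinguish `count` different values."""
    return (count - 1).bit_length()


# Straightforward layout: one small integer per position.
naive_bits = 8 * 3 + 8 * 2 + 12 * 4 + 12 * 1 + 5

# Drop the 5 bits for the centres, then the derivable 8th corner (3 + 2 bits)
# and the derivable 12th edge (4 + 1 bits).
without_centres = naive_bits - 5
without_derivable = without_centres - (3 + 2 + 4 + 1)

# Pack each group as a single number instead of one field per cubelet.
packed_bits = (
    bits_needed(factorial(8))      # corner permutation
    + bits_needed(3 ** 7)          # corner orientations
    + bits_needed(factorial(12))   # edge permutation
    + bits_needed(2 ** 11)         # edge orientations
)

total_states = factorial(8) * 3 ** 8 * factorial(12) * 2 ** 12 // 12
state_bits = bits_needed(total_states)
```

With these definitions, naive_bits, without_centres, without_derivable, packed_bits and state_bits come out as 105, 100, 90, 68 and 66 respectively, matching the figures above.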

The others have addressed describing the physical cube well, but regarding the state of the cube... I would try using an array of vector transformations to describe the changes of the cube. That way you could keep the history of the Rubik's Cube as changes are made. And I wonder if you could multiply the vectors into a transformation matrix to find the simplest solution?

As a permutation of the 48 facelets which can move. The basic rotations are also permutations, and permutations can be composed; they form a group.
In a program such a permutation would be represented by an array of 48 elements containing numbers 0 to 47. The colors corresponding to the numbers are fixed, so a visual representation can be computed from the permutation, and vice versa.
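A minimal Python sketch of composing such permutations follows. It assumes the convention that perm[i] is the facelet that ends up at position i, and the 4-cycle used as a "move" here is a placeholder to show the mechanics, not one of the real face turns.

```python
N = 48


def identity():
    """The solved cube: every facelet is at its own position."""
    return list(range(N))


def compose(first, second):
    """Permutation equivalent to applying `first`, then `second`."""
    return [first[second[i]] for i in range(N)]


def inverse(perm):
    """Permutation that undoes `perm`."""
    inv = [0] * N
    for position, facelet in enumerate(perm):
        inv[facelet] = position
    return inv


def example_move():
    """Hypothetical move that cycles positions 0 -> 1 -> 2 -> 3 -> 0."""
    move = identity()
    move[1], move[2], move[3], move[0] = 0, 1, 2, 3
    return move
```

A real move set would be six such arrays (one per face), each made of five 4-cycles, and a scrambled state is just the composition of the moves applied so far.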

I found it very useful to store the state as a mapping from the coordinates of the center of the cubelet outside the face to the color. So, for example, the upper face of the front-top-right cubelet is state[1, -1, 2]. So moves are done by just applying rotations to the indices.
You can see a simple simulator I wrote using it here: https://github.com/noamraph/cube
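For readers who want to see the idea without opening the repository, here is a small independent Python sketch of the same approach; it is not the code from the linked simulator. It assumes cubelet centres at coordinates -1, 0, 1, so each sticker sits at +2 or -2 on exactly one axis, and the axis directions and color letters are arbitrary choices for this example.

```python
from itertools import product

# Assumed color per (axis, sign) pair; purely illustrative.
FACE_COLOR = {(0, 2): "R", (0, -2): "O", (1, 2): "G",
              (1, -2): "B", (2, 2): "W", (2, -2): "Y"}


def solved_state():
    """Map each of the 54 sticker coordinates to its color."""
    state = {}
    for (axis, sign), color in FACE_COLOR.items():
        for u, v in product((-1, 0, 1), repeat=2):
            coord = [u, v]
            coord.insert(axis, sign)   # put +/-2 on the face's own axis
            state[tuple(coord)] = color
    return state


def turn_top_layer(state):
    """Quarter turn of the z >= 1 layer about the z axis.

    Each affected sticker moves from (x, y, z) to (-y, x, z); the rest stay.
    """
    new_state = {}
    for (x, y, z), color in state.items():
        if z >= 1:
            new_state[(-y, x, z)] = color
        else:
            new_state[(x, y, z)] = color
    return new_state
```

The other moves are the same loop with a different layer test and a different 90-degree rotation of the coordinates.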

Related

Marching Squares Algorithm's bit shifting step

I am currently implementing the Marching Squares for calculating contour curves and I had a question regarding the usage of bit shifting as mentioned here
Compose the 4 bits at the corners of the cell to build a binary index: walk around the cell in a clockwise direction appending the bit to the index, using bitwise OR and left-shift, from most significant bit at the top left, to least significant bit at the bottom left. The resulting 4-bit index can have 16 possible values in the range 0–15.
I have height data that I place on the corners of each cell at a specified (x,y). Then I convert this height data into 0s and 1s by checking whether the height is greater or less than a specified isovalue (say, the contour level). Now the vertices are either 0s or 1s. What is the purpose of the next step, i.e. calculating the 4-bit index by traversing in a clockwise direction?
The purpose of composing a four bit code is simply to identify the configuration, one among sixteen cases.
You could very well use four nested if/then/else statements instead.
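For what it's worth, the shift-and-OR step from the quoted description is only a few lines. This Python sketch assumes a corner counts as 1 when its height is strictly above the isovalue (the opposite choice works just as well as long as it is used consistently), and the function names are made up.

```python
def cell_index(top_left, top_right, bottom_right, bottom_left):
    """Build the 4-bit case index, clockwise from the top-left corner.

    The top-left bit ends up most significant, the bottom-left least.
    """
    index = 0
    for bit in (top_left, top_right, bottom_right, bottom_left):
        index = (index << 1) | int(bool(bit))
    return index


def cell_case(heights, row, col, iso):
    """Case index (0-15) of the cell whose top-left corner is heights[row][col]."""
    def inside(r, c):
        return heights[r][c] > iso   # assumed thresholding rule

    return cell_index(inside(row, col), inside(row, col + 1),
                      inside(row + 1, col + 1), inside(row + 1, col))
```

The resulting number is then typically used to index a 16-entry lookup table of line segments, which is the practical advantage over nested if/else statements.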

Reconstruct meshes of 1, 2 or 3-connected surfaces from sparse point clouds

I've got some point clouds, i.e. collections of 3D coordinates, and I'd like to reconstruct their underlying structures, even just approximately. These structures are 2D surfaces in 3D space with either 0 (fig B), 1 (fig A) or 2 holes, or the disjoint union of two or more such structures (fig C).
Each surface is not a closed boundary of a 3D domain (my structures are 2-dimensional). Mathematically speaking, they are genus-0 2-manifolds with 1, 2 or 3 boundary components, or a disjoint union of such structures.
↑ "out" should be actually made of triangles; actual point clouds are not so obviously coming from spheres
[ As an example, think of the set of points defined as "sampled points on the surface of a mountain range whose distance from an air balloon is between 100 and 150 m". These points represent a surface that can either have 0 holes (the top of the mountain), 1 hole (the flank), 2 holes (a valley which is a basis for 2 mountains) ]
My expected output is a set of edges and triangular faces connecting the existing vertices. I'll use it for rough surface area estimates and for a graphic representation of my dataset. I don't need the resulting surface to be perfect: even topological artifacts (holes that shouldn't be there) can be acceptable if they are limited in number.
My point cloud has some features:
the density of point is somewhat constant, and if a set is made of non connected surfaces I should be able to tell them apart using a threshold
sometimes a "strip" of surface can narrow down to a single file of points, or even skip some points (see example C, left component)
I know there cannot be an exact reconstruction algorithm, but maybe there are canonical ways to minimise the total length of the edges, or the total surface area, or the crease angle between neighbouring faces: I'm not sure as I'm not from the computational geometry field.
If that helps, I'll be implementing this in Java, but it's the algorithm I'm interested in. I don't mind if the algorithm relies on parameters as long as it doesn't make assumptions that don't hold in my case.
Thanks!
When density is close to constant, i.e. space between points is similar, two methods are useful for triangulations: "Marching Cubes" and "Ball Pivot"

Contour of a run-length-coded digital shape

A digital shape is a set of connected pixels in a binary image (a blob).
It can be compactly represented by run-length coding, i.e. grouping the pixels in horizontal line segments and storing the starting endpoint coordinates and the lengths. Usually, the RLC representation stores the runs in raster order, i.e. row by row and left to right.
For smooth shapes, the storage requirement drops from O(N²) to O(N).
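To fix ideas, here is a short Python sketch of the encoding step. The bitmap format (a list of rows of 0/1 values) and the (X, Y, L) triple layout are assumptions made for this example.

```python
def encode_runs(bitmap):
    """Run-length code a binary image.

    Returns (x, y, length) triples for the runs of set pixels, in raster
    order: row by row, and left to right within a row.
    """
    runs = []
    for y, row in enumerate(bitmap):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                runs.append((start, y, x - start))
            else:
                x += 1
    return runs
```

For a roughly convex N-by-N blob this produces about one run per row, which is where the drop from O(N²) pixels to O(N) runs comes from.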
The outline of a shape is a closed chain of pixels which restores the shape when its interior is filled (by a flood filling algorithm). It is also an O(N) representation. When the shape is available as a bitmap, the outline can be obtained by a contouring algorithm.
I am looking for an algorithm that directly computes the outline of a shape given its RLC representation, without drawing it in an intermediate bitmap. The algorithm is expected to run in time linear in the number of runs.
Have you come across a solution ?
A pixel is a boundary pixel if it is filled but adjacent to a pixel that is not filled. Given a per-row RLE encoding of the filled pixels, we can operate on three adjacent rows to compute a RLE version of the boundary pixels, then decode it.
Basically, we have a sweep line algorithm. With three rows like
*********** ****
************************
**** ******
we get event points (^) from the RLE:
*********** ****
************************
**** ******
^ ^^ ^ ^ ^ ^^ ^
The first thing to do is to designate middle filled pixels that have empty pixels above or below as boundary. (If you need guidance, the algorithms for set difference on a list of intervals are very similar.)
*********** ****
BBB***BBBBBBBBBBB***BBBB
**** ******
Then, for the intervals that are filled but not known to be boundaries, check whether left endpoint has space to the left and whether the right endpoint has space to the right. If so (respectively), they're boundaries.
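A possible way to code this up in Python is sketched below. It works on half-open column intervals (start, end) per row and folds the two steps into one: the current row's runs are shrunk by one pixel on each side (taking care of the left/right check), intersected with the rows above and below to get the interior, and the interior is then subtracted from the runs. The function names are made up, and the subtraction here is a simple nested loop rather than a strictly linear merge.

```python
def intersect(a, b):
    """Intersection of two sorted lists of half-open intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Parts of the intervals in a that are not covered by b (both sorted)."""
    out = []
    for lo, hi in a:
        for blo, bhi in b:
            if bhi <= lo or blo >= hi:
                continue
            if blo > lo:
                out.append((lo, blo))
            lo = max(lo, bhi)
        if lo < hi:
            out.append((lo, hi))
    return out


def boundary_row(above, current, below):
    """Boundary intervals of `current`, using 4-neighbour adjacency."""
    shrunk = [(lo + 1, hi - 1) for lo, hi in current if hi - lo > 2]
    interior = intersect(intersect(shrunk, above), below)
    return subtract(current, interior)
```

For instance, with a middle row of (0, 24), an upper row of (3, 14) and (17, 21), and a lower row of (2, 6) and (14, 20) (offsets chosen for this example), boundary_row returns (0, 3), (6, 17) and (20, 24), the same boundary pattern as the middle row of the example above. The first and last image rows can be handled by passing an empty list for the missing neighbour.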
Note: This answer assumes that "non-outline" means "surrounded by 4 neighbours", so the result will be slightly different to your example (1 pixel green instead of blue).
All outline pixels are pixels where not all of the 4 "neighbour pixels" (left, right, above, below of the pixel) are set.
When decoding the RLC from top to bottom, you can get the outline pixels with the following pseudo code algorithm:
For the first line:
    All decoded pixels are outline pixels
For the subsequent lines:
    Leftmost and rightmost pixels of each RLC run are outline pixels
    All other pixels are outline pixels if:
        The pixel above isn't set (case A)
        The pixel below isn't set (case B)
Case A and B mean that you'll have to look at pixels above/below the current pixel, so the algorithm should actually be kind of pipelined/looking ahead one line, because case B will not be able to be detected until the next line was decoded.
EDIT: To sort the pixels in clockwise order afterwards, you can use the fact that your outline is a diagonally connected one-pixel-width line. Picking one of the pixels in the topmost line, you'll have two possible next pixels, follow the one that is right of, below or right and below the current pixel. After that, just follow the neighbour pixels that you haven't visited yet until there is no neighbour pixel. Example:
/----- First pixel you pick, A and B are neighbour candidates, A is the "correct" one
v
xAxxx
B x
x x xxx
x xxxxxx x
xx x
xxxxxxxxxxx
s0123 Result after following the neighbours (s = start, e = end),
e 4 numbers from 0-9 show order of traversal
1 5 234
0 678901 5
98 6
76543210987
Hint:
As said in other answers, emitting the list of the outline pixels can be implemented as a sweepline process, during which the 3x3 neighborhoods of the run endpoints are examined.
This procedure will emit the pixels in a scrambled way, as a sequence of direct and reverse arcs that need to be stored and reordered.
An alternative could be based on the idea of implementing the standard Moore Neighborhood algorithm that has the advantage to enumerate the outline pixels in the desired order.
This procedure requires to know the 8-neighborhood configuration around the current pixel, and the idea is to update this neighborhood on every move to another pixel: we maintain indexes to the run that contains the current pixel and to the two facing runs in the rows above and below.
On every move to another pixel, we need to update these three indexes, which will involve short sequential searches in the list of sorted runs. This can be seen as a pseudo-random access mechanism to pixels, taking into account that the successive accesses are strongly local and can be sort-of cached.
Update:
In the run-length-coded representation that I use, only the black runs are coded, as triples (X, Y, L). The runs are sorted by rows top to bottom, and then left to right in a row.
For convenience, we can switch to a "linear addressing" scheme, as if all image rows had been appended after each other, and every pixel is designated by a single number Z = X + Y*Nx (where Nx is the image width).
So we have a list of black runs, and the white runs are implicitly found between two consecutive black ones.
During processing, we can remember at all times the index of the run that starts immediately before or on the current pixel (R[I].Z <= Z < R[I+1].Z). We can tell the color of the pixel by checking if we are inside the run or between it and the next (Z < R[I].Z + R[I].L).
If we move one position to the left, Z decreases by 1 and we may have to select the previous run (I--).
If we move one position up, Z decreases by Nx and we may have to backtrack by several runs (I-- until R[I].Z <= Z again).
The picture shows the current pixel and its 4-neighbors, as well as the "influence zones" of the black runs.
We can handle all eight displacement directions similarly.
As we see, every move takes a number of operations at worst equal to the number of runs in a row, deemed to be a small value. Using this concept, we can traverse the RLC representation following an arbitrary path at a reasonable cost, without reconstructing the whole bitmap.
As the Moore Neighborhood algorithm takes time linear in the length of the outline, an implementation based on this linear run addressing will also take linear time (for a bounded number of runs per row).
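The run-tracking part of this can be sketched in a few lines of Python. This only shows the pixel lookup with a moving run index, not the Moore tracing itself; the function names are made up, and the caller is assumed to check image borders before moving, since in linear addressing a step left from column 0 would otherwise land on the previous row.

```python
def linearize(runs, width):
    """Turn sorted (X, Y, L) runs into linear start addresses and lengths."""
    starts = [x + y * width for x, y, _ in runs]
    lengths = [length for _, _, length in runs]
    return starts, lengths


def seek(starts, i, z):
    """Update run index i after moving to address z.

    Walks backward or forward from the previous index, so nearby moves
    only cost a few steps. Returns the last run starting at or before z
    (or 0 when z lies before the first run).
    """
    while i > 0 and starts[i] > z:
        i -= 1
    while i + 1 < len(starts) and starts[i + 1] <= z:
        i += 1
    return i


def is_black(starts, lengths, i, z):
    """True if address z falls inside run i (with i obtained from seek)."""
    return starts[i] <= z < starts[i] + lengths[i]
```

Moving up or down by one row means changing z by the image width and calling seek again with the previous index as the starting point.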

Partial sphere representation with hexes using graphs

Problem: Spheres cannot be tessellated using only hexagonal tiles.
Goal: Create a globe map, made of discrete hexagonal fields.
Requirements:
a. Graphic representation of a globe/sphere/planet.
b. Tessellate area into hexes.
c. Hexes may contain something.
d. The number of tiles should be between 1 000 and 10 000.
e. A reasonable amount of inaccuracy is okay.
Idea:
Create an undirected graph which will represent the hexes. Because hexes must always have exactly 6 neighbors, the graph needs to be 6-regular and contain 1 000 < N < 10 000 vertices and 3N edges (from the handshaking lemma). It can be stored as an adjacency matrix with pointers to vertex instances.
The vertex instances are populated with information. For instance, in a game, this might be units.
Visual representation: No screen can show a whole globe at once. So, to show a part of our globe, we first select the vertex that should be in the middle of the display and display it as a hex. Then, from the adjacency matrix, we pull up its immediate neighbors and position them around as hexes. For each of these neighbors, we pull up the next level of neighbors, and so on, until the screen is filled.
Questions:
I Is there an algorithm which can be used to construct a graph described in 1.?
II Can I prove that up to a selected neighbor depth, no conflicts will arise which will make a graphic representation impossible? Obviously, no conflicts will arise for a display depth of at least 1 + 6 hexes.
if I&II:
Do you think the described approach is promising enough to try and implement?
Nobody ever answered, but the answer is that this is impossible. The Euler characteristic of a finite graph covering the sphere has to be 2 (see http://en.wikipedia.org/wiki/Euler_characteristic for the Euler characteristic) and the Euler characteristic of a graph that is built out of hexagons is always 0.
You have to somewhere have shapes of a different size.
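The counting behind this can be spelled out. Treat the tiling as a polyhedron with $V$ vertices, $E$ edges and $F$ faces, all of them hexagons. Every edge borders exactly two faces and every vertex has at least three edges, so

```latex
\begin{align*}
  2E &= 6F  &&\Rightarrow\quad E = 3F, \\
  2E &\ge 3V &&\Rightarrow\quad V \le 2F, \\
  V - E + F &\le 2F - 3F + F = 0 < 2 .
\end{align*}
```

So no all-hexagon tiling can reach the value 2 that a sphere requires. If instead $P$ pentagons are allowed alongside $H$ hexagons, with three tiles meeting at every vertex, then $E = (5P + 6H)/2$ and $V = (5P + 6H)/3$, which gives

```latex
\[
  V - E + F = \frac{5P + 6H}{3} - \frac{5P + 6H}{2} + P + H = \frac{P}{6} = 2
  \quad\Rightarrow\quad P = 12 .
\]
```

That is the familiar football pattern: exactly 12 pentagons, and the rest of the tiles can be hexagons.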

How can I inscribe a rectangle or circle inside an arbitrary quadrilateral

This may be a more math focused question, but wanted to ask here because it is in a CS context. I'm looking to inscribe a rectangle inside another (arbitrary) quad with the inscribed quad having the largest height and width possible. Since I think the algorithm will be similar, I'm looking to see if I can do this with a circle as well.
To be more clear, here is what I mean by the bounding quadrilateral as an example.
Here are 2 examples of the inscribed maximization I'm trying to achieve:
I have done some preliminary searching but have not found anything definitive. It seems that some form of dynamic programming could be the solution. It seems that this should be a linear optimization problem that should be more common than I have found, and perhaps I'm searching for the wrong terms.
Notes: For the inscribed square assume that we know a target w/h ratio we are looking for (e.g. 4:3). For the quad, assume that the sides will not cross and that it will be concave (if that simplifies the calculation).
1) Circle.
For a triangle, this is a standard math question from school program.
For quadrilateral, you can notice that maximum inner circle will touch at least three of its sides. So, take every combination of three sides and solve the problem for each triangle.
A case of parallel sides have to be considered separately (since they don't form a triangle), but it's not terribly difficult.
2) Rectangle.
You can't have "largest height and width", you need to choose another criteria. E.g., on your picture I can increase width by reducing height and vice versa.
4 year old thread, but I happened to stumble across it when googling my problem.
I have a problem like this in a current CV application. I came up with a simple and somewhat clumsy solution for finding the largest rectangle. Not exactly the same though, because I maximize the area of the rectangle without a fixed ratio of sides.
I don't know yet whether my solution finds the optimum or whether it works in all cases. I also think there should be a more efficient way, so I am looking forward to your input.
First, assume a set of 4 points forming our (convex) quadrilateral:
      x    y
P1   -2   -5
P2    1    7
P3    4    5
P4    3   -2
For this procedure the leftmost point is P1, the following points are numbered clockwise. It looks like this:
We then create the linear functions between the Points. For each function we have to know the slope k and the distance from 0: d.
k is simply the difference in Y of the two points divided by the difference in X.
d can be calculated by solving the linear function to d. So we have
k=dy/dx
d=y1-k*x1
We will also want the inverse functions.
k_inv = 1/k
d_inv = -d/k
We then create the function and inverse function for each side of the quadrilateral
         k       d                  k       d
p1p2     4       3      p1p2_inv    0.25   -0.75
p2p3    -0.67    7.67   p2p3_inv   -1.5    11.5
p3p4     7     -23      p3p4_inv    0.14    3.29
p4p1     0.6    -3.8    p4p1_inv    1.67    6.33
If we had completely horizontal or vertical lines we would end up with a DIV/0 in one of the functions or inverse functions, thus we would need to handle this case separately.
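The table above can be reproduced with a couple of small helpers. This is a Python sketch with made-up function names; it does not handle the horizontal and vertical special cases and simply lets the division fail.

```python
def line_through(p, q):
    """Slope k and intercept d of the line y = k*x + d through points p and q.

    Raises ZeroDivisionError for a vertical side.
    """
    (x1, y1), (x2, y2) = p, q
    k = (y2 - y1) / (x2 - x1)
    d = y1 - k * x1
    return k, d


def inverse(k, d):
    """Coefficients of the inverse function x = k_inv*y + d_inv.

    Raises ZeroDivisionError for a horizontal side.
    """
    return 1 / k, -d / k


P1, P2, P3, P4 = (-2, -5), (1, 7), (4, 5), (3, -2)
sides = {
    "p1p2": line_through(P1, P2),
    "p2p3": line_through(P2, P3),
    "p3p4": line_through(P3, P4),
    "p4p1": line_through(P4, P1),
}
inverses = {name + "_inv": inverse(k, d) for name, (k, d) in sides.items()}
```

Rounded to two decimals, sides and inverses contain the same k and d values as the table.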
Now we go through all corners that are enclosed by two sides whose slopes k have different signs. In our case that would be P2 and P3.
We start at P2 and iterate through the y values between P2 and the higher one of P1 and P3 with an appropriate step size and use the inverse functions to calculate the distance between the functions in horizontal direction. This would give us one side of the rectangle
a=p2p3_inv(y)-p1p2_inv(y)
At the two x values x = p2p3_inv(y) and x = p1p2_inv(y) we then calculate the difference in y to the two opposite functions and take the distance to our current y position as a candidate for the second side of our rectangle.
b_candidate_1 = y-p4p1(p2p3_inv(y))
b_candidate_2 = y-p4p1(p1p2_inv(y))
b_candidate_3 = y-p3p4(p2p3_inv(y))
b_candidate_4 = y-p3p4(p1p2_inv(y))
The lesser of the four parameters would be the solution for side b.
The area obviously becomes a*b.
I did a quick example in excel to demonstrate:
the minimum b here is 6.9, so the upper right corner of the solution is on p2p3 and the rectangle extends a in horizontal and b in vertical direction to the left and bottom respectively.
The four points of the rectangle are thus
Rect   x      y
R1     0.65  -1.3
R2     0.65   5.6
R3     3.1    5.6
R4     3.1   -1.3
I will have to put this into C++ code and will run a few tests to see if the solution generalizes or if this was just "luck".
I think it should also be possible to substitute a and b in A=a*b by the functions and put it into one linear formula that has to be maximized under the condition that p1p2 is only defined between P1 and P2 etc...

Resources