Merging angled text OCR into lines

Merging angled text OCR into lines - algorithm

I have a text box detection algorithm that outputs word-level detections. Here's an example:
So the output is a list of boxes in the form (x1_i,y1_i,x2_i,y2_i) indicating the bottom-left and top-right coordinates. I'd like to find a simple decent baseline algorithm to merge these boxes into lines. So the desired output would be:
["Hey how are you?" , "I'm great!"]
I've seen a few questions similar to this, but they are primarily about straight (uni-directional) text, e.g.:
Merge the Bounding boxes near by into one
My thoughts on this are to calculate vectors from the centroid of each box, and then doing merging of boxes, based on closeness and near-same direction. I'm wondering if there are any such algorithms already out there? The corner cases that I'd like to try and address are:
Multiple angles of text.
Non-overlapping boxes (like the [I'm] [great!] ones).
Crossing texts at different angles (like the two lines above).
I'd like to find a such a quick-and-easy baseline algorithm using python.

This is the simplest algorithm meeting your criteria that I can think of.
Define a function score(b1, b2) that reflects how likely it is that the word in b1 precedes the word in b2, with lower scores being better (e.g., 0 means that the right edge of b1 is the same as the left edge of b2). For each box b, define pred(b) to be the box b' minimizing score(b', b), and define succ(b) to be the box b' minimizing score(b, b'). Form a graph on the bounding boxes with arcs b->b' such that succ(b) = b' and pred(b') = b. This graph is the disjoint union of some number of simple paths. These are your lines.
I don't know the best way to define score, but one possibility is to define score(b1, b2) = f(distance(midpoint of b1's right side, midpoint of b2's right side) / max(b1's height, b2's height)) + g(angular distance(b1's angle, b2's angle)), where f and g are increasing functions (I'd start the experiments with f and g being linear).
Edit: it occurs that the word bounding box algorithm may only give you a reliable orientation modulo a half rotation. In this case, define score so that it's lower when the two boxes are likelier to precede/succeed one another, form a directed graph where each box has an arc to each of the two boxes with which it has the lowest scores, throw away arcs with no reciprocal arc, form an undirected graph with an edge for each arc pair. The latter graph consists of a disjoint union of paths and cycles. You'll have to figure out what to do with the cycles and how to orient the paths.

Related

Finding intersection between sets of points describing a curve

Say I have two sets of points
p1, p2, p3,... ,pn
and
q1, q2, q3,..., qn
which describe two paths (curves) in a plane. The points may not be evenly sampled from the curve, but they are "in order" (with regard to parameterizations of the curves). What is a good way to find out where these two curves intersect?
So for example, I may have just two points each
(0,0) (1,0)
and
(-5,1) (-4,-1)
in which case their intersection is (-4.5,0).
The most rudimentary way to do this would be to draw the edges between every two points, extend them, and see whether any two pairs of edges intersect in a suitable patch of land. I'm curious if there's a better way.

The most efficient way to find such intersection is by means of sweepline algorithms, that can achieve O(n log n + k) running time (n line segments having k intersections), better than the O(n²) by exhaustive comparisons. See http://www.ti.inf.ethz.ch/ew/lehre/CG09/materials/v9.pdf. Unfortunately, such solutions are rather sophisticated.
A possible alternative, much simpler to implement, is to use hierarchichal bounding: take the bounding box of every segment, merge the boxes two by two (consecutive segments), then four by four and so on. starting from N segments, you'll form hierarchy of N-1 bounding boxes.
Then, to intersect two curves, check interference of their top-level bounding boxes. If the do overlap, check interference of the sub-boxes, and so on recursively.
Unless your curves are closely intertwined, you can spare a large number of segment comparisons.

You can preprocess each polyline (chain of segments) and find a minimal bounding rectangle for each of them. Also you can build a hierarchical data structure for each polyline - a rectangle for the whole one, then a rectangle for each half and so on. You can use other geometrical forms instead of rectangle - circle or ellipse, for instance.
Then you can use Clipping and Culling to accelerate intersections search.

You can calculate bounding box around a set of points, say every 100 pair of points and intersect only those in a n x n manner. Bounding box intersections can be done very efficiently. If two bounding boxes (one from each curve) intersect, you can test for intersection just the edges involved inside of those boxes.
This will handle the case when there's more than one intersection between the curves. Just mind the boundary cases, when the point of intersection is actually one of the vertices defining an edge.

How can I pick a set of vertices to subtract from a polygon such that the distortion is minimum?

I'm working with a really slow renderer, and I need to approximate polygons so that they look almost the same when confined to a screen area containing very few pixels. That is, I'd need an algorithm to go through a polygon and subtract/move a bunch of vertices until the end polygon has a good combination of shape preservation and economy of vertice usage.
I don't know if there's a formal name for these kind of problems, but if anyone knows what it is it would help me get started with my research.
My untested plan is to remove the vertices that change the polygon area the least, and protect the vertices that touch the bounding box from removal, until the difference in area from the original polygon to the proposed approximate one exceeds a tolerance I specify.
This would all be done only once, not in real time.
Any other ideas?
Thanks!

You're thinking about the problem in a slightly off way. If your goal is to reduce the number of vertices with a minimum of distortion, you should be defining your distortion in terms of those same vertices, which define the shape. There's a very simple solution here, which I believe would solve your problem:
Calculate distance between adjacent vertices
Choose a tolerance between vertices, below which the vertices are resolved into a single vertex
Replace all pairs of vertices with distances lower than your cutoff with a single vertex halfway between the two.
Repeat until no vertices are removed.
Since your area is ultimately decided by the vertex placement, this method preserves shape and minimizes shape distortion. The one drawback is that distance between vertices might be slightly less intuitive than polygon area, but the two are proportional. If you really wish, you could run through the change in area that would result from vertex removal, but that's a lot more work for questionable benefit imo.
As mentioned by Angus, if you want a direct solution for the change in area, it's not actually super difficult. Was originally going to leave this as an exercise to the reader, but it's totally possible to solve this exactly, though you need to include vertices on either side.
Assume you're looking at a window of vertices [A, B, C, D] that are connected in that order. In this example we're determining the "cost" of combining B and C.
Calculate the angle offset from collinearity from A toward C. Basically you just want to see how far from collinear the two points are. This is |sin(|arctan(B - A)| - |arctan(C - A)|)| Where pipes are absolute value, and differences are the sensical notion of difference.
Calculate the total distance over which the angle change will effectively be applied, this is just the euclidean distance from A to B times the euclidean distance from B to C.
Multiply the terms from 2 and 3 to get your first term
To get your second term, repeat steps 2 - 4 replacing A with D, B with C, and C with B (just going in the opposite direction)
Calculate the geometric mean of the two terms obtained.
The number that results in step 6 presents the full-picture minus a couple constants.

I tried my own plan first: Protect the vertices touching the bounding box, then remove the rest in the order that changes the resultant area the least, until you can't find a vertice to remove that keeps the new polygon area within X% of the original one. This is the result with X = 5%:
When the user zooms out really far these shapes fit the bill well enough for me. I haven't tried any of the other suggestions. The savings are quite astonishing, sometimes from 80-100 vertices down to 4 or 5.

How to check if a box fits into another box (any rotations allowed)

Suppose I have two boxes (each of them is a rectangular cuboid, aka rectangular parallelepiped). I need to write a function that decides if the box with dimensions (a, b, c) can fit into the box with dimensions (A, B, C), assuming any rotations by any angles are allowed (not only by 90°).
The tricky part is the edges of the inner box may be not parallel to corresponding edges of the outer one. For example, a box that is very thin in the dimensions (a, b) but with length 1 < c < √3 can fit into a unit cube (1, 1, 1) if placed along its main diagonal.
I've seen questions [1], [2] but they seem to cover only rotations by 90°.

Not a complete answer, but a good start is to determine the maximum diameter that fits inside the larger box (inscribe the box in a circle) and the minimum diameter needed for the smaller box. That gives a first filter for possibility. This also tells you how to orient the smaller box within the larger one.

If one box can fit inside the other than it can fit also if boxes have same center. So only rotation is enough to check, translation is not needed to check.
2D case: For boxes X=(2A,2B) and x=(2a,2b) positioned around (0,0). That means corners of X are (+-A, +-B).
---------(A,B)
|
-----------(a,b)
(0,0) |
-----------(a,-b)
|
---------(A,-B)
Be rotating x around (0,0), corners are always on circle C with radius sqrt(a^2+b^2). If part of circle lie inside box X, and if part inside X has enough arc length to place 2 points on distance 2a or 2b, than x can fit inside X. For that we need to calculate intersection of C with lines x=A and y=B, and calculate distance between these intersection. If distance is equal or grater than 2a or 2b, than x can fit inside X.
3D case: Similar. For boxes X=(2A,2B,2C) and x=(2a,2b,2c) positioned around (0,0,0). Rotating x around (0,0,0), all corners move on sphere with radius sqrt(a^2+b^2+c^2). To see is there enough space on sphere-box intersection part, find intersection of sphere with planes x=A, y=B and z=C, and check is there enough space to fit any of quads (2a,2b), (2a,2c) or (2b,2c) on that sphere part. It is enough to check are points on part border on sufficient distance. For this part I'm not sure about efficient approach, maybe finding 'center' of intersection part and checking it's distance to border can help.

You basically have to check several cases, some trivial and some needing minimization searches.
First, there are 4 3-parallel-axis cases. If any of them passes (with #jean's test), you fit. Otherwise, continue to the next test cases:
Next, there are 18 2d diagonal cases, where one axis is parallel and the other two are diagonal, with one angle degree of freedom. Discard a case if the parallel axis doesn't fit; otherwise find the minimum of some "impingement" metric function of the single rotation angle. Then check for any actual impingement at that angle. The impingement metric has to be some continuous measure of how the inner box (4 corners) are staying inside the 2 faces of the outer box, allowing that sometimes they may go outside during the search for the minimum impingement angle. Hopefully a) there are a predictable maximum number of minima, and hopefully b) if there is a possible fit, then a fit is guaranteed at the angle of one of these minima.
If none of those cases passes without impingement, then move on to the larger number of 3d no-parallel-axes cases, where the rotation parameter is now three angles instead of one, and you have to search for a (hopefully limited number of) minima of the impingement metric, and test there for actual impingement.
Not really elegant, I think. This is similar to another thread asking how long of a line of given width can fit inside a 2d box of given dimensions. I didn't consider the parallel-axis case there, but the diagonal case requires solving a quartic equation (much worse than a quadratic equation). You may have a similar problem for your one-parallel-axis cases, if you want to go analytic instead of searching for minima of an impingement metric. The analytic solution for the no-parallel-axis 3d diagonal cases probably involves solving (for the correct root of) an even higher order equation.

In fact, any box A with dimensions (a1, a2, a3) can fit in an other box B with dimensions (b1, b2, b3), in the following conditions:
i) Every ai is less than or equal to every bi with i = 1. 2. 3;
ii) Any ai has to be less than or equal to sqrt(b1^2+b2^2+b3^2), the main diagonal of B (diagB). Any box A with one of its dimensions equal to diagB, has the other two dimensions equal to 0, since any plane orthogonal to it would extend outside the box B.
iii) The sum of a1, a2 and a3 must be less than or equal to diagB.
From these, we can see that the greatest dimension ai of a box A for it to fit box B, given ai > bi, should lie in the interval (bi, diagB).
Thus, any box with one dimension bigger than any dimension of a box containing it will necessarily placed along the latter's main diagonal.
Put it simply:
A(a1, a2, a3) fits in B(b1, b2, b3) iff a1+a2+a3 <= diagB.

Can you get box dimensions? Say a0,a1,a2 are the dimensions of box A ordered by size and b0,b1,b2 are the dimensions of box B ordered by size.
A fits inside B if (a0 <= b0 AND a1 <= b1 AND a2 <= b2)

Four-way navigation algorithm

Consider a rectangular shaped canvas, containing rectangles of random sizes and positions. To navigate between these rectangles, a user can use four arrows: up, down, left, right.
Are you familiar with any algorithm of navigation that would produce a fairly straightforward user experience?
I came across a few solutions but none of them seemed suitable. I am aware that no solution will be "ideal". However, the kind of algorithm I am looking for is the sort used to navigate between icons on a desktop using only the arrow keys.

[EDIT 21/5/2013: As pointed out by Gene in a comment, my weighting scheme actually does not guarantee that every rectangle will be reachable from every other rectangle -- only that every rectangle will be connected to some other rectangle in each direction.]
A nice way to do this is using maximum weighted bipartite matching.
What we want to do is build a table defining a function f(r, d) that returns the rectangle that the user will be moved to if they are currently at rectangle r and hit direction d (up, down, left or right). We would like this function to have some nice properties, such as:
It must be possible to reach every rectangle from every other rectangle
Pressing left then right or vice versa, or up then down or vice versa, should leave the user in the same place
Pressing e.g. left should take the user to a rectangle to the left (this is a bit more difficult to state precisely, but we can use a scoring system to measure the quality)
For each rectangle, create 4 vertices in a graph: one for each possible key that could be pressed while at that rectangle. For a particular rectangle r, call them rU, rD, rL and rR. For every pair of rectangles r and s, create 4 edges:
(rU, sD)
(rD, sU)
(rL, sR)
(rR, sL)
This graph has 2 connected components: one contains all U and D vertices, and the other contains all L and R vertices. Each component is bipartite, because e.g. no U vertex is ever connected to another U vertex. We could in fact run maximum weighted bipartite matching on each component separately, although it's easier just to talk about running it once on the entire graph after grouping, say, U vertices with L vertices and D vertices with R vertices.
Assign each of these edges a nonnegative weight according to how much sense it makes for that pair of rectangles to be connected by that pair of keys. You are free to choose the form for this scoring function, but it should probably be:
inversely proportional to the distances between the rectangles (you could use the distance between their centres), and
inversely proportional to how far the angle between the centres of the rectangles differs from the desired horizontal or vertical line, and
zero whenever the rectangles are oriented the wrong way (e.g. if for the edge (rU, sD) if the centre of r is actually above the centre of s). Alternatively, you can just delete these zero-weight edges.
This function attempts to satisfy requirement 3 at the top.
[EDIT #2 24/5/2013: Added an example function below.]
Here is C-ish pseudocode for an example function satisfying these properties. It takes the centre points of 2 rectangles and the direction from rectangle 1 (the direction from rectangle 2 is always the opposite of this direction):
const double MAXDISTSQUARED = /* The maximum possible squared distance */;
const double Z = /* A +ve number. Z > 1 => distance more important than angle */
// Return a weight in the range [0, 1], with higher indicating a better fit.
double getWeight(enum direction d, int x1, int y1, int x2, int y2) {
if (d == LEFT && x1 < x2 ||
d == RIGHT && x1 > x2 ||
d == UP && y1 < y2 ||
d == DOWN && y1 > y2) return 0;
// Don't need to take sqrt(); in fact it's probably better not to
double distSquared = (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2);
double angle = abs(atan2(x1 - x2, y1 - y2)); // 0 => horiz; PI/2 => vert
if (d == UP || d == DOWN) angle = PI / 2 - angle;
return 1 - pow(distSquared / MAXDISTSQUARED, Z) * (2 * angle / PI);
}
Now run maximum weighted bipartite matching. This will attempt to find the set of edges having highest total weight such that every vertex (or at least as many as possible) are adjacent to a selected edge, but no vertex is adjacent to more than one edge. (If we allowed a vertex to be adjacent to more than one edge, it would mean that pressing that key while at that rectangle would take you to more than one destination rectangle, which doesn't make sense.) Each edge in this matching corresponds to a bidirectional pair of keypresses, so that pressing e.g. up and then down will take to back to where you were, automatically satisfying requirement 2 at the top.
The only requirement not automatically satisfied by this approach so far is the important one, number 1: it does not necessarily guarantee that every rectangle will be reachable. If we just use the "raw" quality scores as edge weights, then this can actually occur for certain configurations, e.g. when there is one rectangle in each of the 4 corners of the screen, plus one at the centre, the centre one might be unreachable.
[EDIT 21/5/2013: As Gene says, my claim below that property 1 is satisfied by the new weighting scheme I propose is wrong. In many cases every rectangle will be reachable, but in general, you need to solve the NP-hard Hamiltonian Cycle problem to guarantee this. I'll leave the explanation in as it gets us some of the way there. In any case it can be hacked around by adjusting weights between connected components upward whenever subcycles are detected.]
In order to guarantee that the matching algorithm always returns a matching in which every rectangle is reachable, we need to adjust the edge weights so that it is never possible for a matching to score higher than a matching with more edges. This can be achieved by scaling the scoring function to between 0 and 1, and adding the number of rectangles, n, to each edge's weight. This works because a full matching then has score at least 4n^2 (i.e. even if the quality score is 0, the edge itself has a weight of n and there are 4n of them), while any matching with fewer edges has score at most 4(n-1)(n+1) = 4n^2 - 4, which is strictly less.

It's a truism that to a person with a hammer everything looks like a nail. Shortest path algorithms are an obvious tool here because shortest distance seems intuitive.
However we are designing a UI where logical distance is much more important than physical distance.
So let's try thinking differently.
One constraint is that repeatedly hitting the up (right, down or left) arrow ought to eventually cycle through all the rectangles. Otherwise some unreachable "orphans" are likely. Achieving this with an algorithm based on physical (2d) distance is difficult because the closest item in 2d might be in the wrong direction in the 1d projection corresponding to the arrow pair being used. I.e. hitting the up arrow could easily select a box below the current. Ouch.
So let's adopt an extremely simple solution. Just sort all the rectangles on the x-coordinate of their centroids. Hitting the right and left arrow cycles through rectangles in this order: right to the next highest x and left to the next lowest x. Wrap at the screen edges.
Also do the same with y-coordinates. Using up and down cycles in this order.
The key (pun intended) to success is adding dynamic information to the screen while cycling to show the user the logic of what is occurring. Here is a proposal. Others are possible.
At first vertical (up or down) key, a pale translucent overlay appears over the rectangles. They are shaded pale red or blue in a pattern that alternates by y coordinate of centroid. There are also horizontal hash marks of matching color across the entire window. The only reason for two colors is to provide a visual indicator of correspondence between lines and rectangles. The currently selected rectangle is non-translucent and the hash mark is brighter than all the others. When you continue to hit the up or down key, the highlighted box changes in the centroid y-order as described above. The overlay disappears when no arrow key has been struck for a half second or so.
A very similar overlay appears if a horizontal key is hit, only it's vertical hash marks and x-order.
As a user I'd really like this scheme. But YMMV.
The algorithm and data structures needed to implement this are obvious, trivial, and scale very well. The effort will go into making the overlays look good.
NB Now that I have done all the drawings I realize it would be a good idea to place a correctly colored dot at the centroid of each box to show which of the lines is intersecting it. Some illustrative diagrams follow.
Bare Boxes
Selection with up or down arrow in progress
Selection with left or right arrow in progress

What about building a movement graph as follows:
for any direction, try to go to the nearest rectangle, in the given direction, whose center point is the middle of the current rectangle's side.
try to eliminate loops, e.g. moving 'right' from A should try to yield a different rectangle than moving 'up-right' from A. For example in this drawing, the 'right' from green should be orange, even though pink would be the nearest mid-point
(Thanks to biziclop): if any rectangles aren't reachable in the graph, then re-map one of the adjoining rectangles to get to it, likely the one with the least error. Repeat until all rectangles are reachable (I think that algorithm would terminate...)
Then store the graph and only use that to navigate. You don't want to change the directions in the middle of the session.

This problem can be modeled as a graph problem and algorithm of navigation can be used as a shortest path routing.
Here is the modelling.
Each rectangle is a vertex in the graph. From each vertex (aka rectangle) you have four options - up, down, left, right. So, you can reach four different rectangles, i.e this vertex will have four neighbors and you add these edges to graph.
I am not sure if this is part of the problem -- "multiple rectangles can be reached from a rectangle using a particular action (e.g. up)". If not, the above modelling is good enough. If yes, then add all such vertices as the neighbor for this vertex. Therefore, you may not end up with a 4-regular graph. Otherwise, you will model your problem into a 4-regular graph.
Now, the question is how do you define your "navigation" algorithm. If you do not want to distinguish between your actions, i.e. if up, down, left, and right are all equal, then you can add weight of 1 to all edges.
If you decide to give a particular action more precedence than other, say up is better than the rest, then you can give weight for edges resulted from up movement as 1, and the remaining edges as 2. The idea is by assigning different weights you can distinguish between the edges you will travel.
If you decide that all up edges are not equal, i.e. the up distance between A and B, is shorter than the up distance between C and D, then you can accordingly assign weights to the edges during the graph construction process.
Here is the routing
Now how to find the route -- You can use dijkstra's algorithm to find a shortest path between a given pair of vertices. If you are interested in multiple shortest paths, you can use k-shortest path algorithm to find k shortest paths between a pair of node, and then pick your best path.
Note that the graph you end up with does not have to be a directed graph. If you prefer a directed graph, you can assign directions to the edges when you are constructing them. Otherwise you should be good using an undirected graph, as all you care is to use an edge to reach a vertex from another. Moreover, if rectangle A can be reached using up from rectangle B, then rectangle B can be reached using down from rectangle A. Thus, directions really do not matter, if you do not need them for other reasons. If you do not like the assumption I just made, then you need to construct a directed graph.
Hope this helps.

Algorithm to take the union of rectangles and to see if the union is still a rectangle

I have a problem in which I have to test whether the union of given set of rectangles forms
a rectangle or not. I don't have much experience solving computational geometry problems.
What my approach to the problem was that since I know the coordinates of all the rectangles, I can easily sort the points and then deduce the corner points of the largest rectangle possible. Then I could sweep a line and see if all the points on the line falls inside the rectangle. But, this approach is flawed and this would fail because the union may be in the form of a 'U'.
I would be a great help if you could push me in the right direction.

Your own version does not take into account that the edges of the rectangles can be non-parallel to each other. Therefore, there might not be "largest rectangle possible".
I would try this general approach:
1) Find the convex hull. You can find convex hull calculation algorithms here http://en.wikipedia.org/wiki/Convex_hull_algorithms.
2) Check if the convex hull is a rectangle. You can do this by looping through all the points on convex hull and checking if they all form 180 or 90 degree angles. If they do not, union is not a rectangle.
3) Go through all points on the convex hull. For each point check if the middle point between ThisPoint and NextPoint lies on the edge of any initially given rectangle.
If every middle point does, union is a rectangle.
If it does not, union is not a rectangle.
Complexity would be O(n log h) for finding convex hull, O(h) for the second part and O(h*n) for third part, where h is number of points on the convex hull.
Edit:
If the goal is to check if the resulting object is a filled rectangle, not only edges and corners rectangle then add step (4).
4) Find all line segments that are formed by intersecting or touching rectangles. Note - by definition all of these line segments are segments of edges of given rectangles. If a rectangle does not touch/intersect other rectangles, the line segments are it's edges.
For each line segment check if it's middle point is
On the edge of the convex hull
Inside one of given rectangles
On the edge of two non-overlapping given rectangles.
If at least one of these is true for every line segment, resulting object is a filled rectangle.

You could deduce the he corner points of the largest rectangle possible, and then go over all the rectangle that share the border with the largest possible rectangle, for example the bottom, and make sure that the line is entirely contained in their borders. This will also fail if an empty space in the middle of the rectangle is a problem, however. I think the complexity will be O(n2).

I think you are on the right direction. After you get the coordinates of largest possible rectangle,
If the largest possible rectangle is a valid rectangle, then each side of it must be union of sides of original rectangles. You can scan the original rectangle set, find those rectangles that is a part of the largest side we are looking for (this can be done in O(n) by checking if X==largestRectangle.Top.X if you are looking at top side, etc.), lets call them S.
For each side s in S we can create an interval [from,to]. All we need to check is whether the union of all intervals matches the side of the largest Rectangle. This can be done in O(nlog(n)) by standard algorithms, or on average O(n) by some hash trick (see http://www.careercup.com/question?id=12523672 , see my last comment (of the last comment) there for the O(n) algorithm ).
For example, say we got two 1*1 rectangles in the first quadrant, there left bottom coordinates are (0,0) and (1,0). Largest rectangle is 2*1 with left bottom coordinate (0,0). Since [0,1] Union [1,2] is [0,2], top side and bottom side match the largest rectangle, similar for left and right side.
Now suppose we got an U shape. 3*1 at (0,0), 1*1 at (0,1), 1*1 at (2,1), we got largest rectangle 3*2 at (0,0). Since for the top side we got [0,1] Union [1,3] does not match [0,3], the algorithm will output the union of above rectangles is not a rectangle.
So you can do this in O(n) on average, or O(nlog(n)) at least if you don't want to mess with some complex hash bucket algorithm. Much better than O(n^4)!
Edit: We have a small problem if there exists empty space somewhere in the middle of all rectangles. Let me think about it....
Edit2: An easy way to detect empty space is for each corner of a rectangle which is not a point on the largest rectangle, we go outward a little bit for all four directions (diagonal) and check if we are still in any rectangle. This is O(n^2). (Which ruins my beautiful O(nlog(n))! Can anyone can come up a better idea?

I haven't looked at a similar problem in the past, so there maybe far more efficient ways of doing it. The key problem is that you cannot look at containment of one rectangle in another in isolation since they could be adjacent but still form a rectangle, or one rectangle could be contained within multiple.
You can't just look at the projection of each rectangle on to the edges of the bounding rectangle unless the problem allows you to leave holes in the middle of the rectangle, although that is probably a fast initial check that could be performed before the following exhaustive approach:
Running through the list once, calculating the minimum and maximum x and y coordinates and the area of each rectangle
Create an input list containing your input rectangles ordered by descending size.
Create a work list containing the bounding rectangle initially
While there are rectangles in the work list
Take the largest rectangle in the input list R
Create an empty list for fragments
for each rectangle r in the work list, intersect r with R, splitting r into a rectangular portion contained within R (if any) and zero or more rectangles not within R. If r was split, discard the portion contained within R and add the remaining rectangles to the fragment list.
add the contents of the fragment list to the work list

Assuming your rectangles are aligned to the coordinate axis:
Given two rectangles A, B, you can make a function that subtracts B from A returning a set of sub-rectangles of A (that may be the empty set): Set = subtract_rectangle(A, B)
Then, given a set of rectangles R for which you want to know if their union is a rectangle:
Calculate a maximum rectangle Big that covers all the rectangles as ((min_x,min_y)-(max_x,max_y))
make the set S contain the rectangle Big: S = (Big)
for every rectangle B in R:
S1 = ()
for evey rectangle A in S:
S1 = S1 + subtract_rectangle(A, B)
S = S1
if S is empty then the union of the rectangles is a rectangle.
End, S contains the parts of Big not covered by any rectangle from R
If the rectangles are not aligned to the coordinate axis you can use a similar algorithm but that employs triangles instead of rectangles. The only issues are that subtracting triangles is not so simple to implement and that handling numerical errors can be difficult.

A simple approach just came to mind: If two rectangles share an edge[1], then together they form a rectangle which contains both - either the rectangles are adjacent [][ ] or one contains the other [[] ].
So if the list of rectangles forms a larger rectangle, then all you need it to repeatedly iterate over the rectangles, and "unify" pairs of them into a single larger one. If in one iteration you can unify none, then it is not possible to create any larger rectangle than you already have, with those pieces; otherwise, you will keep "unifying" rectangles until a single is left.
[1] Share, as in they have the same edge; it is not enough for one of them to have an edge included in one of the other's edges.
efficiency
Since efficiency seems to be a problem, you could probably speed it up by creating two indexes of rectangles, one with the larger edge size and another with the smaller edge size.
Then compare the edges with the same size, and if they are the same unify the two rectangles, remove them from the indexes and add the new rectangle to the indexes.
You can probably speed it up by not moving to the next iteration when you unify something, but to proceed to the end of the indexes before reiterating. (Stopping when one iteration does no unifications, or there is only one rectangle left.)
Additionally, the edges of a rectangle resulting from unification are by analysis always equal or larger than the edges of the original rectangles.
So if the indexes are ordered by ascending edge size, the new rectangle will be inserted in either the same position as you are checking or in positions yet to be checked, so each unification will not require an extra iteration cycle. (As the new rectangle will assuredly not unify with any rectangle previously checked in this iteration, since its edges are larger than all edges checked.)
For this to hold, in each step of a particular iteration you need to attempt unification on the next smaller edge from either of the indexes:
If you're in index1=3 and index2=6, you check index1 and advance that index;
If next edge on that index is 5, next iteration step will be in index1=5 and index2=6, so it will check index1 and advance that index;
If next edge on that index is 7, next iteration step will be in index1=7 and index2=6, so it will check index2 and advance that index;
If next edge on that index is 10, next iteration step will be in index1=7 and index2=10, so it will check index1 and advance that index;
etc.
examples
[A ][B ]
[C ][D ]
A can be unified with B, C with D, and then AB with CD. One left, ABCD, thus possible.
[A ][B ]
[C ][D ]
A can be unified with B, C with D, but AB cannot be unified with CD. 2 left, AB and CD, thus not possible.
[A ][B ]
[C ][D [E]]
A can be unified with B, C with D, CD with E, CDE with AB. 1 left, ABCDE, thus possible.
[A ][B ]
[C ][D ][E]
A can be unified with B, C with D, CD with AB, but not E. 2 left, ABCD and E, thus not possible.
pitfall
If a rectangle is contained in another but does not share a border, this approach will not unify them.
A way to address this is, when one hits an iteration that does not unify anything and before concluding that it is not possible to unify the set of rectangles, to get the rectangle with the widest edge and discard from the indexes all others that are contained within this largest rectangle.
This still does not address two situations.
First, consider the situation where with this map:
A B C D
E F G H
we have rectangles ACGE and BDFH. These rectangles share no edge and are not contained, but form a larger rectangle.
Second, consider the situation where with this map:
A B C D
E F G H
I J K L
we have rectangles ABIJ, CDHG and EHLI. They do not share edges, are not contained within each-other, and no two of them can be unified into a single rectangle; but form a rectangle, in total.
With these pitfalls this method is not complete. But it can be used to greatly reduce the complexity of the problem and reduce the number of rectangles to analyse.

Maybe...
Gather up all the x-coordinates in a list, and sort them. From this list, create a sequence of adjacent intervals. Do the same thing for the y-coordinates. Now you've got two lists of intervals. For each pair of intervals (A=[x1,x2] from the x-list, B=[y1,y2] from the y-list), make their product rectangle A x B = (x1,y1)-(x2,y2)
If every single product rectangle is contained in at least one of your initial rectangles, then the union must be a rectangle.
Making this efficient (I think I've offered about an O(n4) algorithm) is a different question entirely.

As jva stated, "Your own version does not take into account that the edges of the rectangles can be non-parallel to each other." This answer also assumes "parallel" rectangles.
If you have a grid as opposed to needing infinite precision, depending on the number and sizes of the rectangles and the granularity of the grid, it might be feasible to brute-force it.
Just take your "largest rectangle possible" and test all its points to see whether each point is in at least one of the smaller rectangles.

I finally was able to find the impressive javascript project (thanks to github search :) !)
https://github.com/evanw/csg.js
Also have a look into my answer here with other interesting projects

General case, thinking in images:
| outer_rect - union(inner rectangles) |
Check that result is zero

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio