Form a rectangle from plane points - algorithm

I have a set of red lines from which I get a set of green intersection points (shown on screen).
Then I want to find the four points that most likely describe a rectangle (if there are several options, choose the one with the largest area). I have read similar questions about how to find points that EXACTLY form a rectangle:
find if 4 points on a plane form a rectangle?
https://softwareengineering.stackexchange.com/questions/176938/how-to-check-if-4-points-form-a-square
One option is to iterate over all sets of four points and calculate the probability that they form a rectangle (or some coefficient of similarity to a rectangle). Suppose at the moment we are considering four points A, B, C, D. I tried two similarity functions:
$f_1 = 1 - \dfrac{|\langle \vec{AB}, \vec{BC} \rangle|}{\lVert \vec{AB} \rVert \, \lVert \vec{BC} \rVert}$,
where $\langle \cdot,\cdot \rangle$ denotes the dot product and $\lVert \cdot \rVert$ the vector norm; this is 1 exactly when the adjacent sides are perpendicular.
$f_2 = 1 - \dfrac{\operatorname{std}(d)}{\operatorname{mean}(d)}$,
where $\operatorname{std}(d)$ is the standard deviation of the distances $d$ from the vertices to the center of mass of the assumed rectangle and $\operatorname{mean}(d)$ is the average distance; all four vertices of a rectangle are equidistant from its center.
Neither function performed well.
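For concreteness, here is a minimal Python sketch of the two metrics as written above (my reconstruction; the names f1 and f2 are mine):
import numpy as np

def f1(A, B, C, D):
    # 1 - |cos(angle at B)|: equals 1 when AB is perpendicular to BC
    AB, BC = np.subtract(B, A), np.subtract(C, B)
    return 1 - abs(np.dot(AB, BC)) / (np.linalg.norm(AB) * np.linalg.norm(BC))

def f2(A, B, C, D):
    # 1 - std/mean of the vertex-to-centroid distances:
    # equals 1 when all four vertices are equidistant from the centroid
    pts = np.array([A, B, C, D], dtype=float)
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return 1 - d.std() / d.mean()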
Is there a way to introduce a function that is close to 1 when the four points of the plane are close to the vertices of a rectangle, and equal to 0 when they are at the position farthest from a rectangle (say, when they are collinear)?

I can't really speak to finding an appropriate cost function for scoring what a "good" rectangle is. From the comments it looks like there's a lot of discussion, but no consensus. So for now I'm just going to use a scoring function that penalizes four-point shapes for having angles further from 90 degrees; specifically, I sum the squared deviation of each angle from 90. If you want a different scoring metric, you can replace the calculation in the scoreFunc function.
I set up an interactive window where you can click to add points. When you press 'q' it takes those points, finds all possible combinations (not permutations) of 4 points, runs the scoring function on each, and draws the best one.
I'm using a recursive brute-force search. To avoid generating a ton of duplicates I came up with a hashing function that works regardless of order. I used prime numbers to ID each point, and the hashing function just takes the product of the IDs of the points. This ensures that (1,3,5,7) is the same as (3,1,7,5). I used primes because a product of primes is unique in this situation (unique factorization means two different sets of points can't produce the same product).
After the search I have to make sure that the points are ordered in such a way that the sides aren't intersecting. I'm taking advantage of OpenCV's contourArea to do that calculation for me. I can swap the first point with its horizontal and vertical neighbor and compare the areas to the original. "Bowtie" shapes from intersecting sides will have less area (I'm pretty sure they actually get zero area, because they don't count as closed shapes) than a non-intersecting shape.
import cv2
import numpy as np
import math

# get mouse click
click_pos = None;
click = False;
def mouseClick(event, x, y, flags, param):
    # hook to globals
    global click_pos;
    global click;

    # check for left mouseclick
    if event == cv2.EVENT_LBUTTONDOWN:
        click = True;
        click_pos = (x,y);

# prime hash function (order-independent: product of the points' prime IDs)
def phash(points):
    total = 1;
    for point in points:
        total *= point[0];
    return total;

# checks if an id is already present in list
def isInList(point, curr_list):
    pid = point[0];
    for item in curr_list:
        if item[0] == pid:
            return True;
    return False;

# look for rectangles
def getAllRects(points, curr_list, rects, curr_point):
    # check if already in curr_list
    if isInList(curr_point, curr_list):
        return curr_list;

    # add self to list
    curr_list.append(curr_point);

    # check end condition
    if len(curr_list) == 4:
        # add to dictionary (no worry for duplicates)
        rects[phash(curr_list)] = curr_list[:];
        curr_list = curr_list[:-1];
        return curr_list;

    # continue search
    for point in points:
        curr_list = getAllRects(points, curr_list, rects, point);
    curr_list = curr_list[:-1];
    return curr_list;

# checks if an odd number is prime
def isPrime(num):
    bound = int(math.sqrt(num));
    curr = 3;
    while curr <= bound:
        if num % curr == 0:
            return False;
        # skip evens
        curr += 2;
    return True;

# generate prime number id's for each point
def genPrimes(num):
    primes = [2]; # seed with 2, since isPrime only handles odd numbers
    curr = 3;
    while len(primes) < num:
        if isPrime(curr):
            primes.append(curr);
        # +2 to skip evens
        curr += 2;
    return primes;

# swap sides (fix intersecting lines issue)
def swapH(box):
    new_box = np.copy(box);
    new_box[0] = box[1];
    new_box[1] = box[0];
    return new_box;

def swapV(box):
    new_box = np.copy(box);
    new_box[0] = box[3];
    new_box[3] = box[0];
    return new_box;

# removes intersections
def noNoodles(box):
    # get three variants
    hbox = swapH(box);
    vbox = swapV(box);

    # get areas and choose max
    sortable = [];
    sortable.append([cv2.contourArea(box), box]);
    sortable.append([cv2.contourArea(hbox), hbox]);
    sortable.append([cv2.contourArea(vbox), vbox]);
    sortable.sort(key = lambda a : a[0]);
    return sortable[-1][1];

# 2d distance
def dist2D(one, two):
    dx = one[0] - two[0];
    dy = one[1] - two[1];
    return math.sqrt(dx*dx + dy*dy);

# angle between three points (the last point is the middle)
# law of cosines
def angle3P(p1, p2, p3):
    # get distances
    a = dist2D(p3, p1);
    b = dist2D(p3, p2);
    c = dist2D(p1, p2);

    # calculate angle // assume a and b are nonzero
    numer = c**2 - a**2 - b**2;
    denom = -2 * a * b;
    if denom == 0:
        denom = 0.000001;
    # clamp to [-1, 1] to guard against floating-point drift
    rads = math.acos(max(-1.0, min(1.0, numer / denom)));
    degs = math.degrees(rads);
    return degs;

# calculates a score
def scoreFunc(box):
    # for each point, calculate angle
    angles = [];
    for a in range(len(box)):
        prev = box[a-2][0];
        curr = box[a-1][0];
        nxt = box[a][0];
        angles.append(angle3P(prev, nxt, curr));

    # for each angle, score on squared distance from 90
    score = 0;
    for angle in angles:
        score += (angle - 90)**2;
    return score;

# evaluates each box (assigns a score)
def evaluate(boxes):
    sortable = [];
    for box in boxes:
        # INSERT YOUR OWN SCORING FUNC HERE
        sortable.append([scoreFunc(box), box]);
    sortable.sort(key = lambda a : a[0]);
    return sortable;

# set up callback
cv2.namedWindow("Display");
cv2.setMouseCallback("Display", mouseClick);

# set up screen
res = (600,600,3);
bg = np.zeros(res, np.uint8);

# loop
done = False;
points = [];
while not done:
    # reset display
    display = np.copy(bg);

    # check for new click
    if click:
        click = False;
        points.append(click_pos);

    # draw points
    for point in points:
        cv2.circle(display, point, 4, (0,200,0), -1);

    # show
    cv2.imshow("Display", display);
    key = cv2.waitKey(1);

    # check keypresses
    done = key == ord('q');

# generate prime number id's for each point
# if you have a lot of points, it would be worth it
# to just have a .txt file with a bunch of pre-gen primes in it
primes = genPrimes(len(points));
print(primes);
withPrimes = [];
for a in range(len(points)):
    withPrimes.append([primes[a], points[a]]);

# run brute-force search over all points
rects = {};
for a in range(len(withPrimes)):
    getAllRects(withPrimes, [], rects, withPrimes[a]);
print(len(rects));

# extract just the points (don't need the prime id's anymore)
boxes = [];
for key in rects:
    box = [];
    for item in rects[key]:
        box.append([item[1]]);
    boxes.append(np.array(box, np.int32)); # int32 for cv2.contourArea/drawContours

# go through all of the boxes and un-intersect their sides
for a in range(len(boxes)):
    boxes[a] = noNoodles(boxes[a]);

# draw each one to check for noodles
# for box in boxes:
#     blank = np.zeros_like(bg, np.uint8);
#     cv2.drawContours(blank, [box], -1, (255,255,255), -1);
#     cv2.imshow("Box", blank);
#     cv2.waitKey(0);

# noodles have been squared; get best box
sortedBoxes = evaluate(boxes);
bestBox = sortedBoxes[0][1];

# draw
blank = np.zeros_like(bg, np.uint8);
cv2.drawContours(blank, [bestBox], -1, (255,255,255), -1);
for point in points:
    cv2.circle(blank, point, 4, (0,200,0), -1);
cv2.imshow("Best", blank);
cv2.waitKey(0);

Related

Function using multiprocessing returns None values

I have been stuck on this for quite a while now. I am using multiprocessing to speed up a function that previously looped over data points. Before using multiprocessing the function worked fine, but now it returns some None values (the first few, <10) before returning actual values. I have tried many things and different ways to use the multiprocessing pool.
The multiprocessing is used inside a function, and I am not sure whether that might be the problem.
import numpy as np
import tifffile as tf
from multiprocessing import Pool
from functools import partial

def SkyViewFactor(point, coords, max_radius):
    betas_lin = np.linspace(0, 2*np.pi, steps_beta)
    """this is the analytical dome area but should make same assumption as for d_area"""
    dome_area = max_radius**2*2*np.pi
    """we throw away all points outside the dome
    # the dome is now a 5 column array of points:
    # the 5 columns: x, y, z, radius, angle theta"""
    dome_p = dome(point, coords, max_radius)
    betas = np.zeros(steps_beta)
    """we loop over all points in the dome"""
    d = 0
    while (d < dome_p.shape[0]):
        psi = np.arctan((dome_p[d,2]-point[2])/dome_p[d,3])
        """The angles of the min and max angle of the building"""
        beta_min = - np.arcsin(np.sqrt(2*gridboxsize**2)/2/dome_p[d,3]) + dome_p[d,4]
        beta_max = np.arcsin(np.sqrt(2*gridboxsize**2)/2/dome_p[d,3]) + dome_p[d,4]
        """Where the index of betas falls within the min and max beta, and there is not already a larger psi blocking"""
        betas[np.nonzero(np.logical_and((betas < psi), np.logical_and((beta_min <= betas_lin), (betas_lin < beta_max))))] = psi
        d += 1
    areas = d_area(betas, steps_beta, max_radius)
    """The SVF is the fraction of area of the dome that is not blocked"""
    SVF = np.around((dome_area - np.sum(areas))/dome_area, 3)
    #print(SVF)
    return SVF
def calc_SVF(coords, max_radius, blocklength):
    """
    Function to calculate the sky view factor.
    We create a dome around a point with a certain radius,
    and for each point in the dome we evaluate whether the height of this point blocks the view
    :param coords: all coordinates of our dataset
    :param max_radius: maximum radius we think influences the svf
    :param blocklength: the first amount of points in our data set we want to evaluate
    :return: SVF for all points
    """
    def parallel_runs_SVF():
        points = [coords[i,:] for i in range(blocklength)]
        pool = Pool()
        SVF_list = []
        SVF_par = partial(SkyViewFactor, coords=coords, max_radius=max_radius) # the remaining free argument is the point
        SVF = pool.map(SVF_par, points)
        pool.close()
        pool.join()
        # if SVF != None:
        #     SVF_list.append(SVF)
        print(SVF)
        return SVF
    if __name__ == '__SVF__':
        return parallel_runs_SVF()
This function is later called in:
def reshape_SVF(data, coords, julianday, lat, long, LMT, reshape, save_CSV, save_Im):
    [x_len, y_len] = [int(data.shape[0]/2), int(data.shape[1]/2)]
    blocklength = int(x_len*y_len)
    "Compute SVF and SF and reshape the shadow factors and SVF back to nd array"
    SVFs = calc_SVF(coords, max_radius, blocklength)
    SFs = calc_SF(coords, julianday, lat, long, LMT, blocklength)
    #SVFs = filter(None, SVFs)
    "If reshape is true we reshape the arrays to the original data matrix"
    if reshape == True:
        SVF_matrix = np.ndarray([x_len, y_len])
        SF_matrix = np.ndarray([x_len, y_len])
        for i in range(blocklength):
            SVF_matrix[coords[int(i-x_len/2),0], coords[int(i-y_len/2),1]] = SVFs[i]
            SF_matrix[coords[int(i-x_len/2),0], coords[int(i-y_len/2),1]] = SFs[i]
        if save_CSV == True:
            np.savetxt("SVFmatrix.csv", SVF_matrix, delimiter=",")
            np.savetxt("SFmatrix.csv", SF_matrix, delimiter=",")
        if save_Im == True:
            tf.imwrite('SVF_matrix.tif', SVF_matrix, photometric='minisblack')
            tf.imwrite('SF_matrix.tif', SF_matrix, photometric='minisblack')
        return SF_matrix, SF_matrix
    elif reshape == False:
        np.savetxt("SVFs.csv", SVFs, delimiter=",")
        np.savetxt("SFs.csv", SFs, delimiter=",")
        return SVFs, SFs
SFs is a similar function with the same structure (it also uses multiprocessing). The goal is to return a list with all Sky View factors, or if reshape is true a matrix with the same shape as the original input data (DSM data) with the sky view factor for each location.
I get the error:
SVF_matrix[coords[int(i-x_len/2),0],coords[int(i-y_len/2),1]] = SVFs[i]
TypeError: 'NoneType' object is not subscriptable
I tried to filter out the Nones with the filter() function, using
SVFs = filter(None, SVFs)
This returns the error
TypeError: 'NoneType' object is not iterable
Also, I do not know whether the None values come instead of the actual values or in addition to them (i.e. if I have 1000 datapoints I should get 1000 sky view factors; do I get an array with 3 Nones and then the numbers, do I get 3 Nones and 1000 sky view factors, or do I get a list of 1000 values of which the first 3 are Nones?).
I also tried to make an empty list and append the SVF only if it is not None; this however also does not work:
SVF_list = []
if SVF != None:
    SVF_list.append(SVF)
return SVF_list
To include all used functions: these are the functions used in SkyViewFactor to calculate distances and the area elements:
def dist(point, coord):
    """
    :param point: evaluation point (x,y,z)
    :param coord: array of coordinates with heights
    :return: the distance from each coordinate to the point and the angle
    """
    # columns is dx
    dx = (coord[:,1]-point[1])*gridboxsize
    # rows is dy
    dy = (coord[:,0]-point[0])*gridboxsize
    dist = np.sqrt(abs(dx)**2 + abs(dy)**2)
    """angle is 0 in north direction"""
    angle = np.arctan2(dy, dx) + np.pi/2
    return dist, angle

def dome(point, coords, maxR):
    """
    :param point: point we are evaluating
    :param coords: array of coordinates with heights
    :param maxR: maximum radius in which we think the coordinates can influence the SVF
    :return: a dome of points that we take into account to evaluate the SVF
    """
    radii, angles = dist(point, coords)
    coords = np.column_stack([coords, radii])
    coords = np.column_stack([coords, angles])
    """the dome consists of points higher than the view height and within the radius we want"""
    dome = coords[(np.logical_and(coords[:,3] < maxR, coords[:,3] > 0.1)), :]
    dome = dome[(dome[:,2] > point[2]), :]
    return dome

def d_area(psi, steps_beta, maxR):
    """Radius at ground surface and at the height of the projection of the building"""
    d_area = 2*np.pi/steps_beta*maxR**2*np.sin(psi)
    return d_area
Help with this, or some other suggestion to speed up my code or use multiprocessing the right way, would be very appreciated!
I am trying to speed up a for loop using multiprocessing; this works without multiprocessing, but since I have to iterate over 1,250,000 datapoints it is way too slow. With multiprocessing it returns None for the first few values.
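One observation on the posted code (my note, and a sketch rather than a verified fix): inside calc_SVF the guard if __name__ == '__SVF__': can never be true, because __name__ is only ever '__main__' or the module's name, so calc_SVF falls off the end and returns None. Note also that pool.map returns exactly one result per input, in input order. A minimal sketch with the guard moved to the top-level script:
from multiprocessing import Pool
from functools import partial

def calc_SVF(coords, max_radius, blocklength):
    # sketch: no __name__ guard inside the function, so it always returns
    points = [coords[i, :] for i in range(blocklength)]
    with Pool() as pool:
        return pool.map(partial(SkyViewFactor, coords=coords,
                                max_radius=max_radius), points)

if __name__ == '__main__':  # guard only the script's entry point
    SVFs = calc_SVF(coords, max_radius, blocklength)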

Fast algorithm to generate rectangles that contain a number of 2D points

I have one problem which I'm struggling with.
Given the following:
an array all_points containing 2D points, each point represented as a tuple (x, y).
an array musthave_points containing the indices of points that are in all_points.
an integer m, with m < len(all_points).
Return a list of rectangles, in which a rectangle is represented by a tuple containing its 4 vertices ((x0, y0), (x1, y1), (x2, y2), (x3, y3)), each rectangle must satisfy the conditions below:
Contains m points from all_points; these m points must lie completely inside the rectangle, i.e. not on any of the rectangle's four edges.
Contains all points from musthave_points. If musthave_points is an empty list, the rectangles only need to satisfy the first condition.
If there's no such rectangle, return an empty list. Two rectangles are considered "identical" if they contain the same subset of points and there should not be "identical" rectangles in the output.
Note: One simple brute-force solution is to first generate all combinations of m points, each of which contains all points from musthave_points. For each combination, create one rectangle that covers all points in the combination. Then count the number of points that lie inside the rectangle; if the number of points is m, it's a valid rectangle.
But that solution runs in factorial time complexity. Can you come up with something faster than that?
I already implemented the brute-force as shown below, but it's terribly slow.
import itertools
import numpy as np
import cv2
import copy
import sys
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

# Credit: https://github.com/dbworth/minimum-area-bounding-rectangle/blob/master/python/min_bounding_rect.py
def minBoundingRect(hull_points_2d):
    #print "Input convex hull points: "
    #print hull_points_2d

    # Compute edges (x2-x1,y2-y1)
    edges = np.zeros((len(hull_points_2d) - 1, 2))  # empty 2 column array
    for i in range(len(edges)):
        edge_x = hull_points_2d[i+1, 0] - hull_points_2d[i, 0]
        edge_y = hull_points_2d[i+1, 1] - hull_points_2d[i, 1]
        edges[i] = [edge_x, edge_y]

    # Calculate edge angles atan2(y/x)
    edge_angles = np.zeros((len(edges)))  # empty 1 column array
    for i in range(len(edge_angles)):
        edge_angles[i] = np.math.atan2(edges[i,1], edges[i,0])

    # Check for angles in 1st quadrant
    for i in range(len(edge_angles)):
        edge_angles[i] = np.abs(edge_angles[i] % (np.math.pi/2))  # want strictly positive answers

    # Remove duplicate angles
    edge_angles = np.unique(edge_angles)

    # Test each angle to find bounding box with smallest area
    min_bbox = (0, sys.maxsize, 0, 0, 0, 0, 0, 0)  # rot_angle, area, width, height, min_x, max_x, min_y, max_y
    for i in range(len(edge_angles)):
        R = np.array([[np.math.cos(edge_angles[i]), np.math.cos(edge_angles[i]-(np.math.pi/2))],
                      [np.math.cos(edge_angles[i]+(np.math.pi/2)), np.math.cos(edge_angles[i])]])

        # Apply this rotation to convex hull points
        rot_points = np.dot(R, np.transpose(hull_points_2d))  # 2x2 * 2xn

        # Find min/max x,y points
        min_x = np.nanmin(rot_points[0], axis=0)
        max_x = np.nanmax(rot_points[0], axis=0)
        min_y = np.nanmin(rot_points[1], axis=0)
        max_y = np.nanmax(rot_points[1], axis=0)

        # Calculate height/width/area of this bounding rectangle
        width = max_x - min_x
        height = max_y - min_y
        area = width*height

        # Store the smallest rect found first (a simple convex hull might have 2 answers with same area)
        if (area < min_bbox[1]):
            min_bbox = (edge_angles[i], area, width, height, min_x, max_x, min_y, max_y)

    # Re-create rotation matrix for smallest rect
    angle = min_bbox[0]
    R = np.array([[np.math.cos(angle), np.math.cos(angle-(np.math.pi/2))],
                  [np.math.cos(angle+(np.math.pi/2)), np.math.cos(angle)]])

    # Project convex hull points onto rotated frame
    proj_points = np.dot(R, np.transpose(hull_points_2d))  # 2x2 * 2xn
    #print "Projected hull points are \n", proj_points

    # min/max x,y points are against baseline
    min_x = min_bbox[4]
    max_x = min_bbox[5]
    min_y = min_bbox[6]
    max_y = min_bbox[7]
    #print "Min x:", min_x, " Max x: ", max_x, " Min y:", min_y, " Max y: ", max_y

    # Calculate center point and project onto rotated frame
    center_x = (min_x + max_x)/2
    center_y = (min_y + max_y)/2
    center_point = np.dot([center_x, center_y], R)
    #print "Bounding box center point: \n", center_point

    # Calculate corner points and project onto rotated frame
    corner_points = np.zeros((4,2))  # empty 2 column array
    corner_points[0] = np.dot([max_x, min_y], R)
    corner_points[1] = np.dot([min_x, min_y], R)
    corner_points[2] = np.dot([min_x, max_y], R)
    corner_points[3] = np.dot([max_x, max_y], R)

    return (angle, min_bbox[1], min_bbox[2], min_bbox[3], center_point, corner_points)  # rot_angle, area, width, height, center_point, corner_points

class PatchGenerator:
    def __init__(self, all_points, musthave_points, m):
        self.all_points = copy.deepcopy(all_points)
        self.n = len(all_points)
        self.musthave_points = copy.deepcopy(musthave_points)
        self.m = m

    @staticmethod
    def create_rectangle(points):
        rot_angle, area, width, height, center_point, corner_points = minBoundingRect(points)
        return corner_points

    @staticmethod
    def is_point_inside_rectangle(rect, point):
        pts = Point(*point)
        polygon = Polygon(rect)
        return polygon.contains(pts)

    def check_valid_rectangle(self, rect, the_complement):
        # checking if the rectangle contains any other point from `the_complement`
        for point in the_complement:
            if self.is_point_inside_rectangle(rect, point):
                return False
        return True

    def generate(self):
        rects = []

        # generate all combinations of m points, including points from musthave_points
        the_rest_indices = list(set(range(self.n)).difference(self.musthave_points))
        comb_indices = itertools.combinations(the_rest_indices, self.m - len(self.musthave_points))
        comb_indices = [self.musthave_points + list(inds) for inds in comb_indices]

        # for each combination
        for comb in comb_indices:
            comb_points = np.array(self.all_points)[comb]

            ## create the rectangle that covers all m points
            rect = self.create_rectangle(comb_points)

            ## check if the rectangle is valid
            the_complement_indices = list(set(range(self.n)).difference(comb))
            the_complement_points = list(np.array(self.all_points)[the_complement_indices])
            if self.check_valid_rectangle(rect, the_complement_points):
                rects.append([comb, rect])  # indices of m points and 4 vertices of the valid rectangle

        return rects

if __name__ == '__main__':
    all_points = [[47.43, 20.5 ], [47.76, 43.8 ], [47.56, 23.74], [46.61, 23.73], [47.49, 18.94], [46.95, 25.29], [54.31, 23.5 ], [48.07, 17.77],
                  [48.2 , 34.87], [47.24, 22.07], [47.32, 27.05], [45.56, 17.95], [41.29, 19.33], [45.48, 28.49], [42.94, 15.24], [42.05, 34.3 ],
                  [41.04, 26.3 ], [45.37, 21.17], [45.44, 24.78], [44.54, 43.89], [30.49, 26.79], [40.55, 22.81]]
    musthave_points = [3, 5, 9]
    m = 17
    patch_generator = PatchGenerator(all_points, musthave_points, m)
    patches = patch_generator.generate()
Every such rectangle can be shrunk to the minimum size such that it still contains the same points. Thus you only need to check such minimal rectangles. Let n be the total number of points. Then there are at most n possible coordinates for the left side, and likewise for the other sides. For each possible pair of left and right side coordinates, you can do a linear sweep for the top and bottom coordinates. Final time complexity would be O(n^3).
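For illustration, a rough Python sketch of this idea, assuming axis-aligned rectangles (all names here are mine, and the epsilon padding is just one way to make the m points lie strictly inside):
from itertools import combinations

def candidate_rects(points, musthave, m, eps=1e-9):
    # Sketch of the O(n^3) sweep described above (my code; assumes
    # axis-aligned rectangles). Dedup is by the set of contained points.
    must = set(musthave)
    seen, out = set(), []
    xs = sorted(set(x for x, _ in points))
    for lo_x, hi_x in combinations(xs, 2):
        # points inside the vertical slab [lo_x, hi_x], swept bottom-to-top
        slab = sorted((i for i, (x, y) in enumerate(points) if lo_x <= x <= hi_x),
                      key=lambda i: points[i][1])
        for s in range(len(slab) - m + 1):
            window = frozenset(slab[s:s + m])
            if window in seen or not must <= window:
                continue
            seen.add(window)
            xw = [points[i][0] for i in window]
            yw = [points[i][1] for i in window]
            # pad so the m points are strictly inside, then verify the count
            x0, x1 = min(xw) - eps, max(xw) + eps
            y0, y1 = min(yw) - eps, max(yw) + eps
            inside = sum(1 for x, y in points if x0 < x < x1 and y0 < y < y1)
            if inside == m:
                out.append(((x0, y0), (x1, y0), (x1, y1), (x0, y1)))
    return out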

Ordering coordinates from top left to bottom right

How can I go about trying to order the points of an irregular array from top left to bottom right, such as in the image below?
Methods I've considered are:
calculate the distance of each point from the top left of the image (Pythagoras's theorem) but apply some kind of weighting to the Y coordinate in an attempt to prioritise points on the same 'row' e.g. distance = SQRT((x * x) + (weighting * (y * y)))
sort the points into logical rows, then sort each row.
Part of the difficulty is that I do not know how many rows and columns will be present in the image coupled with the irregularity of the array of points. Any advice would be greatly appreciated.
Even though the question is a bit older, I recently had a similar problem when calibrating a camera.
The algorithm is quite simple and based on this paper:
1. Find the top left point: min(x+y)
2. Find the top right point: max(x-y)
3. Create a straight line from the points.
4. Calculate the distance of all points to the line.
5. If it is smaller than the radius of the circle (or a threshold): the point is in the top line.
6. Otherwise: the point is in the rest of the block.
7. Sort the points of the top line by x value and save.
8. Repeat until there are no points left.
My python implementation looks like this:
import cv2
import numpy as np

# detect the keypoints (img and params are assumed to be defined earlier)
detector = cv2.SimpleBlobDetector_create(params)
keypoints = detector.detect(img)
img_with_keypoints = cv2.drawKeypoints(img, keypoints, np.array([]), (0, 0, 255),
                                       cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

points = []
keypoints_to_search = keypoints[:]
while len(keypoints_to_search) > 0:
    a = sorted(keypoints_to_search, key=lambda p: (p.pt[0]) + (p.pt[1]))[0]   # find upper left point
    b = sorted(keypoints_to_search, key=lambda p: (p.pt[0]) - (p.pt[1]))[-1]  # find upper right point
    cv2.line(img_with_keypoints, (int(a.pt[0]), int(a.pt[1])), (int(b.pt[0]), int(b.pt[1])), (255, 0, 0), 1)

    # convert opencv keypoint to numpy 3d point
    a = np.array([a.pt[0], a.pt[1], 0])
    b = np.array([b.pt[0], b.pt[1], 0])

    row_points = []
    remaining_points = []
    for k in keypoints_to_search:
        p = np.array([k.pt[0], k.pt[1], 0])
        d = k.size  # diameter of the keypoint (might be a threshold)
        # distance between keypoint and line a->b
        dist = np.linalg.norm(np.cross(np.subtract(p, a), np.subtract(b, a))) / np.linalg.norm(np.subtract(b, a))
        if d/2 > dist:
            row_points.append(k)
        else:
            remaining_points.append(k)

    points.extend(sorted(row_points, key=lambda h: h.pt[0]))
    keypoints_to_search = remaining_points
Jumping on this old thread because I just dealt with the same thing: sorting a sloppily aligned grid of placed objects by left-to-right, top to bottom location. The drawing at the top in the original post sums it up perfectly, except that this solution supports rows with varying numbers of nodes.
S. Vogt's script above was super helpful (and the script below is entirely based on his/hers), but my conditions are narrower. Vogt's solution accommodates a grid that may be tilted from the horizontal axis. I assume no tilting, so I don't need to compare distances from a potentially tilted top line, but rather from a single point's y value.
Javascript below:
interface Node {x: number; y: number; width: number; height: number;}

const sortedNodes = (nodeArray: Node[]) => {
    let sortedNodes: Node[] = []; // this is the return value
    let availableNodes = [...nodeArray]; // make copy of input array
    while (availableNodes.length > 0) {
        // find y value of topmost node in availableNodes. (Change this to a reduce if you want.)
        let minY = Number.MAX_SAFE_INTEGER;
        for (const node of availableNodes) {
            minY = Math.min(minY, node.y)
        }
        // find nodes in top row: assume a node is in the top row when its distance from minY
        // is less than its height
        const topRow: Node[] = [];
        const otherRows: Node[] = [];
        for (const node of availableNodes) {
            if (Math.abs(minY - node.y) <= node.height) {
                topRow.push(node);
            } else {
                otherRows.push(node);
            }
        }
        topRow.sort((a, b) => a.x - b.x); // we have the top row: sort it by x
        sortedNodes = [...sortedNodes, ...topRow] // append nodes in row to sorted nodes
        availableNodes = [...otherRows] // update available nodes to exclude handled rows
    }
    return sortedNodes;
};
The above assumes that all node heights are the same. If you have some nodes that are much taller than others, get the minimum node height across all nodes and use it instead of the iterated "node.height" value, i.e. you would change this line of the script above to use the minimum height of all nodes rather than the iterated one:
if (Math.abs(minY - node.y) <= node.height)
I propose the following idea:
1. Count the points (p).
2. For each point, round its x and y coordinates down to some number, like x = int(x/n)*n, y = int(y/m)*m for some n, m.
3. If m and n are too big, distinct points start merging into the same cell and the count drops. Determine m and n iteratively so that the number of points p is just preserved.
Starting values could be in alignment with max(x) - min(x). For the search, employ a binary search. X and Y scaling would be independent of each other.
In natural words this would pin the individual points to grid points by stretching or shrinking the grid distances, until points in the same row or column share a common coordinate (X or Y) but no two points overlap. You could call that classifying as well.
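A minimal sketch of that idea (my code; it halves both cell sizes together instead of binary-searching each axis independently, and assumes distinct points):
def snap(points, nx, ny):
    """Map each point to a grid cell of size nx x ny (hypothetical helper)."""
    return {(int(x / nx), int(y / ny)) for x, y in points}

def fit_grid(points):
    # start with one huge cell and shrink until no two points share a cell
    nx = max(x for x, _ in points) - min(x for x, _ in points) + 1
    ny = max(y for _, y in points) - min(y for _, y in points) + 1
    while len(snap(points, nx, ny)) < len(points):
        # halving is a crude stand-in for the binary search suggested above
        nx = max(1, nx // 2)
        ny = max(1, ny // 2)
    return nx, ny  # cell sizes; (int(x/nx), int(y/ny)) are the grid indices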

Calculating a probability value in range with min / max bounds

Think of a 2D grid, e.g. in the size of 1000x1000 cells, which is used as the map of a level in a game. This map is dynamically filled with game objects during runtime. Now we need to calculate the probability of placing a new object at a given x/y position in this grid.
What I already have is an int array that holds the number of game objects in close distance to the cell at x/y. The index of this array represents the cell distance to the given cell, and each value in the array tells the number of game objects in the grid at that distance. So for example the array could look like this:
0, 0, 1, 2, 0, 3, 1, 0, 4, 0, 1
This would mean that 0 objects are in the grid cell at x/y itself, 0 objects are in the direct neighbour cells, 1 object is in a cell with a distance of two cells, 2 objects are in the cells of a distance of three cells, and so on. The following figure illustrates this example:
The task now is to calculate how likely it is to place a new object at x/y, based on the values in this array. The algorithm should be something like this:
if at least one object is already closer than min, then the probability must be 0.0
else if no object is within a distance of max, then the probability must be 1.0
else the probability depends on how many objects are close to x/y, and how close they are.
So in other words: if there is at least one game object already very close, we don't want a new one. On the other hand if there is no object within a max radius, we want a new object in any case. Or else we want to place a new object with a probability depending on how many other objects there are close to x/y -- the more objects are close, and the closer they are, the less likely we want to place a new object.
I hope my description was understandable.
Can you think of an elegant algorithm or formula to calculate this probability?
PS: Sorry for the title of this question, I don't know how to summarize my question better.
One approach I'd consider is to compute a "population density" for that square. The lower the population density, the higher the probability that you would place an item there.
As you say, if there is an item at (x,y), then you can't place an item there. So consider that a population density of 1.0.
At the next level out there are 8 possible neighbors. The population density for that level would be n/8, where n is the number of items at that level. So if there are 3 objects that are adjacent to (x,y), then the density of that level is 3/8. Divide that by (distance+1).
Do the same for all levels. That is, compute the density of each level, divide by (distance+1), and sum the results. The divisor at each level is (distance*8). So your divisors are 8, 16, 24, etc.
Once you compute the results, you'll probably want to play with the numbers a bit to adjust the probabilities. That is, if you come up with a sum of 0.5, that space is likely pretty crowded. You wouldn't want to use (1-density) as your probability for generating an item. But the method I outline above should give you a single number to play with, which should simplify the problem.
So the algorithm looks something like:
total_density = 0;
for i = 0; i < max; ++i
    if (i == 0)
        local_density = counts[i]
    else
        local_density = counts[i]/(i*8); // density at that level
    total_density = total_density + (local_density/(i+1))
If dividing the local density by (i+1) over-exaggerates the effect of distance, consider using something like log(i+1) or sqrt(i+1). I've found that to be useful in other situations where the distance is a factor, but not linearly.
Let's assume your array's name is distances.
double getProbability()
{
    // probability is 0 if any object is closer than min
    for(int i=0; i<min; i++)
    {
        if(distances[i] != 0) return 0;
    }
    // probability is 1 if no object is within max
    bool b = true;
    for(int i=min; i<max; i++)
    {
        b = b && (distances[i] == 0);
    }
    if(b) return 1;
    // otherwise: weighted count of objects, nearer ones weigh more
    double s = 0;
    for(int i=0; i<distances.Count(); i++)
    {
        s += (double)distances[i]/(i+1);
    }
    return s/totalObjectNum;
}
This approach calculates a weighted sum of the objects at distances > min and <= max. In parallel, an upper limit (called normWeight) is calculated, which depends only on max.
If at least one object is at a distance > min and <= max, then the probability closest to 1 would be 1-(1/normWeight), for one object on the outer ring. The minimal probability would be 1-((normWeight-1)/normWeight), e.g. for max-1 objects on the outer ring.
The calculation of the weighted sum can be modified by calculating different values for the variable delta.
float calculateProbabilty()
{
    vector<int> numObjects; // [i] := number of objects in a distance i
    // fill numObjects ...

    // given:
    int min = ...;
    int max = ...; // must be >= min

    bool anyObjectCloserThanMin = false;
    bool anyObjectCloserThanMax = false;

    // calculate a weighted sum
    float sumOfWeights = 0.0;
    float normWeight = 0.0;

    for (int distance = 0; distance <= max; distance++)
    {
        // calculate a delta-value for increasing sumOfWeights depending on distance
        // the closer the object the higher the delta
        // e.g.:
        float delta = (float)(max + 1 - distance);
        normWeight += delta;

        if (numObjects[distance] > 0 && distance < min)
        {
            anyObjectCloserThanMin = true;
            break;
        }
        if (numObjects[distance] > 0)
        {
            anyObjectCloserThanMax = true;
            sumOfWeights += (float)numObjects[distance] * delta;
        }
    }

    float probability = 0.0;
    if (anyObjectCloserThanMin)
    {
        // if at least one object is already closer than min, then the probability must be 0.0
        probability = 0.0;
    }
    else if (!anyObjectCloserThanMax)
    {
        // if no object is within a distance of max, then the probability must be 1.0
        probability = 1.0;
    }
    else
    {
        // else the probability depends on how many objects are close to x/y
        // in this scenario normWeight defines an upper limit beyond which
        // the probability becomes 0
        if (sumOfWeights >= normWeight)
        {
            probability = 0.0;
        }
        else
        {
            probability = 1. - (sumOfWeights / normWeight);
            // The probability closest to 1 would be 1-(1/normWeight) for 1 object on the outer ring.
            // The minimal probability would be 1-((normWeight-1)/normWeight), e.g. for
            // max-1 objects on the outer ring.
        }
    }
    return probability;
}
A simple approach could be:
1 / (weighted sum of all neighbours in [min, max] + 1), where each neighbour is weighted by its distance to x/y.
By weighted I mean that the number of neighbours whose distance to x/y is smaller is multiplied by a bigger factor than the number of those that are not so close. As a weight you could, for example, take (max+1)-distance.
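A tiny sketch of this formula (my code; counts[i] is the number of objects at cell distance i, as in the question):
def probability(counts, min_d, max_d):
    # sketch of the '1 / (weighted sum + 1)' idea above (my code);
    # nearer neighbours get the larger weight (max_d + 1 - i)
    weighted = sum(counts[i] * (max_d + 1 - i) for i in range(min_d, max_d + 1))
    return 1.0 / (weighted + 1)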
Note that once you compute the object density (see "population density" or the "weighted sum of objects within a distance" in the previous answers), you still need to transform this value into a probability of inserting new objects (which is not treated as comprehensively in the other answers).
The probability function (PDF) needs to be defined for all possible values of object density, i.e. on the closed interval [0, 1], but otherwise it can be shaped towards any goal you desire (see illustrations), e.g.:
to move the current object density towards a desired maximum object density
to keep the overall probability of insertion constant, while taking the local object density into account
If you want to experiment with various goals (PDF function shapes - linear, quadratic, hyperbola, circle section, ...), you might wish to have a look at the factory method pattern so you can switch between implementations while calling the same method name, but I wanted to keep things simpler in my example, so I implemented only the 1st goal (in Python):
def object_density(objects, min, max):
    # choose your favourite algorithm, e.g.:
    # compute the density for each level of distance
    # and then average the levels, i.e. a distance 2 object is
    # exactly 8 times less significant than a distance 1 object.
    # Returns float between 0 and 1 (inclusive) for valid inputs.
    levels = [objects[d] / (d * 8) for d in range(min, max + 1)]
    return sum(levels) / len(levels)

def probability_from_density(desired_max_density, density):
    # play with PDF functions, e.g.
    # a simple linear function
    # f(x) = a*x + b
    # where we know 2 points [0, 1] and [desired_max_density, 0], so:
    # 1 = 0 + b
    # 0 = a*desired_max_density + b
    # Returns float between 0 and 1 (inclusive) for valid inputs.
    if density >= desired_max_density:
        return 0.0
    a = -1 / desired_max_density
    b = 1
    return a * density + b

def main():
    # distance:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    objects =   [0, 0, 1, 2, 0, 3, 1, 0, 4, 0, 1]
    min = 2
    max = 5
    desired_max_density = 0.1
    if sum(objects[:min]):  # when an object is below min distance
        return 0.0
    density = object_density(objects, min, max)  # 0.0552
    probability = probability_from_density(desired_max_density, density)  # 0.4479
    return probability

print(main())

Algorithm for fitting points to a grid

I have a list of points in 2D space that form an (imperfect) grid:
x x x x
x x x x
x
x x x
x x x x
What's the best way to fit these to a rigid grid (i.e. create a two-dimensional array and work out where each point fits in that array)?
There are no holes in the grid, but I don't know in advance what its dimensions are.
EDIT: The grid is not necessarily regular (not even spacing between rows/cols)
A little bit of an image processing approach:
If you think of what you have as a binary image where each X is 1 and the rest is 0, you can sum up the rows and columns and use a peak finding algorithm on the resulting signals; the peaks correspond to the x and y lines of the grid:
Your points as a binary image:
Sums of row/columns
Now apply some smoothing technique to the signal (e.g. lowess):
I'm sure you get the idea :-)
Good luck
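A small sketch of this approach (my code; the box filter and the naive local-maximum test are stand-ins for your favourite smoothing and peak-finding routines):
import numpy as np

def grid_lines(points, shape, win=11):
    """Estimate grid x/y positions from row/column sums of a point mask (sketch)."""
    img = np.zeros(shape, dtype=float)
    for x, y in points:
        img[y, x] = 1.0
    kernel = np.ones(win) / win  # simple box smoothing instead of lowess
    col_sum = np.convolve(img.sum(axis=0), kernel, mode='same')
    row_sum = np.convolve(img.sum(axis=1), kernel, mode='same')

    def peaks(sig):
        # naive local-maximum test
        return [i for i in range(1, len(sig) - 1)
                if sig[i] > sig[i - 1] and sig[i] >= sig[i + 1] and sig[i] > 0]

    return peaks(col_sum), peaks(row_sum)  # x positions, y positions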
The best I could come up with is a brute-force solution that calculates the grid dimensions that minimize the error in the square of the Euclidean distance between the point and its nearest grid intersection.
This assumes that the number of points p is exactly equal to the number of columns times the number of rows, and that each grid intersection has exactly one point on it. It also assumes that the minimum x/y value for any point is zero. If the minimum is greater than zero, just subtract the minimum x value from each point's x coordinate and the minimum y value from each point's y coordinate.
The idea is to create all of the possible grid dimensions given the number of points. In the example above with 16 points, we would make grids with dimensions 1x16, 2x8, 4x4, 8x2 and 16x1. For each of these grids we calculate where the grid intersections would lie by dividing the maximum width of the points by the number of columns minus 1, and the maximum height of the points by the number of rows minus 1. Then we fit each point to its closest grid intersection and find the error (square of the distance) between the point and the intersection. (Note that this only works if each point is closer to its intended grid intersection than to any other intersection.)
After summing the errors for each grid configuration individually (e.g. getting one error value for the 1x16 configuration, another for the 2x8 configuration and so on), we select the configuration with the lowest error.
Initialization:
P is the set of points such that P[i][0] is the x-coordinate and
P[i][1] is the y-coordinate
Let p = |P| or the number of points in P
Let max_x = the maximum x-coordinate in P
Let max_y = the maximum y-coordinate in P
(minimum values are assumed to be zero)
Initialize min_error_dist = +infinity
Initialize min_error_cols = -1
Algorithm:
for (col_count = 1; col_count <= p; col_count++) {
    // only compute for integer # of rows and cols
    if ((p % col_count) == 0) {
        row_count = p/col_count;

        // Compute the width of the columns and height of the rows
        // If the number of columns is 1, let the column width be max_x
        // (and similarly for rows)
        if (col_count > 1) col_width = max_x/(col_count-1);
        else col_width = max_x;
        if (row_count > 1) row_height = max_y/(row_count-1);
        else row_height = max_y;

        // reset the error for the new configuration
        error_dist = 0.0;
        for (i = 0; i < p; i++) {
            // For the current point, normalize the x- and y-coordinates
            // so that it's in the range 0..(col_count-1)
            // and 0..(row_count-1)
            normalized_x = P[i][0]/col_width;
            normalized_y = P[i][1]/row_height;

            // Error is the sum of the squares of the distances between
            // the current point and the nearest grid point
            // (in both the x and y direction)
            error_dist += (normalized_x - round(normalized_x))^2 +
                          (normalized_y - round(normalized_y))^2;
        }
        if (error_dist < min_error_dist) {
            min_error_dist = error_dist;
            min_error_cols = col_count;
        }
    }
}
return min_error_cols;
Once you've got the number of columns (and thus the number of rows) you can recompute the normalized values for each point and round them to get the grid intersection they belong to.
In the end I used this algorithm, inspired by beaker's:
Calculate all the possible dimensions of the grid, given the total number of points
For each possible dimension, fit the points to that dimension and calculate the variance in alignment:
Order the points by x-value
Group the points into columns: the first r points form the first column, where r is the number of rows
Within each column, order the points by y-value to determine which row they're in
For each row/column, calculate the range of y-values/x-values
The variance in alignment is the maximum range found
Choose the dimension with the least variance in alignment
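A compact Python sketch of that procedure (my code and naming; it assumes rows*cols == len(points)):
def alignment_spread(points, rows, cols):
    # sketch of the 'variance in alignment' score described above
    pts = sorted(points)  # order by x-value
    columns = [sorted(pts[c*rows:(c+1)*rows], key=lambda p: p[1])  # y-order within column
               for c in range(cols)]
    spreads = []
    for col in columns:  # x-spread of each column
        xs = [p[0] for p in col]
        spreads.append(max(xs) - min(xs))
    for r in range(rows):  # y-spread of each row
        ys = [col[r][1] for col in columns]
        spreads.append(max(ys) - min(ys))
    return max(spreads)

def best_grid_dims(points):
    n = len(points)
    dims = [(n // c, c) for c in range(1, n + 1) if n % c == 0]
    return min(dims, key=lambda rc: alignment_spread(points, rc[0], rc[1]))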
I wrote this algorithm that accounts for missing coordinates as well as coordinates with errors.
Python Code
# Input [x, y] coordinates of a 'sparse' grid with errors
xys = [[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]]

def row_col_avgs(num_list, ratio):
    # Finds the average of each row and column. Coordinates are
    # assigned to a row and column by specifying an error ratio.
    last_num = 0
    sum_nums = 0
    count_nums = 0
    avgs = []
    num_list.sort()
    for num in num_list:
        if num > (1 + ratio) * last_num and count_nums != 0:
            avgs.append(int(round(sum_nums/count_nums, 0)))
            sum_nums = num
            count_nums = 1
        else:
            sum_nums = sum_nums + num
            count_nums = count_nums + 1
        last_num = num
    avgs.append(int(round(sum_nums/count_nums, 0)))
    return avgs

# Split coordinates into two lists of x's and y's
xs, ys = map(list, zip(*xys))

# Find averages of each row and column within a specified error.
x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)

# Return completed averaged grid
avg_grid = []
for y_avg in y_avgs:
    avg_row = []
    for x_avg in x_avgs:
        avg_row.append([int(x_avg), int(y_avg)])
    avg_grid.append(avg_row)
print(avg_grid)
Code Output
[[[102, 101], [201, 101], [301, 101]],
[[102, 204], [201, 204], [301, 204]],
[[102, 299], [201, 299], [301, 299]],
[[102, 400], [201, 400], [301, 400]]]
I am also looking for another solution using linear algebra. See my question here.
