Function using multiprocessing returns None values - multiprocessing

I have been stuck on this for quite a while now. I am using the multiprocessing module to speed up a function that previously looped over data points. Before using multiprocessing the function worked fine, but now it returns some None values (the first few, fewer than 10) before returning values. I have tried many things and different ways to use the multiprocessing pool.
The multiprocessing pool is created inside a function, and I am not sure whether that might be the problem.
def SkyViewFactor(point, coords, max_radius):
    betas_lin = np.linspace(0,2*np.pi,steps_beta)
    """this is the analytical dome area but should make same assumption as for d_area"""
    dome_area = max_radius**2*2*np.pi
    """ we throw away all point outside the dome
    # dome is now a 5 column array of points:
    # the 5 columns: x,y,z,radius,angle theta"""
    dome_p = dome(point, coords, max_radius)
    betas = np.zeros(steps_beta)
    """we loop over all points in the dome"""
    d = 0
    while (d < dome_p.shape[0]):
        psi = np.arctan((dome_p[d,2]-point[2])/dome_p[d,3])
        """The angles of the min and max angle of the building"""
        beta_min = - np.arcsin(np.sqrt(2*gridboxsize**2)/2/dome_p[d,3]) + dome_p[d,4]
        beta_max = np.arcsin(np.sqrt(2*gridboxsize**2)/2/dome_p[d,3]) + dome_p[d,4]
        """Where the index of betas fall within the min and max beta, and there is not already a larger psi blocking"""
        betas[np.nonzero(np.logical_and((betas < psi), np.logical_and((beta_min <= betas_lin), (betas_lin < beta_max))))] = psi
        d +=1
    areas = d_area(betas, steps_beta, max_radius)
    """The SVF is the fraction of area of the dome that is not blocked"""
    SVF = np.around((dome_area - np.sum(areas))/dome_area, 3)
    #print(SVF)
    return SVF
def calc_SVF(coords, max_radius, blocklength):
    """
    Function to calculate the sky view factor.
    We create a dome around a point with a certain radius,
    and for each point in the dome we evaluate if the height of this point blocks the view
    :param coords: all coordinates of our dataset
    :param max_radius: maximum radius we think influences the svf
    :param blocklength: the first amount of points in our data set we want to evaluate
    :return: SVF for all points
    """
    def parallel_runs_SVF():
        points = [coords[i,:] for i in range(blocklength)]
        pool = Pool()
        SVF_list = []
        SVF_par = partial(SkyViewFactor, coords=coords,max_radius=max_radius) # SVF_par takes only the point (coords and max_radius are fixed)
        SVF = pool.map(SVF_par, points)
        pool.close()
        pool.join()
        # if SVF != None:
        #     SVF_list.append(SVF)
        print(SVF)
        return SVF
    if __name__ == '__SVF__':
        return parallel_runs_SVF()
This function is later called in:
def reshape_SVF(data,coords,julianday,lat,long,LMT,reshape,save_CSV,save_Im):
    [x_len, y_len] = [int(data.shape[0]/2),int(data.shape[1]/2)]
    blocklength = int(x_len*y_len)
    "Compute SVF and SF and Reshape the shadow factors and SVF back to nd array"
    SVFs = calc_SVF(coords,max_radius,blocklength)
    SFs = calc_SF(coords,julianday,lat,long,LMT,blocklength)
    #SVFs = filter(None, SVFs)
    "If reshape is true we reshape the arrays to the original data matrix"
    if reshape == True:
        SVF_matrix = np.ndarray([x_len,y_len])
        SF_matrix = np.ndarray([x_len,y_len])
        for i in range(blocklength):
            SVF_matrix[coords[int(i-x_len/2),0],coords[int(i-y_len/2),1]] = SVFs[i]
            SF_matrix[coords[int(i-x_len/2),0],coords[int(i-y_len/2),1]] = SFs[i]
        if save_CSV == True:
            np.savetxt("SVFmatrix.csv", SVF_matrix, delimiter=",")
            np.savetxt("SFmatrix.csv", SF_matrix, delimiter=",")
        if save_Im == True:
            tf.imwrite('SVF_matrix.tif', SVF_matrix, photometric='minisblack')
            tf.imwrite('SF_matrix.tif', SF_matrix, photometric='minisblack')
        return SVF_matrix,SF_matrix
    elif reshape == False:
        np.savetxt("SVFs.csv", SVFs, delimiter=",")
        np.savetxt("SFs.csv", SFs, delimiter=",")
        return SVFs, SFs
SFs is a similar function with the same structure (it also uses multiprocessing). The goal is to return a list with all Sky View factors, or if reshape is true a matrix with the same shape as the original input data (DSM data) with the sky view factor for each location.
I get the error:
SVF_matrix[coords[int(i-x_len/2),0],coords[int(i-y_len/2),1]] = SVFs[i]
TypeError: 'NoneType' object is not subscriptable
I tried to filter out the None values with the filter() function using
SVFs = filter(None, SVFs)
This returns the error
TypeError: 'NoneType' object is not iterable. I also do not know whether the None values replace actual values or are extra. That is, if I have 1000 data points I should get 1000 sky view factors, but do I get an array with 3 Nones followed by numbers, 3 Nones and 1000 sky view factors, or a list of 1000 values of which the first 3 are None?
I also tried to make an empty list and append the SVF only if it is not None, but this does not work either:
SVF_list = []
if SVF != None:
    SVF_list.append(SVF)
return SVF_list
To include all used functions: These are the functions used in SkyViewFactor to calculate distances and the area elements
def dist(point, coord):
    """
    :param point: evaluation point (x,y,z)
    :param coord: array of coordinates with heights
    :return: the distance from each coordinate to the point and the angle
    """
    # Columns is dx
    dx = (coord[:,1]-point[1])*gridboxsize
    # Rows is dy
    dy = (coord[:,0]-point[0])*gridboxsize
    dist = np.sqrt(abs(dx)**2 + abs(dy)**2)
    """angle is 0 north direction"""
    angle = np.arctan2(dy,dx)+np.pi/2
    return dist,angle
def dome(point, coords, maxR):
    """
    :param point: point we are evaluating
    :param coords: array of coordinates with heights
    :param maxR: maximum radius in which we think the coordinates can influence the SVF
    :return: a dome of points that we take into account to evaluate the SVF
    """
    radii, angles = dist(point,coords)
    coords = np.column_stack([coords, radii])
    coords = np.column_stack([coords, angles])
    """the dome consist of points higher than the view height and within the radius we want"""
    dome = coords[(np.logical_and(coords[:,3]<maxR,coords[:,3]>0.1)),:]
    dome = dome[(dome[:,2]>point[2]),:]
    return dome
def d_area(psi,steps_beta,maxR):
    """Radius at ground surface and at the height of the projection of the building"""
    d_area = 2*np.pi/steps_beta*maxR**2*np.sin(psi)
    return d_area
Help with this, or any other suggestion to speed up my code or use multiprocessing the right way, would be much appreciated!
I am trying to speed up a for loop using multiprocessing. It works without multiprocessing, but since I have to iterate over 1,250,000 data points it is far too slow. With multiprocessing it returns None for the first few values.
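One reading of the two tracebacks above is that SVFs itself is None, rather than a list that contains some None entries: both `SVFs[i]` and `filter(None, SVFs)` fail on the object as a whole. That would be consistent with the guard `if __name__ == '__SVF__':` inside calc_SVF. `__name__` is either `'__main__'` or the module's import name, so unless the file is literally named `__SVF__.py` the branch is skipped and the function returns None implicitly. The sketch below reproduces that behaviour with stand-in names (`scale`, `calc_broken`, `calc_fixed` are hypothetical, not from the question) and shows one possible arrangement where the guard sits at the script entry point instead.

```python
from functools import partial
from multiprocessing import Pool


def scale(point, factor):
    """Stand-in for SkyViewFactor: any top-level (picklable) function."""
    return point * factor


def calc_broken(points, factor):
    def run():
        return [scale(p, factor) for p in points]

    # __name__ is '__main__' or the module's import name, so this test
    # fails and the function falls off the end, returning None.
    if __name__ == '__SVF__':
        return run()


def calc_fixed(points, factor, processes=2):
    # No guard inside the function: it always returns the mapped results,
    # one entry per input point, in input order.
    with Pool(processes) as pool:
        return pool.map(partial(scale, factor=factor), points)


if __name__ == '__main__':
    # The guard belongs here, around the code that starts the pool.
    print(calc_broken([1, 2, 3], 10))  # None
    print(calc_fixed([1, 2, 3], 10))   # [10, 20, 30]
```

Since `pool.map` returns exactly one result per input in order, 1000 points should give a list of 1000 values once the function actually returns it.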

Related

Form a rectangle from plane points

I have a set of red lines from which I get a set of green intersection points (visible on the screen):
Then I want to find the four points that most likely describe the rectangle (if there are several options, then choose the largest area). I read similar questions about how to find points that EXACTLY form a rectangle:
find if 4 points on a plane form a rectangle?
https://softwareengineering.stackexchange.com/questions/176938/how-to-check-if-4-points-form-a-square
There is an option to iterate over all four points and calculate the probability that they form a rectangle (or some coefficient of similarity to a rectangle). Suppose at the moment we are considering four points A, B, C, D. I tried 2 similarity functions:
[formula image missing],
where <> denotes dot product, and || - vector norm.
[formula image missing],
where std is the standard deviation of the distances from the vertices to the center of mass of the assumed rectangle, and mean is the average distance.
Both functions did not perform well.
Is there a way to introduce a function that is close to 1 when the four points of the plane are close to the vertices of the rectangle and equal to 0 when they are at the position farthest from the rectangle (assuming they are on 1 line)?
I can't really speak to finding an appropriate cost function for scoring what a "good" rectangle is. From the comments it looks like there's a lot of discussion, but no consensus. So for now I'm going to just use a scoring function that penalizes four-point shapes for having angles that are further away from 90 degrees. Specifically, I'm summing the squared distance. If you want to have a different scoring metric you can replace the calculation in the scoreFunc function.
I set up an interactive window where you can click to add points. When you press 'q' it'll take those points, find all possible combinations (not permutations) of 4 points, and then run the scoring function on each and draws the best.
I'm using a recursive, brute-force search. To avoid having a ton of duplicates I came up with a hashing function that works regardless of order. I used prime numbers to ID each point and the hashing function just takes the product of the ID's of the points. This ensures that (1,3,5,7) is the same as (3,1,7,5). I used primes because the product of primes is unique in this situation (they can't be factorized and clumped because they're primes).
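As an alternative to the prime-product hash, the standard library can enumerate unordered four-point subsets directly. This is only a sketch of that substitution, not the approach used in the code below; `four_point_candidates` is a hypothetical helper name.

```python
from itertools import combinations


def four_point_candidates(points):
    """Return every unordered choice of four distinct points exactly once."""
    return list(combinations(points, 4))


pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 7)]
quads = four_point_candidates(pts)
print(len(quads))  # 5, i.e. C(5, 4)
```

Each tuple still needs its vertices put into a non-crossing order before scoring, which is what the area comparison described next takes care of.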
After the search I have to make sure that the points are ordered in such a way that the lines aren't intersecting. I'm taking advantage of OpenCV's contourArea to do that calculation for me. I can swap the first point with its horizontal and vertical neighbor and compare the areas to the original. "Bowtie" shapes from intersecting lines will have less area (I'm pretty sure they actually get zero area because they don't count as closed shapes) than a non-intersecting shape.
import cv2
import numpy as np
import math
# get mouse click
click_pos = None;
click = False;
def mouseClick(event, x, y, flags, param):
    # hook to globals
    global click_pos;
    global click;
    # check for left mouseclick
    if event == cv2.EVENT_LBUTTONDOWN:
        click = True;
        click_pos = (x,y);
# prime hash function
def phash(points):
    total = 1;
    for point in points:
        total *= point[0];
    return total;
# checks if an id is already present in list
def isInList(point, curr_list):
    pid = point[0];
    for item in curr_list:
        if item[0] == pid:
            return True;
    return False;
# look for rectangles
def getAllRects(points, curr_list, rects, curr_point):
    # check if already in curr_list
    if isInList(curr_point, curr_list):
        return curr_list;
    # add self to list
    curr_list.append(curr_point);
    # check end condition
    if len(curr_list) == 4:
        # add to dictionary (no worry for duplicates)
        rects[phash(curr_list)] = curr_list[:];
        curr_list = curr_list[:-1];
        return curr_list;
    # continue search
    for point in points:
        curr_list = getAllRects(points, curr_list, rects, point);
    curr_list = curr_list[:-1];
    return curr_list;
# checks if a number is prime
def isPrime(num):
    bound = int(math.sqrt(num));
    curr = 3;
    while curr <= bound:
        if num % curr == 0:
            return False;
        # skip evens
        curr += 2;
    return True;
# generate prime number id's for each point
def genPrimes(num):
    primes = [];
    curr = 1;
    while len(primes) < num:
        if isPrime(curr):
            primes.append(curr);
        # +2 to skip evens
        curr += 2;
    return primes;
# swap sides (fix intersecting lines issue)
def swapH(box):
    new_box = np.copy(box);
    new_box[0] = box[1];
    new_box[1] = box[0];
    return new_box;
def swapV(box):
    new_box = np.copy(box);
    new_box[0] = box[3];
    new_box[3] = box[0];
    return new_box;
# removes intersections
def noNoodles(box):
    # get three variants
    hbox = swapH(box);
    vbox = swapV(box);
    # get areas and choose max
    sortable = [];
    sortable.append([cv2.contourArea(box), box]);
    sortable.append([cv2.contourArea(hbox), hbox]);
    sortable.append([cv2.contourArea(vbox), vbox]);
    sortable.sort(key = lambda a : a[0]);
    return sortable[-1][1];
# 2d distance
def dist2D(one, two):
    dx = one[0] - two[0];
    dy = one[1] - two[1];
    return math.sqrt(dx*dx + dy*dy);
# angle between three points (the last point is the middle)
# law of cosines
def angle3P(p1, p2, p3):
    # get distances
    a = dist2D(p3, p1);
    b = dist2D(p3, p2);
    c = dist2D(p1, p2);
    # calculate angle // assume a and b are nonzero
    numer = c**2 - a**2 - b**2;
    denom = -2 * a * b;
    if denom == 0:
        denom = 0.000001;
    rads = math.acos(numer / denom);
    degs = math.degrees(rads);
    return degs;
# calculates a score
def scoreFunc(box):
    # for each point, calculate angle
    angles = [];
    for a in range(len(box)):
        prev = box[a-2][0];
        curr = box[a-1][0];
        next = box[a][0];
        angles.append(angle3P(prev, next, curr));
    # for each angle, score on squared distance from 90
    score = 0;
    for angle in angles:
        score += (angle - 90)**2;
    return score;
# evaluates each box (assigns a score)
def evaluate(boxes):
    sortable = [];
    for box in boxes:
        # INSERT YOUR OWN SCORING FUNC HERE
        sortable.append([scoreFunc(box), box]);
    sortable.sort(key = lambda a : a[0]);
    return sortable;
# set up callback
cv2.namedWindow("Display");
cv2.setMouseCallback("Display", mouseClick);
# set up screen
res = (600,600,3);
bg = np.zeros(res, np.uint8);
# loop
done = False;
points = [];
while not done:
    # reset display
    display = np.copy(bg);
    # check for new click
    if click:
        click = False;
        points.append(click_pos);
    # draw points
    for point in points:
        cv2.circle(display, point, 4, (0,200,0), -1);
    # show
    cv2.imshow("Display", display);
    key = cv2.waitKey(1);
    # check keypresses
    done = key == ord('q');
# generate prime number id's for each point
# if you have a lot of points, it would be worth it
# to just have a .txt file with a bunch of pre-gen primes in it
primes = genPrimes(len(points));
print(primes);
withPrimes = [];
for a in range(len(points)):
    withPrimes.append([primes[a], points[a]]);
# run brute-force search over all points
rects = {};
for a in range(len(withPrimes)):
    getAllRects(withPrimes, [], rects, withPrimes[a]);
print(len(rects));
# extract just the points (don't need the prime id's anymore)
boxes = [];
for key in rects:
    box = [];
    for item in rects[key]:
        box.append([item[1]]);
    boxes.append(np.array(box));
# go through all of the boxes and un-intersect their sides
for a in range(len(boxes)):
    boxes[a] = noNoodles(boxes[a]);
# draw each one to check for noodles
# for box in boxes:
#     blank = np.zeros_like(bg, np.uint8);
#     cv2.drawContours(blank, [box], -1, (255,255,255), -1);
#     cv2.imshow("Box", blank);
#     cv2.waitKey(0);
# noodles have been squared get best box
sortedBoxes = evaluate(boxes);
bestBox = sortedBoxes[0][1];
# draw
blank = np.zeros_like(bg, np.uint8);
cv2.drawContours(blank, [bestBox], -1, (255,255,255), -1);
for point in points:
    cv2.circle(blank, point, 4, (0,200,0), -1);
cv2.imshow("Best", blank);
cv2.waitKey(0);
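The question asked for a score that is 1 for a perfect rectangle and 0 when the four points lie on one line. The sum of squared angle deviations above is not bounded that way, so here is one possible replacement for scoreFunc, offered as a sketch rather than a recommendation: average `1 - |cos(angle)|` over the four corners. `rectangle_score` is a hypothetical name, and the vertices are assumed to be in non-crossing drawing order already (as after noNoodles).

```python
import math


def rectangle_score(quad):
    """quad: four (x, y) vertices in drawing order.

    Returns the mean of 1 - |cos(corner angle)|: 1.0 when every corner
    is a right angle, 0.0 when all four points are collinear.
    """
    total = 0.0
    for i in range(4):
        px, py = quad[i - 1]          # previous vertex (wraps around)
        cx, cy = quad[i]              # corner being measured
        nx, ny = quad[(i + 1) % 4]    # next vertex
        ux, uy = px - cx, py - cy
        vx, vy = nx - cx, ny - cy
        norm = math.hypot(ux, uy) * math.hypot(vx, vy)
        if norm == 0:
            return 0.0  # repeated vertex: treat as degenerate
        total += 1.0 - abs(ux * vx + uy * vy) / norm
    return total / 4.0


print(rectangle_score([(0, 0), (4, 0), (4, 2), (0, 2)]))  # 1.0
print(rectangle_score([(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0.0
```

Because a higher value is better here, the sort in evaluate would have to pick the last element instead of the first. The score ignores size, so choosing the largest area among near-ties would still need a separate step.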

Finding nearest station to each shop using BallTree

I've got 2 datasets: a list of shops with UK coordinates and a list of train stations, also with coordinates.
I'm using BallTree to get the nearest station to each shop with a distance, using code from this website, and I've swapped in my dataframes appropriately.
https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
Code:
import pandas as pd
import numpy as np
import geopandas as gpd
from sklearn.neighbors import BallTree
df_pocs = pd.read_csv(r'C:\Users\FLETCHWI\Desktop\XX\shops.csv', encoding = "ISO-8859-1", engine='python')
df_stations = pd.read_csv(r'C:\Users\FLETCHWI\Desktop\xx\uk_stations.csv', encoding = "ISO-8859-1", engine='python')
gdf_pocs = gpd.GeoDataFrame(
    df_pocs, geometry=gpd.points_from_xy(df_pocs.longitude, df_pocs.latitude))
gdf_stations = gpd.GeoDataFrame(
    df_stations, geometry=gpd.points_from_xy(df_stations.longitude, df_stations.latitude))
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in meters)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000 # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
# Find closest public transport stop for each building and get also the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stations = nearest_neighbor(gdf_pocs, gdf_stations, return_dist=True)
Upon running the code, it returns the same station for every shop that I have. However I'd like it to find the nearest station for every shop and the distance to it.
Any help appreciated, thanks!
I did some testing of the functions and indeed lat/long needs to be reversed for it to work.
Notice the warning:
NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
Hence, when defining the points, simply change
gdf_pocs = gpd.GeoDataFrame(
    df_pocs, geometry=gpd.points_from_xy(df_pocs.longitude, df_pocs.latitude))
gdf_stations = gpd.GeoDataFrame(
    df_stations, geometry=gpd.points_from_xy(df_stations.longitude, df_stations.latitude))
to
gdf_pocs = gpd.GeoDataFrame(
    df_pocs, geometry=gpd.points_from_xy(df_pocs.latitude, df_pocs.longitude))
gdf_stations = gpd.GeoDataFrame(
    df_stations, geometry=gpd.points_from_xy(df_stations.latitude, df_stations.longitude))
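The swap matters because scikit-learn's haversine metric expects each point as (latitude, longitude) in radians, in that order. A way to sanity-check the BallTree result on a handful of rows is a brute-force haversine search in plain Python. The sketch below assumes nothing about the real data: the station names and coordinates are made up for illustration, and `haversine_m` and `nearest_station` are hypothetical helpers.

```python
import math

EARTH_RADIUS_M = 6371000  # same radius as in the code above


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station(shop, stations):
    """shop: (lat, lon); stations: list of (name, lat, lon). Returns (name, metres)."""
    return min(
        ((name, haversine_m(shop[0], shop[1], lat, lon)) for name, lat, lon in stations),
        key=lambda pair: pair[1],
    )


# Illustrative coordinates only (lat, lon in degrees).
stations = [
    ("station_a", 51.53, -0.12),
    ("station_b", 55.95, -3.19),
    ("station_c", 53.48, -2.23),
]
name, metres = nearest_station((53.48, -2.24), stations)
print(name, round(metres))
```

If the brute-force answer and the BallTree answer disagree for the same shop, the coordinate order going into the tree is the first thing to check. An alternative to swapping the geometry would be to keep points as (longitude, latitude) and build the radians arrays from (geom.y, geom.x) instead.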

How to crop out a detected object (circle) from image and store it?

The code below classifies objects based on their roundness using bwboundaries.
It estimates each object's area and perimeter and uses these results to form a simple metric indicating the roundness of an object with the following metric:
metric = 4*pi*area/perimeter^2
This metric is equal to 1 only for a circle and it is less than one for any other shape. But with this code, I'm using a threshold of 0.80 so that only objects with a metric value greater than 0.80 will be classified as round.
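To see how the 0.80 threshold behaves, the metric can be evaluated on ideal shapes. This is a small Python illustration of the formula only, not part of the MATLAB pipeline; `roundness` is a hypothetical helper name.

```python
import math


def roundness(area, perimeter):
    """4*pi*area / perimeter^2: 1 for an ideal circle, smaller otherwise."""
    return 4 * math.pi * area / perimeter ** 2


r = 5.0
circle = roundness(math.pi * r ** 2, 2 * math.pi * r)
s = 3.0
square = roundness(s * s, 4 * s)
print(round(circle, 3), round(square, 3))  # 1.0 0.785
```

An ideal square scores pi/4, about 0.785, so it falls just below the 0.80 threshold. On pixel data the perimeter is only estimated from the boundary, so measured values will differ somewhat from these ideal ones.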
My question is: when a given object is classified as round, how can I crop it out of the original image img (not I or bw) and save it as a new image?
I think using the label matrix and boundary matrix would be enough to do so, but still don't know how to manipulate them.
img=imread('cap.png');
I = rgb2gray(img);
% Step 2: Threshold the Image
bw1 = imbinarize(I);
bw = imcomplement(bw1);
% Step 3: Remove the Noise
bw = bwareaopen(bw,30); % remove small objects
bw = imfill(bw,'holes');
% Step 4: Find the Boundaries
[B,L] = bwboundaries(bw,'noholes');
imshow(label2rgb(L,@jet,[.5 .5 .5]))
hold on
for k = 1:length(B)
    boundary = B{k};
    plot(boundary(:,2),boundary(:,1),'w','LineWidth',2)
end
% Step 5: Determine which Objects are Round
stats = regionprops(L,'Area','Centroid');
threshold = 0.80;
% loop over the boundaries
for k = 1:length(B)
    % obtain (X,Y) boundary coordinates corresponding to label 'k'
    boundary = B{k};
    % compute a simple estimate of the object's perimeter
    delta_sq = diff(boundary).^2;
    perimeter = sum(sqrt(sum(delta_sq,2)));
    % obtain the area calculation corresponding to label 'k'
    area = stats(k).Area;
    % compute the roundness metric
    metric = 4*pi*area/perimeter^2;
    % display the results
    metric_string = sprintf('%2.2f',metric);
    % Test if the current object is classified as round
    if metric > threshold
        % HERE, I want to crop the current object from the 'img'
        % and save it as a new image
    end
end
title(['Metrics closer to 1 indicate that ',...
    'the object is approximately round'])
You can additionally add the BoundingBox attribute to regionprops which will effectively give you the limits of where the blob extends within a bounding box, and you can use those to crop your image and save it. It will have the form [x y width height] where x and y are the top left coordinates of the bounding box and width and height are of course the width and height. x would be the column coordinate and y would be the row coordinate. You can use imcrop to finally crop out the image.
img=imread('cap.png');
I = rgb2gray(img);
% Step 2: Threshold the Image
bw1 = imbinarize(I);
bw = imcomplement(bw1);
% Step 3: Remove the Noise
bw = bwareaopen(bw,30); % remove small objects
bw = imfill(bw,'holes');
% Step 4: Find the Boundaries
[B,L] = bwboundaries(bw,'noholes');
imshow(label2rgb(L,@jet,[.5 .5 .5]))
hold on
for k = 1:length(B)
    boundary = B{k};
    plot(boundary(:,2),boundary(:,1),'w','LineWidth',2)
end
% Step 5: Determine which Objects are Round
stats = regionprops(L,'Area','Centroid','BoundingBox'); % Change
threshold = 0.80;
% loop over the boundaries
for k = 1:length(B)
    % obtain (X,Y) boundary coordinates corresponding to label 'k'
    boundary = B{k};
    % compute a simple estimate of the object's perimeter
    delta_sq = diff(boundary).^2;
    perimeter = sum(sqrt(sum(delta_sq,2)));
    % obtain the area calculation corresponding to label 'k'
    area = stats(k).Area;
    % compute the roundness metric
    metric = 4*pi*area/perimeter^2;
    % display the results
    metric_string = sprintf('%2.2f',metric);
    % Test if the current object is classified as round
    if metric > threshold
        % HERE, I want to crop the current object from the 'img'
        % and save it as a new image
        % New - crop image
        bb = stats(k).BoundingBox;
        img_crop = imcrop(img, bb);
        % New - Save the image
        imwrite(img_crop, sprintf('crop%d.png', k));
    end
end
title(['Metrics closer to 1 indicate that ',...
    'the object is approximately round'])
Note that I use imwrite to save the crop to file and it's named based on what blob ID you're looking at. Therefore, if there are multiple blobs or round objects that satisfy the criteria, you will be saving them all.

Calculating light intensity in a closed world

I am building a simulation in which there is world made of many squares. There are also objects designated as "suns", which illuminate the squares and update their "received intensity" each step.
For example:
In this image, the distance between the centres of squares is 32 units, and the received intensity (or R.I.) of each square is calculated with this formula:
R.I. = 100 / (distance between light source and square)^2
Written in a generic (and terrible) programming language, a function that calculates R.I. may look like this:
(define (calc-RI sq-x sq-y sn-x sn-y)
(return (/ 100
(+ (square (- sq-x sn-x))
(square (- sq-y sn-y))
)
)
)
)
...and, to accommodate multiple suns, the R.I. for each sun will be calculated separately and added together for each square. This was all well and good until I felt the need to introduce a warping mechanic to this simulation: objects that move "outside" the edge of the world will "re-enter" the world from the other side, like in the game Asteroids.
This needs to apply not only to other objects in the simulation, but also to light.
So, what should the new function be for calculating the R.I. of each square? (Given that the function takes in the x and y coordinate of one square and one sun, and the width and height of the world, what would be the return of the given square's R.I. under the sun's influence?)
You can solve this by imagining that the grid is surrounded by copies of the original grid, one layer for each wrap. For each square in the original grid, calculate the light intensity that falls on each corresponding square, and sum the results.
In the above diagram, the black grid represents the world, and the blue grid represents the copies of the world grid. To find the light intensity for one wrap at the green square, add the simple intensities calculated at each of the blue squares to the simple intensity calculated for the green square. To calculate another wrap, add another layer of copies.
Here is some Lua code that implements this. Stars are represented in tables of the form:
stars = { { x=x1, y=y1 }, { x=x2, y=y2 },... }
In this implementation, a square containing a star is given an intensity of -1. This could be changed so that a square containing a star accumulates intensity like any other square. To use the functions, define a stars table, and call light_universe() with the grid_size (grids are taken to be square here), square_size, stars table, and number of wraps desired. Set wraps to zero, nil, or just omit this parameter, to obtain simple intensity calculations with no wrapping. The function light_universe() returns a table containing intensities for each square of the world grid. You can view the results by calling show_intensities() with the table returned from light_universe(), and the grid_size of the table. Note that the upper-left corner of the world grid has coordinate (1, 1).
The calculate_intensity() function first calculates the field_size of the field of grid copies. This is the size of the blue grid in the diagram, and is an odd multiple of grid_size, e.g., for 1 wrap the field_size is 3 * grid_size. Next, the world coordinates of the current star are transformed to the field coordinates star_x and star_y (the coordinates with respect to the blue grid in the diagram). Then the locations within the field corresponding to x and y are iterated over. These are the locations of the blue squares of the diagram. The first location is in the upper-left grid of the field, and has field coordinates that are equal to the world coordinates of the square of interest. For each location, the intensity is calculated and added to the running total, which is returned when the iteration is complete.
-- Intensity calculation with wraps
-- wraps = nil or wraps = 0 for simple calculation
function calculate_intensity(star, x, y, grid_size, square_size, wraps)
    wraps = wraps or 0
    local field_size = grid_size * (2 * wraps + 1) -- odd number of grids
    local star_x = star.x + wraps * grid_size -- cdts of star in field
    local star_y = star.y + wraps * grid_size
    local intensity = 0
    -- x and y are cdts wrt world grid, but also wrt first grid in field
    for field_y = y, field_size, grid_size do
        for field_x = x, field_size, grid_size do
            local dx = square_size * (star_x - field_x)
            local dy = square_size * (star_y - field_y)
            local dist_sq = dx * dx + dy * dy
            intensity = intensity + 100 / dist_sq
        end
    end
    return intensity
end
function light_universe(grid_size, square_size, stars, wraps)
    wraps = wraps or 0
    local grid_intensities = {}
    for i, star in ipairs(stars) do
        for y = 1, grid_size do
            grid_intensities[y] = grid_intensities[y] or {}
            for x = 1, grid_size do
                if x == star.x and y == star.y then
                    grid_intensities[y][x] = -1
                elseif grid_intensities[y][x] ~= -1 then
                    grid_intensities[y][x] = (grid_intensities[y][x] or 0) +
                        calculate_intensity(star, x, y, grid_size, square_size, wraps)
                end
            end
        end
    end
    return grid_intensities
end
function show_intensities(grid, grid_size)
    for y = 1, grid_size do
        for x = 1, grid_size do
            local intensity = grid[y][x]
            local fmt
            if intensity ~= -1 then
                fmt = (string.format("%10.5f", intensity))
            else
                fmt = string.format("%-10s", " Star")
            end
            io.write(fmt)
        end
        print()
    end
end
Here is an interaction showing intensity with no wraps, corresponding to the example from your question.
> stars = { { x=1, y=1 } }
> light_grid = light_universe(3, 32, stars, 0)
> show_intensities(light_grid, 3)
Star 0.09766 0.02441
0.09766 0.04883 0.01953
0.02441 0.01953 0.01221
Here is the same situation with one wrap:
> light_grid = light_universe(3, 32, stars, 1)
> show_intensities(light_grid, 3)
Star 0.17054 0.16628
0.17054 0.12440 0.12023
0.16628 0.12023 0.11630
And with two wraps:
> light_grid = light_universe(3, 32, stars, 2)
> show_intensities(light_grid, 3)
Star 0.20497 0.20347
0.20497 0.15960 0.15811
0.20347 0.15811 0.15664
Here is a 7X7 grid with two stars, and one wrap:
> stars = { { x=1, y=1 }, { x=7, y=4 } }
> light_grid = light_universe(7, 32, stars, 1)
> show_intensities(light_grid, 7)
Star 0.13085 0.05729 0.04587 0.04728 0.06073 0.13366
0.14064 0.08582 0.05424 0.04640 0.04971 0.06411 0.09676
0.09676 0.06411 0.04971 0.04640 0.05424 0.08582 0.14064
0.13366 0.06073 0.04728 0.04587 0.05729 0.13085 Star
0.08469 0.05562 0.04574 0.04433 0.05218 0.08190 0.13222
0.06715 0.05619 0.04631 0.04302 0.04635 0.05627 0.06728
0.13073 0.08043 0.05075 0.04294 0.04439 0.05432 0.08347
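For readers who do not use Lua, here is a direct Python transcription of calculate_intensity() as a sketch. It keeps the same 1-based grid coordinates and the same arguments; representing the star as a dict with "x" and "y" keys is an assumption of this transcription.

```python
def calculate_intensity(star, x, y, grid_size, square_size, wraps=0):
    """Sum 100 / distance^2 over the copies of square (x, y) in the wrapped field.

    Coordinates are 1-based, as in the Lua version. The caller must skip
    the star's own square, where the distance would be zero.
    """
    field_size = grid_size * (2 * wraps + 1)   # odd number of grids per side
    star_x = star["x"] + wraps * grid_size     # star position within the field
    star_y = star["y"] + wraps * grid_size
    intensity = 0.0
    # Lua's "for v = a, b, step" is inclusive, hence field_size + 1 here.
    for field_y in range(y, field_size + 1, grid_size):
        for field_x in range(x, field_size + 1, grid_size):
            dx = square_size * (star_x - field_x)
            dy = square_size * (star_y - field_y)
            intensity += 100 / (dx * dx + dy * dy)
    return intensity


star = {"x": 1, "y": 1}
no_wrap = calculate_intensity(star, 2, 1, 3, 32, 0)
one_wrap = calculate_intensity(star, 2, 1, 3, 32, 1)
print(f"{no_wrap:.5f} {one_wrap:.5f}")
```

For the square to the right of the star in the 3x3 example, this should reproduce the 0.09766 (no wraps) and 0.17054 (one wrap) entries shown in the tables above.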

Algorithm for fitting points to a grid

I have a list of points in 2D space that form an (imperfect) grid:
x x x x
x x x x
x
x x x
x x x x
What's the best way to fit these to a rigid grid (i.e. create a two-dimensional array and work out where each point fits in that array)?
There are no holes in the grid, but I don't know in advance what its dimensions are.
EDIT: The grid is not necessarily regular (not even spacing between rows/cols)
A little bit of an image processing approach:
If you think of what you have as a binary image where the X is 1 and the rest is 0, you can sum up rows and columns, and use a peak finding algorithm to identify peaks which would correspond to x and y lines of the grid:
Your points as a binary image:
Sums of row/columns
Now apply some smoothing technique to the signal (e.g. lowess):
I'm sure you get the idea :-)
Good luck
The best I could come up with is a brute-force solution that calculates the grid dimensions that minimize the error in the square of the Euclidean distance between the point and its nearest grid intersection.
This assumes that the number of points p is exactly equal to the number of columns times the number of rows, and that each grid intersection has exactly one point on it. It also assumes that the minimum x/y value for any point is zero. If the minimum is greater than zero, just subtract the minimum x value from each point's x coordinate and the minimum y value from each point's y coordinate.
The idea is to create all of the possible grid dimensions given the number of points. In the example above with 16 points, we would make grids with dimensions 1x16, 2x8, 4x4, 8x2 and 16x1. For each of these grids we calculate where the grid intersections would lie by dividing the maximum width of the points by the number of columns minus 1, and the maximum height of the points by the number of rows minus 1. Then we fit each point to its closest grid intersection and find the error (square of the distance) between the point and the intersection. (Note that this only works if each point is closer to its intended grid intersection than to any other intersection.)
After summing the errors for each grid configuration individually (e.g. getting one error value for the 1x16 configuration, another for the 2x8 configuration and so on), we select the configuration with the lowest error.
Initialization:
P is the set of points such that P[i][0] is the x-coordinate and
P[i][1] is the y-coordinate
Let p = |P| or the number of points in P
Let max_x = the maximum x-coordinate in P
Let max_y = the maximum y-coordinate in P
(minimum values are assumed to be zero)
Initialize min_error_dist = +infinity
Initialize min_error_cols = -1
Algorithm:
for (col_count = 1; col_count <= p; col_count++) {
    // only compute for integer # of rows and cols
    if ((p % col_count) == 0) {
        row_count = p/col_count;
        // Compute the width of the columns and height of the rows
        // If the number of columns is 1, let the column width be max_x
        // (and similarly for rows)
        if (col_count > 1) col_width = max_x/(col_count-1);
        else col_width=max_x;
        if (row_count > 1) row_height = max_y/(row_count-1);
        else row_height=max_y;
        // reset the error for the new configuration
        error_dist = 0.0;
        for (i = 0; i < p; i++) {
            // For the current point, normalize the x- and y-coordinates
            // so that it's in the range 0..(col_count-1)
            // and 0..(row_count-1)
            normalized_x = P[i][0]/col_width;
            normalized_y = P[i][1]/row_height;
            // Error is the sum of the squares of the distances between
            // the current point and the nearest grid point
            // (in both the x and y direction)
            error_dist += (normalized_x - round(normalized_x))^2 +
                          (normalized_y - round(normalized_y))^2;
        }
        if (error_dist < min_error_dist) {
            min_error_dist = error_dist;
            min_error_cols = col_count;
        }
    }
}
return min_error_cols;
Once you've got the number of columns (and thus the number of rows) you can recompute the normalized values for each point and round them to get the grid intersection they belong to.
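The pseudocode above translates fairly directly to Python. The sketch below follows it step by step; the shift to a zero minimum is done inside the function, and the `or 1.0` fallback for a zero width or height is an added assumption to avoid dividing by zero on degenerate input. `best_column_count` is a hypothetical name.

```python
import math


def best_column_count(points):
    """Return the column count whose regular grid fits the points best."""
    p = len(points)
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    shifted = [(x - min_x, y - min_y) for x, y in points]  # minimum becomes 0
    max_x = max(x for x, _ in shifted)
    max_y = max(y for _, y in shifted)

    best_cols, best_error = -1, math.inf
    for cols in range(1, p + 1):
        if p % cols != 0:
            continue  # only whole numbers of rows and columns
        rows = p // cols
        # "or 1.0" guards against zero spacing (assumption, not in the pseudocode)
        col_width = (max_x / (cols - 1) if cols > 1 else max_x) or 1.0
        row_height = (max_y / (rows - 1) if rows > 1 else max_y) or 1.0
        error = 0.0
        for x, y in shifted:
            nx, ny = x / col_width, y / row_height
            # squared distance to the nearest grid intersection, in grid units
            error += (nx - round(nx)) ** 2 + (ny - round(ny)) ** 2
        if error < best_error:
            best_cols, best_error = cols, error
    return best_cols


# A slightly noisy grid of 3 columns by 2 rows (made-up data).
noisy = [(0, 0), (10.4, 0.3), (20, 0), (0.2, 10), (9.7, 9.8), (20, 10)]
print(best_column_count(noisy))  # 3
```

Like the pseudocode, this assumes evenly spaced rows and columns, so it does not cover the irregular spacing mentioned in the question's edit.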
In the end I used this algorithm, inspired by beaker's:
1. Calculate all the possible dimensions of the grid, given the total number of points.
2. For each possible dimension, fit the points to that dimension and calculate the variance in alignment:
   - Order the points by x-value.
   - Group the points into columns: the first r points form the first column, where r is the number of rows.
   - Within each column, order the points by y-value to determine which row they're in.
   - For each row/column, calculate the range of y-values/x-values.
   - The variance in alignment is the maximum range found.
3. Choose the dimension with the least variance in alignment.
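The steps above can be sketched in Python as follows. This is one interpretation of the description, not the author's actual code: the function name `fit_by_alignment` and the choice to return the columns as nested lists are assumptions.

```python
def fit_by_alignment(points):
    """Return (rows, cols, grid), where grid[c][r] is the point in column c, row r."""
    n = len(points)
    best = None
    for rows in range(1, n + 1):
        if n % rows != 0:
            continue  # only dimensions that use every point exactly once
        cols = n // rows
        by_x = sorted(points)  # order by x (ties broken by y)
        # The first `rows` points form the first column, and so on;
        # within each column, order by y to assign rows.
        columns = [
            sorted(by_x[c * rows:(c + 1) * rows], key=lambda pt: pt[1])
            for c in range(cols)
        ]
        spreads = []
        for column in columns:                      # x-range within each column
            xs = [pt[0] for pt in column]
            spreads.append(max(xs) - min(xs))
        for r in range(rows):                       # y-range within each row
            ys = [column[r][1] for column in columns]
            spreads.append(max(ys) - min(ys))
        worst = max(spreads)                        # "variance in alignment"
        if best is None or worst < best[0]:
            best = (worst, rows, cols, columns)
    return best[1], best[2], best[3]


# Made-up, slightly noisy grid of 2 rows by 3 columns.
noisy = [(0, 0), (10.4, 0.3), (20, 0), (0.2, 10), (9.7, 9.8), (20, 10)]
rows, cols, grid = fit_by_alignment(noisy)
print(rows, cols)  # 2 3
```

Because it only compares ranges within rows and columns, this version does not require even spacing between grid lines.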
I wrote this algorithm that accounts for missing coordinates as well as coordinates with errors.
Python Code
# Input [x, y] coordinates of a 'sparse' grid with errors
xys = [[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]]
def row_col_avgs(num_list, ratio):
    # Finds the average of each row and column. Coordinates are
    # assigned to a row and column by specifying an error ratio.
    last_num = 0
    sum_nums = 0
    count_nums = 0
    avgs = []
    num_list.sort()
    for num in num_list:
        if num > (1 + ratio) * last_num and count_nums != 0:
            avgs.append(int(round(sum_nums/count_nums,0)))
            sum_nums = num
            count_nums = 1
        else:
            sum_nums = sum_nums + num
            count_nums = count_nums + 1
        last_num = num
    avgs.append(int(round(sum_nums/count_nums,0)))
    return avgs
# Split coordinates into two lists of x's and y's
xs, ys = map(list, zip(*xys))
# Find averages of each row and column within a specified error.
x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)
# Return Completed Averaged Grid
avg_grid = []
for y_avg in y_avgs:
    avg_row = []
    for x_avg in x_avgs:
        avg_row.append([int(x_avg), int(y_avg)])
    avg_grid.append(avg_row)
print(avg_grid)
Code Output
[[[102, 101], [201, 101], [301, 101]],
[[102, 204], [201, 204], [301, 204]],
[[102, 299], [201, 299], [301, 299]],
[[102, 400], [201, 400], [301, 400]]]
I am also looking for another solution using linear algebra. See my question here.
