I am given n points on a 2D grid. I am asked to cluster the points, where points with distance <=k for some constant k are grouped together. Note that all pairs of points within a cluster must adhere to the distance rule.
The way I thought about approaching this problem is: for each point, find its neighbors, where neighbors are defined as points within distance k. We can assume each point and its initial neighbors form an initial cluster.
Then, for each point, we compare its initial cluster with each of its neighbors' clusters. If all of the neighbors' clusters are the same as the current point's cluster, then they are joined. If at least 1 of the neighbors' clusters is not the same, then all points involved and their neighbors must be kept disjoint.
e.g., say we have points
[0, -0.5]
[0, 0]
[0, 0.5]
[0, 1.5]
and k = 1
For point [0, -0.5] we have neighbors [0, 0] and [0, 0.5].
For point [0, 0] we have neighbors [0, 0.5] and [0, -0.5].
For point [0, 0.5] we have neighbors [0, 0], [0, -0.5], [0, 1.5].
For point [0, 1.5] we have the single neighbor [0, 0.5].
So in this case, we will end up with 4 clusters, each consisting of a single point.
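Here is a rough Python sketch of the procedure I have in mind (assuming Euclidean distance and a brute-force O(n^2) neighbor search); on the example above it yields the four single-point clusters:

from math import dist  # Python 3.8+

def cluster(points, k):
    n = len(points)
    # Step 1: each point's initial cluster is itself plus all its neighbors.
    initial = []
    for i in range(n):
        nbrs = {j for j in range(n)
                if j != i and dist(points[i], points[j]) <= k}
        initial.append(nbrs | {i})
    # Step 2: a point keeps its initial cluster only if every neighbor's
    # initial cluster is identical to its own; otherwise it stands alone.
    # (A kept cluster appears once per member; dedupe as needed.)
    clusters = []
    for i in range(n):
        if all(initial[j] == initial[i] for j in initial[i]):
            clusters.append(initial[i])
        else:
            clusters.append({i})
    return clusters

print(cluster([(0, -0.5), (0, 0), (0, 0.5), (0, 1.5)], 1))
# -> [{0}, {1}, {2}, {3}]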
What is the most efficient way to implement this algorithm?
My problem is as follows:
Given n points on a line segment and a threshold k, pick points on the line so as to minimize the average difference between each consecutive pair's distance and the threshold.
For example:
If we were given the points [0, 2, 5, 6, 8, 9] and k = 3:
Output: [0, 2, 6, 9]
Explanation: when we choose this path, the gaps are [2, 4, 3], so the differences from the threshold in each interval are [1, 1, 0], which average to about 0.67.
If I chose [0, 2, 5, 8, 9] instead, the differences would be [1, 0, 0, 2], which average to 0.75.
I understand enough dynamic programming to consider several solutions, including memoization and depth-first search, but I was hoping someone could offer a specific algorithm with the best efficiency.
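For concreteness, here is the kind of O(n^3) dynamic program I have been sketching (assuming, as in both my examples, that the chosen subsequence keeps the first and last points):

def best_path(points, k):
    # dp[i][m]: minimal total |gap - k| over subsequences that start at
    # points[0], end at points[i], and use exactly m intervals.
    n = len(points)
    INF = float('inf')
    dp = [[INF] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]
    dp[0][0] = 0
    for i in range(1, n):
        for m in range(1, i + 1):
            for j in range(m - 1, i):
                cand = dp[j][m - 1] + abs(points[i] - points[j] - k)
                if cand < dp[i][m]:
                    dp[i][m], choice[i][m] = cand, j
    # The average is minimized over the interval count m, not just the sum.
    best_m = min(range(1, n), key=lambda m: dp[n - 1][m] / m)
    path, i, m = [], n - 1, best_m
    while i is not None:
        path.append(points[i])
        i, m = choice[i][m], m - 1
    return path[::-1]

print(best_path([0, 2, 5, 6, 8, 9], 3))
# -> [0, 2, 5, 9], which ties my expected [0, 2, 6, 9] at average 2/3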
I'm tackling the problem of finding a non-contiguous submatrix of a boolean matrix with maximum size such that all of its cells are ones.
As an example, consider the following matrix:
M = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
A non-contiguous submatrix of M is specified as a set of rows R and a set of columns C. The submatrix is formed by all the cells that are in some row in R and in some column in C (the intersections of R and C). Note that a non-contiguous submatrix is a generalization of a submatrix, so any (contiguous) submatrix is also a non-contiguous submatrix.
There is one maximum non-contiguous submatrix of M that has a one in all of its cells. This submatrix is defined by R = {1, 3} and C = {1, 3, 4}, which yields:
M[{1, 3}, {1, 3, 4}] = [[1, 1, 1],
                        [1, 1, 1]]
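For experimenting, I use a brute-force baseline (a sketch of the obvious exhaustive approach, exponential in the number of rows, so tiny matrices only; 0-indexed):

from itertools import combinations

def max_all_ones_submatrix(M):
    n_rows, n_cols = len(M), len(M[0])
    best = (0, set(), set())
    for r in range(1, n_rows + 1):
        for R in combinations(range(n_rows), r):
            # For a fixed row set R, the best C is simply every column
            # that is all ones on the rows of R.
            C = {c for c in range(n_cols) if all(M[i][c] for i in R)}
            if len(R) * len(C) > best[0]:
                best = (len(R) * len(C), set(R), C)
    return best  # (cell count, rows, columns), 0-indexed

M = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
print(max_all_ones_submatrix(M))  # -> (6, {0, 2}, {0, 2, 3})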
I'm having difficulties finding existing literature about this problem. I'm looking for efficient algorithms that don't necessarily need to be optimal (so I can relax the problem to finding maximal size submatrices). Of course, this can be modeled with integer linear programming, but I want to consider other alternatives.
In particular, I want to know whether this problem is already known and covered by the literature, whether my definition of non-contiguous submatrix makes sense, and whether a different name for it already exists.
Thanks!
Since per your response to Josef Wittmann's comment you want to find the Rectangle Covering Number, my suggestion would be to construct the Lovász–Saks graph and apply a graph coloring algorithm.
The Lovász–Saks graph has a vertex for each 1 entry in the matrix and an edge between each pair of vertices whose 2x2 submatrix (the one induced by their two rows and two columns) contains a zero. In your example,
[[1, 0, 1, 1],
 [0, 0, 1, 0],
 [1, 1, 1, 1]]
we can label the 1s with letters:
[[a, 0, b, c],
 [0, 0, d, 0],
 [e, f, g, h]]
and then get edges
a--d, a--f, b--f, c--d, c--f, d--e, d--f, d--h.
These edges come from the 2x2 submatrices:

a--d: [a b]   a--f: [a 0]   b--f: [0 b]   c--d: [b c]
      [0 d]         [e f]         [f g]         [d 0]

c--f: [0 c]   d--e: [0 d]   d--f: [0 d]   d--h: [d 0]
      [f h]         [e g]         [f g]         [g h]
I think an optimal coloring is
{a, b, c, e, g, h} -> 1
{d} -> 2
{f} -> 3.
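If you want to experiment, here is a small sketch that builds the Lovász–Saks graph and runs a greedy (not necessarily optimal) coloring, assuming networkx is available:

import networkx as nx

def lovasz_saks_graph(M):
    ones = [(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v]
    G = nx.Graph()
    G.add_nodes_from(ones)
    for x in range(len(ones)):
        i1, j1 = ones[x]
        for y in range(x + 1, len(ones)):
            i2, j2 = ones[y]
            # Edge iff the 2x2 submatrix on rows {i1, i2} and columns
            # {j1, j2} contains a zero; pairs sharing a row or column never do.
            if M[i1][j2] == 0 or M[i2][j1] == 0:
                G.add_edge(ones[x], ones[y])
    return G

M = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
coloring = nx.greedy_color(lovasz_saks_graph(M))
# Each color class is an all-ones combinatorial rectangle of M.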
Let me clarify the definitions first. Consider a regular tetrahedron, which is composed of 4 vertices. Let's say the indices of these vertices are [0, 1, 2, 3]. Then the facet information is F = [[0, 1, 2], [0, 2, 3], [0, 3, 1], [1, 2, 3]]. A polyhedron composed of triangles is closed if every triangle facet is connected to 3 other triangles via its edges. For example, a regular tetrahedron is closed.
Then, given facet information, how to efficiently check if a polyhedron composed of triangles is closed?
A naive solution is as follows: build a graph describing the unordered connections between facets, then check that every node is connected to exactly 3 other nodes. However, this naive method seems too slow for my application.
P.S. Here is my Python implementation, which compares the number of facet edges against the number of distinct edges:
def isClosed(F):  # F is a list of index triplets
    S = set()
    for triplet in F:
        for i, j in [[0, 1], [1, 2], [2, 0]]:
            a, b = triplet[i], triplet[j]
            # Normalize so each undirected edge has a single key.
            key = (a, b) if a < b else (b, a)
            S.add(key)
    # Each of the 3*|F| facet edges must pair up, two facets per edge.
    return len(F) * 3 == len(S) * 2
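Note that the count comparison above can in principle pass when multiplicities other than two balance out (e.g., one edge used once and another used three times), so a stricter variant counts each undirected edge's multiplicity explicitly. A sketch with the same O(|F|) cost:

from collections import Counter

def is_closed(F):
    edge_count = Counter()
    for a, b, c in F:
        for u, v in ((a, b), (b, c), (c, a)):
            # Normalize so each undirected edge has a single key.
            edge_count[(u, v) if u < v else (v, u)] += 1
    # Closed (watertight) iff every edge is shared by exactly two facets.
    return all(n == 2 for n in edge_count.values())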
I'm looking for an algorithm that will detect the end of a curved line. I'm going to convert a binary image into a point cloud as coordinates, and I need to find the end of the line so I can start another algorithm.
I was thinking of taking the average of the vectors to the N nearest '1' pixels from each point, and saying that the pixel with the longest average vector must be an endpoint, because if a point is in the middle of a line, the vectors will roughly cancel out. However, I figure this must be a well-known problem in image processing, so I thought I'd throw it up here to see if anybody knows a 'proper' algorithm.
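To illustrate, this is roughly the heuristic I had in mind as a sketch (brute-force nearest neighbours; `pixels` is a list of (x, y) coordinates of the '1' pixels):

import numpy as np

def endpoint_scores(pixels, N=8):
    P = np.asarray(pixels, dtype=float)
    scores = []
    for p in P:
        d = np.linalg.norm(P - p, axis=1)
        nearest = P[np.argsort(d)[1:N + 1]]  # skip the point itself
        # Interior points: neighbour offsets roughly cancel.
        # Endpoints: offsets all point one way, so the mean is long.
        scores.append(np.linalg.norm((nearest - p).mean(axis=0)))
    return scores  # the largest scores mark likely endpoints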
If the line will only ever be one or perhaps two pixels thick, you can use the approach suggested by Malcolm McLean in a comment.
Otherwise, one way to do this is to compute, for each red pixel, the red pixel in the same component that is furthest away, as well as how far away that furthest pixel is. (In graph theory terms, this furthest distance is the pixel's eccentricity.) Pixels near the end of a long line will have the greatest eccentricities, because the shortest path between them and the points at the other end of the line is long. (Notice that, whatever the maximum eccentricity turns out to be, there will be at least two pixels having it, since the distance from a to b is the same as the distance from b to a.)
If you have n red pixels, all eccentricities (and corresponding furthest pixels) can be computed in O(n^2) time: for each pixel in turn, start a BFS at that pixel, and take the deepest node you find as its furthest pixel (there may be several; any will do). Each BFS runs in O(n) time, because there are only a constant number of edges (4 or 8, depending on how you model pixel connectivity) incident on any pixel.
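A sketch of that O(n^2) computation, assuming 8-connectivity and the pixels given as a set of (row, col) pairs:

from collections import deque

def farthest_from(start, pixels):
    # BFS over the pixel grid; returns (farthest pixel, its distance).
    dist = {start: 0}
    queue, far = deque([start]), (start, 0)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q in pixels and q not in dist:
                    dist[q] = dist[(r, c)] + 1
                    if dist[q] > far[1]:
                        far = (q, dist[q])
                    queue.append(q)
    return far

def endpoints(pixels):
    # The pair realizing the maximum eccentricity approximates the two ends.
    far = {p: farthest_from(p, pixels) for p in pixels}  # one BFS per pixel
    end_a = max(far, key=lambda p: far[p][1])
    return end_a, far[end_a][0]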
For robustness you might consider taking the top 10 or 50 (etc.) pixel pairs and checking that they form 2 well-separated, well-defined clusters. You could then take the average position within each cluster as your 2 endpoints.
If you apply thinning to the line, so that it is just one pixel thick, you can leverage morphologyEx with MORPH_HITMISS in OpenCV. Essentially, you create a template (kernel or filter) for every possible end-of-line configuration (there are 8) and apply each one with the hit-or-miss transform. The result of each pass is 1 where the kernel matches and 0 otherwise. You can do the same manually if you feel you can do a better job in C.
Here is an example. It takes as input_image any image of zeros and ones where the lines are one pixel thick.
import numpy as np
import cv2
import matplotlib.pylab as plt

def find_endoflines(input_image, show=0):
    # Eight hit-or-miss kernels, one per end-of-line orientation
    # (1 = must be foreground, -1 = must be background).
    kernel_0 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1,  1, -1]), dtype="int")

    kernel_1 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [ 1, -1, -1]), dtype="int")

    kernel_2 = np.array((
        [-1, -1, -1],
        [ 1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_3 = np.array((
        [ 1, -1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_4 = np.array((
        [-1,  1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_5 = np.array((
        [-1, -1,  1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_6 = np.array((
        [-1, -1, -1],
        [-1,  1,  1],
        [-1, -1, -1]), dtype="int")

    kernel_7 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1, -1,  1]), dtype="int")

    kernel = np.array((kernel_0, kernel_1, kernel_2, kernel_3,
                       kernel_4, kernel_5, kernel_6, kernel_7))

    # Accumulate the matches of all eight kernels.
    output_image = np.zeros(input_image.shape)
    for i in np.arange(8):
        out = cv2.morphologyEx(input_image, cv2.MORPH_HITMISS, kernel[i, :, :])
        output_image = output_image + out

    if show == 1:
        # Overlay: line pixels in white, detected endpoints in red.
        show_image = np.reshape(np.repeat(input_image, 3, axis=1),
                                (input_image.shape[0], input_image.shape[1], 3)) * 255
        show_image[:, :, 1] = show_image[:, :, 1] - output_image * 255
        show_image[:, :, 2] = show_image[:, :, 2] - output_image * 255
        plt.imshow(show_image)

    return output_image
I have n sets of data (distributed across n ranks) representing the nodes of a mesh, and I am looking for an efficient parallel algorithm to find the intersections of these sets, i.e., the common nodes. A node belongs to an intersection as soon as any 2 sets share it.
For example:
Input:
Rank 0: Set 1 - [0, 1, 2, 3, 4]
Rank 1: Set 2 - [2, 4, 5, 6]
Rank 2: Set 3 - [0, 5, 6, 7, 8]
Run parallel algorithm --> Result (after finding intersections):
Rank 0: [0, 2, 4]
Rank 1: [2, 4, 5, 6]
Rank 2: [0, 5, 6]
The algorithm needs to be done on n-ranks with 1 set on each rank.
You should be able to do this fast, in O(N), in parallel, with hash tables.
For each set S_i and for each member m_x (all of which can be processed in parallel), put the set member into a hash table keyed by m_x and associated with the set name S_i. Any time you get a hit in the hash table on m_x from another set S_j, the existing entry gives you the corresponding set S_i, and you know immediately that S_i intersects S_j. You can then put m_x into the derived intersection sets.
You need a parallel-safe hash table. That's easy; lock the buckets during updates.
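Here is a serial sketch of the bookkeeping (assumption: the parallel version shards this table across ranks or locks buckets, as above):

from collections import defaultdict

def shared_nodes(sets_by_rank):
    owner = defaultdict(list)            # node -> ranks whose set contains it
    for rank, s in enumerate(sets_by_rank):
        for node in s:
            owner[node].append(rank)     # a hit here means an intersection
    result = {rank: [] for rank in range(len(sets_by_rank))}
    for node, ranks in owner.items():
        if len(ranks) >= 2:              # shared by at least 2 sets
            for rank in ranks:
                result[rank].append(node)
    return result

sets = [[0, 1, 2, 3, 4], [2, 4, 5, 6], [0, 5, 6, 7, 8]]
print(shared_nodes(sets))  # {0: [0, 2, 4], 1: [2, 4, 5, 6], 2: [0, 5, 6]}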
[Another answer suggested sorting the sets. With most sort algorithms, that would be O(N log N) time, not as fast.]