Calculating multilevel observed mean nearest neighbor distance per group per year in r - nearest-neighbor

I have a multi-year dataset of bird nests from five regions. Some nests are at the same location every year (identical coordinates repeated for each year), and some are not. Some regions start with only one nest, so those region-year combinations should be omitted.
Now I would like to compare the annual change in observed mean nearest neighbor distance per region, so the output should be one value per region per year. At first it seemed an easy task, but now I am stuck. Here is the script that got me closest, but it returns the same value for every region and year. Could someone advise me, please?
library(sf)
library(sp)
library(tidyverse)
library(spatstat)
# Read in data
data <- st_read("PEFA_EVI_LU2.shp")
data <- st_as_sf(df0, coords=c("Lon", "Lat"), crs=4326)
coords <- st_coordinates(data_sp)
data <- as.data.frame(coords)
names(data) <- c("Lon", "Lat")
data$Cluster_ID <- data_sf$Cluster_ID
data$Year <- data_sf$Year
# Define function to compute nearest neighbor distance
nnd_per_cluster_year <- function(df) {
  # Create spatial point pattern object for the current year and cluster
  pp <- ppp(df$Lon, df$Lat, window = owin(c(min(df$Lon), max(df$Lon)), c(min(df$Lat), max(df$Lat))))
  # Compute nearest neighbor distance
  nnd <- nndist(pp)
  # Return observed mean nearest neighbor distance
  return(mean(nnd))
}
# Compute observed mean nearest neighbor distance per cluster per year
obs_nnd <- data %>%
  group_by(Year, Cluster_ID) %>%
  summarize(mean_nnd = nnd_per_cluster_year(.)) %>%
  ungroup()
obs_nnd  # it returns the same value for each region and year
write.csv(obs_nnd, "obs_nnd.csv")
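A likely reason for the identical values: inside summarize(), the `.` placeholder refers to the whole piped data frame rather than to the current group, so every group computes the mean over all nests. Passing the group's own columns (or, in recent dplyr versions, something like cur_data()) should avoid that. The grouping logic itself can be sketched in plain Python as below. This is only an illustration under my own assumptions: rows are (year, cluster_id, lon, lat) tuples, distances are great-circle metres from a hypothetical haversine_m helper (the R script instead treats degrees as planar coordinates), and groups with a single nest are skipped.

```python
import math
from collections import defaultdict

def haversine_m(lon1, lat1, lon2, lat2, radius=6371000.0):
    """Great-circle distance in metres (assumed metric, not from the question)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def mean_nnd_per_group(rows):
    """rows: iterable of (year, cluster_id, lon, lat). Returns {(year, cluster_id): mean NND}."""
    groups = defaultdict(list)
    for year, cluster, lon, lat in rows:
        groups[(year, cluster)].append((lon, lat))
    result = {}
    for key, pts in groups.items():
        if len(pts) < 2:
            continue  # a single nest has no neighbour, so the group is omitted
        # For each nest, distance to its closest other nest in the same group
        nnd = [
            min(haversine_m(*p, *q) for j, q in enumerate(pts) if j != i)
            for i, p in enumerate(pts)
        ]
        result[key] = sum(nnd) / len(nnd)
    return result
```

Each (year, cluster) key gets its own value because the distances are computed only from the points collected for that key.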

Related

Best algorithm to find a region of the same values

I would like to return a list of the pixels that belong to the same region after clicking on one of them. The input would be the chosen pixel (seed) and the output would be a list of all pixels that have the same value and belong to the same region (are not separated by any pixel of a different value).
My idea was to create an auxiliary list of seeds and check the neighbours of each of them. If the value of a neighbour is the same as that of the seed, it is appended to the region list. My Python implementation is below:
def region_growing(x, y):
    value = image[x,y]
    region = [(x,y),]
    seeds = [(x,y),]
    while seeds:
        seed = seeds.pop()
        x = seed[0]
        y = seed[1]
        for i in range(x-1, x+2):
            for j in range(y-1, y+2):
                if image[i,j] == value:
                    point = (i,j)
                    if point not in region:
                        seeds.append(point)
                        region.append(point)
    return region
It works, but is very slow for bigger regions. What algorithm would you suggest?
The problem is the instruction if point not in region whose execution time will increase with the size of the region. The complexity is thus quadratic.
Another problem is that you visit the same pixels multiple times at the boundary of the region since you only keep track of pixels in the region.
You can avoid this by using a dictionary of visited pixels with the point as key.
def region_growing(x, y):
    value = image[x,y]
    region = [(x,y),]
    seeds = [(x,y),]
    visited = {(x,y): True}
    while seeds:
        seed = seeds.pop()
        x = seed[0]
        y = seed[1]
        for i in range(x-1, x+2):
            for j in range(y-1, y+2):
                point = (i,j)
                # skip pixels that were already examined, inside or outside the region
                if point in visited:
                    continue
                visited[point] = True
                if image[i,j] == value:
                    region.append(point)
                    seeds.append(point)
    return region
Another method is to use a matrix of booleans instead of the dictionary. This is faster but requires more memory space.
I suggest taking any region-fill/paint algorithm and patching it to track the pixels of the region instead of painting them. Smith's algorithm is known to be fast and efficient; see Tint Fill Algorithm.
Note that storing every pixel is inefficient; as the algorithm suggests, horizontal segments are sufficient (thus only two pixels per segment).

Initial centroids in k-means

So I found a description online that says:
Start with the center of all points. Choose successively the point that is the furthest away from all centers as a center for the next cluster.
So from this I take that:
center = avg of all points
centroid1 = point furthest away from center
centroid2 = point furthest away from center AND centroid1
centroid3 = point furthest away from center AND centroid1 AND centroid2.
My problem is, how am I supposed to calculate for example the furthest point from center and centroid1? Do I average them and then choose the furthest point from the middle? Do I calculate the max distance point from both center and centroid1 and choose the further one? If so, wouldn't centroid3 become equal to centroid1 or 2?
In the paper Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm, "furthest" means the largest sum of distances. So, in the second step, for every point you add its distance from the first centroid to its distance from the average of all points, and then choose the point with the largest sum.
Relevant lines in provided pseudo-code are
2. Calculate D <- dis(X, m)
...
6. Set i = 1 as counter to determine the i-th initial centroid
7. DM = DM + D
8. Select x <- x_argmax(DM) as the candidate for i-th initial centroids
To select the next x as the candidate for the remaining initial centroids, D_i (where i is the current iteration step) is recalculated between each data point and c_(i-1). D_i is then added to the accumulated distance metric DM (DM <- DM + D_i).
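The accumulated-distance idea can be sketched roughly as follows. This is a simplified illustration, not the full algorithm from the paper: it only keeps the running sum DM and picks its argmax, and it excludes already chosen points from re-selection, which is my own assumption. The function name is hypothetical.

```python
import math

def initial_centroids(points, k):
    """Pick k initial centroids by maximising the accumulated distance DM."""
    dims = len(points[0])
    # Step 1: the grand mean of all points is the first reference
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    accumulated = [0.0] * len(points)  # DM in the pseudo-code
    chosen = []
    reference = mean
    for _ in range(k):
        # DM <- DM + D, where D holds the distances to the latest reference
        for idx, p in enumerate(points):
            accumulated[idx] += math.dist(p, reference)
        # The candidate is the not-yet-chosen point with the largest DM
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: accumulated[i],
        )
        chosen.append(best)
        reference = points[best]
    return [points[i] for i in chosen]
```

Because distances are summed rather than compared one by one, a point that is far from the mean but sits on top of an existing centroid gains nothing in the next round, so it is unlikely to be picked again.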

Multilateration implementation with inaccurate distance data

I am trying to create an Android smartphone application that uses Apple's iBeacon technology to determine its own current indoor location. I have already managed to detect all available beacons and calculate the distance to them from the RSSI signal.
The problem I currently face is that I cannot find any library or implementation of an algorithm that estimates the 2D location from 3 (or more) distances to fixed points when those distances are not accurate (which means that the three "trilateration circles" do not intersect in one point).
I would be deeply grateful if anybody could post a link or an implementation in any common programming language (Java, C++, Python, PHP, JavaScript or whatever). I have already read a lot on Stack Overflow about this topic, but could not find any answer I was able to convert into code (only some mathematical approaches with matrices and inverting them, calculating with vectors, or the like).
EDIT
I came up with my own approach, which works quite well for me but is neither efficient nor scientific. I iterate over every meter (or, as in my example, every 0.1 meter) of the location grid and score how likely that location is to be the actual position of the handset by comparing its distance to each beacon with the distance I calculate from the received RSSI signal.
Code example:
public Location trilaterate(ArrayList<Beacon> beacons, double maxX, double maxY)
{
    for (double x = 0; x <= maxX; x += .1)
    {
        for (double y = 0; y <= maxY; y += .1)
        {
            double currentLocationProbability = 0;
            for (Beacon beacon : beacons)
            {
                // distance difference between calculated distance to beacon transmitter
                // (rssi-calculated distance) and current location:
                // |sqrt(dX^2 + dY^2) - distanceToTransmitter|
                double distanceDifference = Math
                    .abs(Math.sqrt(Math.pow(beacon.getLocation().x - x, 2)
                        + Math.pow(beacon.getLocation().y - y, 2))
                        - beacon.getCurrentDistanceToTransmitter());
                // weight the distance difference with the beacon calculated rssi-distance. The
                // smaller the calculated rssi-distance is, the more the distance difference
                // will be weighted (it is assumed, that nearer beacons measure the distance
                // more accurate)
                distanceDifference /= Math.pow(beacon.getCurrentDistanceToTransmitter(), 0.9);
                // sum up all weighted distance differences for every beacon in
                // "currentLocationProbability"
                currentLocationProbability += distanceDifference;
            }
            addToLocationMap(currentLocationProbability, x, y);
            // the previous line is my approach, I create a Set of Locations with the 5 most
            // probable locations in it to estimate the accuracy of the measurement afterwards.
            // If that is not necessary, a simple variable assignment for the most probable
            // location would do the job also
        }
    }
    Location bestLocation = getLocationSet().first().location;
    bestLocation.accuracy = calculateLocationAccuracy();
    Log.w("TRILATERATION", "Location " + bestLocation + " best with accuracy "
        + bestLocation.accuracy);
    return bestLocation;
}
Of course, the downside is that on a 300 m² floor there are 30,000 locations to iterate over, and for each one I have to compute the distance to every beacon I receive a signal from (with 5 beacons, that is 150,000 calculations to determine a single location). That's a lot, so I will leave the question open and hope for further solutions or a good improvement that makes this existing solution more efficient.
Of course it does not have to be a trilateration approach, as the original title of this question suggested; an algorithm that uses more than three beacons to determine the location (multilateration) is also welcome.
If the current approach is fine except for being too slow, then you could speed it up by recursively subdividing the plane. This works sort of like finding nearest neighbors in a kd-tree. Suppose that we are given an axis-aligned box and wish to find the approximate best solution in the box. If the box is small enough, then return the center.
Otherwise, divide the box in half, either by x or by y depending on which side is longer. For both halves, compute a bound on the solution quality as follows. Since the objective function is additive, sum lower bounds for each beacon. The lower bound for a beacon is the distance of the circle to the box, times the scaling factor. Recursively find the best solution in the child with the lower lower bound. Examine the other child only if the best solution in the first child is worse than the other child's lower bound.
Most of the implementation work here is the box-to-circle distance computation. Since the box is axis-aligned, we can use interval arithmetic to determine the precise range of distances from box points to the circle center.
P.S.: Math.hypot is a nice function for computing 2D Euclidean distances.
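Here is a rough Python sketch of that subdivision scheme, written against the objective from the question (sum over beacons of |distance - r| / r^0.9). The function names, the (x, y, r) beacon tuples, the (x0, y0, x1, y1) box layout and the tol parameter are my own assumptions for illustration.

```python
import math

def objective(x, y, beacons):
    """Weighted sum of absolute range errors, as in the question's grid search."""
    return sum(abs(math.hypot(bx - x, by - y) - r) / r ** 0.9
               for bx, by, r in beacons)

def lower_bound(box, beacons):
    """Lower bound of the objective over an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    total = 0.0
    for bx, by, r in beacons:
        # Interval arithmetic: smallest and largest distance from the box to the beacon
        d_min = math.hypot(max(x0 - bx, 0.0, bx - x1), max(y0 - by, 0.0, by - y1))
        d_max = math.hypot(max(abs(x0 - bx), abs(x1 - bx)),
                           max(abs(y0 - by), abs(y1 - by)))
        if d_min <= r <= d_max:
            gap = 0.0  # the circle passes through the box
        else:
            gap = min(abs(d_min - r), abs(d_max - r))
        total += gap / r ** 0.9
    return total

def search(box, beacons, tol=0.05, best=(math.inf, None)):
    """Return (value, (x, y)) of the approximate minimiser inside the box."""
    if lower_bound(box, beacons) >= best[0]:
        return best  # nothing in this box can beat the current best
    x0, y0, x1, y1 = box
    if max(x1 - x0, y1 - y0) <= tol:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        value = objective(cx, cy, beacons)
        return (value, (cx, cy)) if value < best[0] else best
    # Split along the longer side
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) / 2
        halves = [(x0, y0, mid, y1), (mid, y0, x1, y1)]
    else:
        mid = (y0 + y1) / 2
        halves = [(x0, y0, x1, mid), (x0, mid, x1, y1)]
    # Visit the child with the lower lower bound first
    halves.sort(key=lambda b: lower_bound(b, beacons))
    for half in halves:
        best = search(half, beacons, tol, best)
    return best
```

A call such as search((0.0, 0.0, maxX, maxY), beacons) would replace the double loop. The second child is only explored when its lower bound is still below the best value found so far, which is where the savings come from.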
Instead of taking confidence levels of individual beacons into account, I would instead try to assign an overall confidence level for your result after you make the best guess you can with the available data. I don't think the only available metric (perceived power) is a good indication of accuracy. With poor geometry or a misbehaving beacon, you could be trusting poor data highly. It might make better sense to come up with an overall confidence level based on how well the perceived distance to the beacons line up with the calculated point assuming you trust all beacons equally.
I wrote some Python below that comes up with a best guess based on the provided data in the 3-beacon case by calculating the two points of intersection of circles for the first two beacons and then choosing the point that best matches the third. It's meant to get started on the problem and is not a final solution. If beacons don't intersect, it slightly increases the radius of each up until they do meet or a threshold is met. Likewise, it makes sure the third beacon agrees within a settable threshold. For n-beacons, I would pick 3 or 4 of the strongest signals and use those. There are tons of optimizations that could be done and I think this is a trial-by-fire problem due to the unwieldy nature of beaconing.
import math
beacons = [[0.0,0.0,7.0],[0.0,10.0,7.0],[10.0,5.0,16.0]] # x, y, radius
def point_dist(x1,y1,x2,y2):
    x = x2-x1
    y = y2-y1
    return math.sqrt((x*x)+(y*y))
# determines two points of intersection for two circles [x,y,radius]
# returns None if the circles do not intersect
def circle_intersection(beacon1,beacon2):
    r1 = beacon1[2]
    r2 = beacon2[2]
    dist = point_dist(beacon1[0],beacon1[1],beacon2[0],beacon2[1])
    heron_root = (dist+r1+r2)*(-dist+r1+r2)*(dist-r1+r2)*(dist+r1-r2)
    if ( heron_root > 0 ):
        heron = 0.25*math.sqrt(heron_root)
        xbase = (0.5)*(beacon1[0]+beacon2[0]) + (0.5)*(beacon2[0]-beacon1[0])*(r1*r1-r2*r2)/(dist*dist)
        xdiff = 2*(beacon2[1]-beacon1[1])*heron/(dist*dist)
        ybase = (0.5)*(beacon1[1]+beacon2[1]) + (0.5)*(beacon2[1]-beacon1[1])*(r1*r1-r2*r2)/(dist*dist)
        ydiff = 2*(beacon2[0]-beacon1[0])*heron/(dist*dist)
        return (xbase+xdiff,ybase-ydiff),(xbase-xdiff,ybase+ydiff)
    else:
        # no intersection, need to pseudo-increase beacon power and try again
        return None
# find the two points of intersection between beacon0 and beacon1
# will use beacon2 to determine the better of the two points
failing = True
power_increases = 0
while failing and power_increases < 10:
    res = circle_intersection(beacons[0],beacons[1])
    if ( res ):
        intersection = res
    else:
        beacons[0][2] *= 1.001
        beacons[1][2] *= 1.001
        power_increases += 1
        continue
    failing = False
# make sure the best fit is within x% (10% of the total distance from the 3rd beacon in this case)
# otherwise the results are too far off
THRESHOLD = 0.1
if failing:
    print('Bad Beacon Data (Beacon0 & Beacon1 don\'t intersect after many "power increases")')
else:
    # finding best point between beacon1 and beacon2
    dist1 = point_dist(beacons[2][0],beacons[2][1],intersection[0][0],intersection[0][1])
    dist2 = point_dist(beacons[2][0],beacons[2][1],intersection[1][0],intersection[1][1])
    if ( math.fabs(dist1-beacons[2][2]) < math.fabs(dist2-beacons[2][2]) ):
        best_point = intersection[0]
        best_dist = dist1
    else:
        best_point = intersection[1]
        best_dist = dist2
    best_dist_diff = math.fabs(best_dist-beacons[2][2])
    if best_dist_diff < THRESHOLD*best_dist:
        print(best_point)
    else:
        print('Bad Beacon Data (Beacon2 distance to best point not within threshold)')
If you want to trust closer beacons more, you may want to calculate the intersection points between the two closest beacons and then use the farther beacon to tie-break. Keep in mind that almost anything you do with "confidence levels" for the individual measurements will be a hack at best. Since you will always be working with very bad data, you will definitely need to loosen up the power_increases limit and threshold percentage.
You have 3 points: A(xA,yA,zA), B(xB,yB,zB) and C(xC,yC,zC), which are at approximate distances dA, dB and dC from your goal point G(xG,yG,zG).
Let's say cA, cB and cC are the confidence rate ( 0 < cX <= 1 ) of each point.
Basically, you might take something really close to 1, like {0.95,0.97,0.99}.
If you don't know them, try different coefficients depending on the average distance. If a distance is really big, you are likely not to be very confident about it.
Here is how I would do it:
var sum = (cA*dA) + (cB*dB) + (cC*dC);
dA = cA*dA/sum;
dB = cB*dB/sum;
dC = cC*dC/sum;
xG = (xA*dA) + (xB*dB) + (xC*dC);
yG = (yA*dA) + (yB*dB) + (yC*dC);
zG = (zA*dA) + (zB*dB) + (zC*dC);
Basic, and not really smart but will do the job for some simple tasks.
EDIT
You can take any confidence coef you want in [0,inf[, but IMHO, restraining at [0,1] is a good idea to keep a realistic result.

What does it mean to get the (MSE) mean error squared for 2 images?

The MSE is the average of the channel error squared.
What does that mean in comparing two same size images?
For two pictures A, B you take the square of the difference between every pixel in A and the corresponding pixel in B, sum that up and divide it by the number of pixels.
Pseudo code:
sum = 0.0
for(x = 0; x < width;++x){
    for(y = 0; y < height; ++y){
        difference = (A[x,y] - B[x,y])
        sum = sum + difference*difference
    }
}
mse = sum /(width*height)
printf("The mean square error is %f\n",mse)
Conceptually, it would be:
1) Start with red channel
2) Compute the difference between each pixel's gray level value in the two images' red channels, pixel by pixel (redA(0,0)-redB(0,0), etc., for all pixel locations).
3) Square the difference at every one of those pixels: (redA(0,0)-redB(0,0))^2
4) Compute the sum of the squared difference for all pixels in the red channel
5) Repeat above for the green and blue channels
6) Add the 3 sums together and divide by 3, i.e, (redsum+greensum+bluesum)/3
7) Divide by the area of the image (Width*Height) to form the mean or average, i.e., (redsum+greensum+bluesum)/(3*Width*Height) = MSE
Note that the E in error is synonymous with difference. So it could be called the Mean Squared Difference. Also mean is the same as average. So it could also be called the Average Squared Difference.
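The steps above can be sketched in plain Python as follows, assuming each image is a list of rows and each pixel is an (r, g, b) tuple. The mse_rgb name and this data layout are assumptions made for the example.

```python
def mse_rgb(img_a, img_b):
    """Mean squared error over all pixels and all three channels."""
    height, width = len(img_a), len(img_a[0])
    total = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            # Squared difference per channel, summed over R, G and B
            total += sum((a - b) ** 2 for a, b in zip(px_a, px_b))
    # Divide by the number of channels and by the image area
    return total / (3 * width * height)
```

Summing all channels in one pass and dividing once by 3*Width*Height gives the same result as computing the three channel sums separately and combining them at the end.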
You can have a look at following article: http://en.wikipedia.org/wiki/Mean_squared_error#Definition_and_basic_properties. There "Yi" represents the true values and "hat_Yi" represents the values with which we want to compare the true values.
So, in your case you can consider one image as the reference image and the second image as the image whose pixel values you would like to compare with the first one....and you do so by calculating the MSE which tells you "how different/similar is the second image to the first one"
Check out Wikipedia for MSE; it's a measure of the difference between each pixel value. Here's a sample implementation:
import numpy as np
def MSE(img1, img2):
    # cast to float first: subtracting unsigned 8-bit arrays would wrap around
    squared_diff = (img1.astype(float) - img2.astype(float)) ** 2
    summed = np.sum(squared_diff)
    num_pix = img1.shape[0] * img1.shape[1] # img1 and img2 should have the same shape
    err = summed / num_pix
    return err
Let us assume you have two points in a 2-dimensional space, A(x1,y1) and B(x2,y2); the distance between the two points is calculated as sqrt((x1-x2)^2+(y1-y2)^2). If the two points are in 3-dimensional space, it can be calculated as sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2). For two points in n-dimensional space, the distance formula can be extended as sqrt(sumacrossdimensions(valueofAindim-valueofBindim)^2) (since latex is not allowed).
Now, an image with n pixels can be viewed as a point in n-dimensional space, and the distance between two images with n pixels can be thought of as the distance between two points in that space. The MSE is the square of this distance divided by n.

How to filter a set of 2D points moving in a certain way

I have a list of points moving in two dimensions (x- and y-axis) represented as rows in an array. I might have N points - i.e., N rows:
1 t1 x1 y1
2 t2 x2 y2
.
.
.
N tN xN yN
where ti, xi, and yi are the time index, x-coordinate, and y-coordinate for point i. The time index ti is an integer from 1 to T. The number of points at each possible time index can vary from 0 to N (still with only N points in total).
My goal is to filter out all the points that do not move in a certain way, or to keep only those that do. A point must move in a parabolic trajectory with decreasing x- and y-coordinates (i.e., moving to the left and downwards only). Points with other dynamic behaviour must be removed.
Can I use a simple sorting mechanism on this array and then analyse the order of the time index? I have also considered the fact that points with the same time index ti are physically distinct points and so should be paired up with points at other time indices. The complexity of the problem grew, and now I turn to you.
NOTE: You can assume that the points are confined to a sub-region of the (x,y)-plane between two parabolic curves. These curves intersect at only one point: a point close to the origin of motion for any point.
More Information:
I have made some datafiles available:
MATLAB datafile (1.17 kB)
same data as CSV with semicolon as column separator (2.77 kB)
Necessary context:
The datafile hold one uint32 array with 176 rows and 5 columns. The columns are:
pixel x-coordinate in 175-by-175 lattice
pixel y-coordinate in 175-by-175 lattice
discrete theta angle-index
time index (from 1 to T = 10)
row index for this original sorting
The points "live" in a 175-by-175 pixel lattice, and again inside the upper quadrant of a circle with radius 175. The points travel along the circle circumference in a counterclockwise rotation up to a certain angle theta with the horizontal, where they are thrown off into something close to a parabolic orbit. Column 3 holds a discrete index into a list with indices 1 to 45 covering 0 to 90 degrees (one index thus spans 2 degrees). The theta angle was originally deduced solely from the points by setting up the trivial equations of motion and solving for the angle. This gives rise to a quasi-symmetric quartic which can be solved in closed form. The actual metric radius of the circle is 0.2 m and the pixel coordinates were converted to metric units using simple linear interpolation (but what we see here are the points in the original pixel space).
My problem is that some points are not behaving properly, and since I need to do statistics on the theta angle, I need to remove the points that certainly do NOT move in a parabolic trajectory. These errors are expected and fully natural, but still need to be filtered out.
MATLAB plot code:
% load data and setup variables:
load mat_points.mat;
num_r = 175;
num_T = 10;
num_gridN = 20;
% begin plotting:
figure(1000);
clf;
plot( ...
    num_r * cos(0:0.1:pi/2), ...
    num_r * sin(0:0.1:pi/2), ...
    'Color', 'k', ...
    'LineWidth', 2 ...
);
axis equal;
xlim([0 num_r]);
ylim([0 num_r]);
hold all;
% setup grid (yea... went crazy with one):
vec_tickValues = linspace(0, num_r, num_gridN);
cell_tickLabels = repmat({''}, size(vec_tickValues));
cell_tickLabels{1} = sprintf('%u', vec_tickValues(1));
cell_tickLabels{end} = sprintf('%u', vec_tickValues(end));
set(gca, 'XTick', vec_tickValues);
set(gca, 'XTickLabel', cell_tickLabels);
set(gca, 'YTick', vec_tickValues);
set(gca, 'YTickLabel', cell_tickLabels);
set(gca, 'GridLineStyle', '-');
grid on;
% plot points per timeindex (with increasing brightness):
vec_grayIndex = linspace(0,0.9,num_T);
for num_kt = 1:num_T
    vec_xCoords = mat_points((mat_points(:,4) == num_kt), 1);
    vec_yCoords = mat_points((mat_points(:,4) == num_kt), 2);
    plot(vec_xCoords, vec_yCoords, 'o', ...
        'MarkerEdgeColor', 'k', ...
        'MarkerFaceColor', vec_grayIndex(num_kt) * ones(1,3) ...
    );
end
Thanks :)
Why, it looks almost as if you're simulating a radar tracking debris from the collision of two missiles...
Anyway, let's coin a new term: object. Objects are moving along parabolae and at certain times they may emit flashes that appear as points. There are also other points which we are trying to filter out.
We will need some more information:
Can we assume that the objects obey the physics of things falling under gravity?
Must every object emit a point at every timestep during its lifetime?
Speaking of lifetime, do all objects begin at the same time? Can some expire before others?
How precise is the data? Is it exact? Is there a measure of error? To put it another way, do we understand how poorly the points from an object might fit a perfect parabola?
Sort the data with (index, time) as keys, and for all locations of a point i check whether they follow a parabolic trajectory?
Which part are you having trouble with? Sorting should be very easy. IMHO, it is the second part (testing whether a set of points follows a parabolic trajectory) that is difficult.
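For that second part, one possible test is sketched below. It assumes the points have already been grouped into a candidate track of (t, x, y) tuples, which the question does not provide, and that a trajectory under gravity makes y a quadratic function of x. The function names, the minimum of four points and the residual tolerance tol (in pixels) are all my own choices for illustration.

```python
def fit_parabola(points):
    """Least-squares fit of y = a*x^2 + b*x + c to (x, y) pairs; returns (a, b, c)."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented 3x3 normal equations
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def is_parabolic_track(track, tol=1.0):
    """True if x and y strictly decrease over time and the points fit a parabola."""
    track = sorted(track)  # order by time index
    if len(track) < 4:
        return False  # three points always fit a parabola exactly
    xy = [(x, y) for _, x, y in track]
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        if x1 >= x0 or y1 >= y0:
            return False  # must move left and downwards only
    a, b, c = fit_parabola(xy)
    return max(abs(a * x * x + b * x + c - y) for x, y in xy) <= tol
```

The tolerance would have to be tuned to the measurement error of the data, which is one of the open questions raised above.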

Resources