Ripley's K not plotting correctly? - sp

I am doing point pattern analysis using the spatstat package and ran Ripley's K (spatstat::Kest) on my points to see if there is any clustering. However, it appears that not all of the lines that should appear in the plot (kFem) have been drawn. For example, the red line (Ktrans) stops at around x = 12 and the green line (Kbord) doesn't appear at all. I would appreciate any insight into how to interpret this and whether there might be a bug.
Here is my study window. It is an irregular shape because I am analyzing a point pattern along a transect line.
And here is a density plot of my point pattern:

It is unlikely (but not impossible) that there is a simple bug in Kest that causes this, since this particular function has been tested intensively by many users. More likely you have an observation window that is irregular, and there is a mathematical reason why the various estimates cannot be calculated at all distances. Please add a plot/summary of your point pattern so we know the observation window (or, even better, give access to the observation window).
Furthermore, to manually inspect the estimates of the K-function you can convert the function value (fv) object to a data.frame and print it:
# Convert the fv object returned by Kest to a data frame and inspect the
# first few rows; each edge correction is a separate column
dat <- as.data.frame(kFem)
head(dat, n = 10)
Update:
Your window is indeed very irregular, and that is the explanation of why some of the corrections are not produced at large distances. I guess your transect is only a few metres wide, while you are considering distances up to 50 m. The border correction, for example, can only be calculated for distances up to roughly half the width of the transect.
Using Kest implies that you believe your transect is a subset of a big homogeneous point process (with equal intensity everywhere and the same correlation structure throughout space). If that is true, then Kest provides a sensible estimate of the unknown true homogeneous K-function. However, you provide a plot where you have divided the region into sections of high, medium and low intensity, which doesn't agree with the assumption of homogeneity. The deviation from the theoretical Poisson line may just be due to inhomogeneous intensity and not actual correlation between points. You should probably only consider distances that are much smaller than 50 (you can set rmax when you call Kest), as in the sketch below.
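A minimal sketch with simulated data (a hypothetical 200 m x 5 m transect-like window, not the poster's data) showing how to restrict the distance range and request particular corrections:
library(spatstat)
win <- owin(c(0, 200), c(0, 5))          # hypothetical long, narrow transect
X   <- rpoispp(lambda = 0.5, win = win)  # simulated homogeneous point pattern

# restrict to short distances and pick the edge corrections explicitly
K <- Kest(X, rmax = 10, correction = c("border", "translate", "isotropic"))
plot(K)
head(as.data.frame(K))   # one column of estimates per correction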

Related

Finding optimal solution to multivariable function with non-negligible solution time?

So I have this issue where I have to find the best distribution that, when passed through a function, matches a known surface. I have written a script that creates the distribution given some parameters and spits out a metric that compares the resulting surface to the known one, but this script takes a non-negligible time to run, so I can't just sweep a very large set of parameters to find the optimal set. I looked into the simplex method, and it seems to be the right path, but it's not quite what I need, because I don't exactly have a set of linear equations and I don't know the constraints on the parameters; rather, I have one method that gives a single output (and that's all). Can anyone point me in the right direction for solving this problem? Thanks!
To quickly go over my process/problem again: I have a set of parameters (two at this point, but more will be added later) that defines a distribution. This distribution is used to create a surface, which is compared to a known surface, and an error metric is produced. I want to find the optimal set of parameters, but cannot run through an arbitrarily large number of parameter sets due to the time constraint.
One situation consistent with what you have asked is a model in which you have a reasonably tractable probability distribution which generates an unknown value. This unknown value goes through a complex and not mathematically nice process and generates an observation. Your surface corresponds to the observed probability distribution on the observations. You would be happy finding the parameters that give a good least squares fit between the theoretical and real life surface distribution.
One approximation for the fitting process is to compute a grid of values in the space output by the probability distribution. Each set of parameters gives you a probability for each point on this grid. The not-nice process maps each grid point to the nearest grid point in the space of the surface. The least squares fit is then a quadratic in the probabilities calculated for the first grid, because the probability assigned to a grid point in the surface is the sum of the probabilities of the values in the first grid that map closer to that point than to any other point in the surface. This means that the fit has first (and even second) derivatives that you can calculate. If your probability distribution is nice enough, you can use the chain rule to get derivatives of the least squares fit with respect to the initial parameters. You can then use optimization methods that require not just the function to be optimized but also its derivatives, and these are generally more efficient than methods that use only function values, such as Nelder-Mead or the Torczon simplex. See e.g. http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/optim/package-summary.html.
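The linked page is for Java's commons-math, but the same comparison is easy to try with base R's optim; here is a toy sketch with a stand-in error function (not the poster's model) showing how supplying a gradient cuts the number of expensive function evaluations:
loss <- function(p) (p[1] - 2)^2 + 10 * (p[2] + 1)^2    # stand-in error metric
grad <- function(p) c(2 * (p[1] - 2), 20 * (p[2] + 1))  # its analytic gradient

fit_nm   <- optim(c(0, 0), loss, method = "Nelder-Mead")      # function values only
fit_bfgs <- optim(c(0, 0), loss, gr = grad, method = "BFGS")  # uses the gradient
c(nm = fit_nm$counts["function"], bfgs = fit_bfgs$counts["function"])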
Another possible approach is via something called the EM Algorithm. Here EM stands for Expectation-Maximization. It can be used for finding maximum likelihood fits in cases where the problem would be easy if you could see some hidden state that you cannot actually see. In this case the output produced by the initial distribution might be such a hidden state. One starting point is http://www-prima.imag.fr/jlc/Courses/2002/ENSI2.RNRF/EM-tutorial.pdf.
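A minimal sketch of the EM idea in R, for the classic case of a two-component 1-D Gaussian mixture where the component label is the hidden state (everything here is made up for illustration):
set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(100, 4, 1))   # simulated data; the labels are hidden

mu <- c(1, 3); sigma <- c(1, 1); p <- 0.5    # rough starting values
for (iter in 1:100) {
  # E-step: posterior probability that each point came from component 1
  d1 <- p * dnorm(x, mu[1], sigma[1])
  d2 <- (1 - p) * dnorm(x, mu[2], sigma[2])
  w  <- d1 / (d1 + d2)
  # M-step: re-estimate the parameters from the weighted data
  p        <- mean(w)
  mu[1]    <- sum(w * x) / sum(w)
  mu[2]    <- sum((1 - w) * x) / sum(1 - w)
  sigma[1] <- sqrt(sum(w * (x - mu[1])^2) / sum(w))
  sigma[2] <- sqrt(sum((1 - w) * (x - mu[2])^2) / sum(1 - w))
}
round(c(p = p, mu = mu, sigma = sigma), 2)   # should recover roughly 0.67, means near 0 and 4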

Plotting Issue - Comparing 3 matrices where one is a sparse matrix

I need to compare three 216x216 matrices (a data correlation matrix, events, etc.). Can someone suggest a way to plot these in MATLAB, or some other plotting tool, that makes it easy to visualise and compare them? Would a 3D mesh plot be useful? I thought mesh would be good, but I'd like other opinions too.
Thanks in advance,
Sparse matrices
You can use the spy() function to visualize a "sparsity pattern", as MATLAB calls it. It plots a dot (or any other marker) wherever a matrix element is non-zero.
spy() can also be used to visualize non-sparse matrices where a lot of entries are close to zero - just threshold the matrix first:
a = eye(50) + 0.01*randn(50);
spy(a)          % Not very useful
b = a; b(b < 0.02) = 0;
figure, spy(b)  % Much more useful
More generally, you can apply upper and lower thresholds to visualize the location of matrix entries whose value is within a specific range.
Correlation
It may be useful to just display the matrix using imagesc(). This may give you an idea of the degree of correlation in your data - i.e. an uncorrelated signal will have a correlation matrix with dominant diagonal elements, which will be clearly visible. I find MATLAB's default color map distracting, so I usually do something like
colormap(gray);imagesc(a);
Miscellaneous
Of course, there's a whole host of non-visual comparisons you can make - various norm()'s, std(), spectral analysis using eig() for square matrices, or svd() more generally. You can compare eigenvalue magnitudes, or compare the eigenvectors. This may be very useful or complete garbage, depending on what your data is.
Thus, to conclude (for now): if you can say more specifically what your matrices contain, you may get more useful suggestions.

determining state space based on area

I have been tasked with figuring out a state space for a problem based on the area of a rectangle. It seems that I have made my state space far too large and need some feedback.
So far I have an area that spans 600 on the y axis and 300 on the x axis. I determined the number of points to be
(600 x 300)! = 180,000!
Therefore my robot would need to inspect this many potential spaces before I apply an algorithm.
This number seems quite high, and if that is the case it would make my problem unsolvable before I die, especially if I implement the algorithm incorrectly. Any help would be greatly appreciated, especially if my math is off in determining the number of points.
EDIT
I was under the impression that to see how many pairs of points there are, you would have to take the Cartesian product of the total available points, which in turn would be (600 x 300)!. If this is incorrect please let me know.
First of all, the number of "points" (as defined in mathematics - the only relevant definition) in a rectangle of any size (non-zero area) is infinite. Why? Because a point does not have to have integer coordinates - there can be a point at (0,0), (0,0.1), (0,0.001), (0,0.0001) and so on. I think what you mean by points in your question is points with integer coordinates (i.e. lattice points), or alternatively "cells" in a rectangular grid (like cells on a chess board). Please let me know if I misunderstood your question.
There are 600 rows and 300 columns. This means that there are 600 * 300 = 180,000 different cells. It follows that there are nCr(180,000, 2) = 16,199,910,000 unique pairs in the grid. I am assuming you consider the pairs ((1,1),(2,2)) and ((2,2),(1,1)) equivalent. Otherwise, there are 180,000 * 180,000 = 32,400,000,000 pairs.
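These counts are quick to sanity-check (shown here in R, but any language will do):
600 * 300          # 180,000 cells
choose(180000, 2)  # 16,199,910,000 unordered pairs
180000^2           # 32,400,000,000 ordered pairs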

Averaging a set of points on a Google Map into a smaller set

I'm displaying a small Google map on a web page using the Google Maps Static API.
I have a set of 15 co-ordinates, which I'd like to represent as points on the map.
Due to the map being fairly small (184 x 90 pixels) and the upper limit of 2000 characters on a Google Maps URL, I can't represent every point on the map.
So instead I'd like to generate a small list of co-ordinates that represents an average of the big list.
So instead of having 15 sets, I'd end up with 5 sets whose positions approximate the positions of the 15. Say there are 3 points that are closer to each other than to any other point on the map; those points would be collapsed into 1 point.
So I guess I'm looking for an algorithm that can do this.
Not asking anyone to spell out every step, but perhaps point me in the direction of a mathematical principle or general-purpose function for this kind of thing?
I'm sure a similar function is used in, say, graphics software, when pixellating an image.
(If I solve this I'll be sure to post my results.)
I recommend K-means clustering when you need to cluster N objects into a known number K < N of clusters, which seems to be your case. Note that one cluster may end up with a single outlier point and another with say 5 points very close to each other: that's OK, it will look closer to your original set than if you forced exactly 3 points into every cluster!-)
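A minimal sketch with made-up coordinates, using base R's kmeans (the cluster centres become the markers to plot):
set.seed(42)
# 15 hypothetical points; treating lat/lng as planar is fine over a small map area
pts <- data.frame(lat = runif(15, 51.4, 51.6), lng = runif(15, -0.2, 0.0))
km  <- kmeans(pts, centers = 5, nstart = 25)
km$centers   # 5 representative points approximating the original 15
km$cluster   # which original point collapsed into which centre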
If you are searching for such functions/classes, have a look at MarkerClusterer and MarkerManager utility classes. MarkerClusterer closely matches the described functionality, as seen in this demo.
In general I think the area you need to look into is "Vector Quantization". I've got an old book titled Vector Quantization and Signal Compression by Allen Gersho and Robert M. Gray which provides a bunch of examples.
From memory, the Lloyd iteration is a good algorithm for this sort of thing. It can take the input set and reduce it to a fixed-size set of points. Basically: uniformly or randomly distribute your quantized points around the space, then map each of your inputs to the nearest quantized point and compute the error (e.g. sum of distances or root-mean-square). Then, for each output point, set it to the centre of the set of inputs that map to it. This will move the point and possibly change which inputs map to it. Repeat until no changes are detected from one iteration to the next.
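A rough R sketch of that iteration for 2-D points (pts is an n x 2 matrix of inputs, k the number of output points; the names and details are my own):
lloyd <- function(pts, k, max_iter = 100) {
  centres <- pts[sample(nrow(pts), k), , drop = FALSE]   # random initialisation
  for (iter in seq_len(max_iter)) {
    # map each input point to its nearest centre
    d <- as.matrix(dist(rbind(centres, pts)))[-(1:k), 1:k]
    nearest <- max.col(-d)
    # move each centre to the mean of the inputs that map to it
    new_centres <- centres
    for (j in seq_len(k)) {
      if (any(nearest == j))
        new_centres[j, ] <- colMeans(pts[nearest == j, , drop = FALSE])
    }
    if (all(new_centres == centres)) break                # no change: converged
    centres <- new_centres
  }
  centres
}

set.seed(1)
coords <- cbind(runif(15), runif(15))   # 15 made-up input points
lloyd(coords, 5)                        # 5 representative points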
Hope this helps.

What do you think of this interest point detection algorithm?

I've been trying to come up with an interest point detection algorithm and this is what I came up with:
You go through the X and Y axes 3n pixels at a time, creating 3n x 3n squares.
For the n x n square in the middle of the 3n x 3n square (let's call it square Z), the R, G, and B values are averaged and rounded to preset values to limit the number of colors, and that is the color the square will be treated as.
The same is done for the 8 surrounding n x n squares.
After that, the color of square Z is compared to the surrounding squares; if it matches x out of the 8 surrounding squares, where x <= 3 or x >= 5, then that is an interest point (a corner is detected).
And so on till all the image is covered.
The bigger n is, the faster the image will be scanned and the less accurate the detection is, and vice versa.
This, supposedly, detects "literal corners", that is corners you can actually SEE on the image.
What do you think of this algorithm? Is it efficient? Can it be used on a live video stream (say from the camera) on a hand-held device?
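(For concreteness, here is a loose sketch in R of the procedure described above, using grayscale values in [0, 1] instead of RGB and testing every n x n block rather than stepping 3n at a time; the toy image is made up.)
block_interest <- function(img, n = 4, levels = 8) {
  nbr <- floor(nrow(img) / n); nbc <- floor(ncol(img) / n)
  # average each n x n block and round to a limited number of "colours"
  blocks <- matrix(0, nbr, nbc)
  for (i in seq_len(nbr)) for (j in seq_len(nbc)) {
    blocks[i, j] <- round(mean(img[((i - 1) * n + 1):(i * n),
                                   ((j - 1) * n + 1):(j * n)]) * (levels - 1))
  }
  # compare each interior block Z with its 8 neighbours
  interest <- matrix(FALSE, nbr, nbc)
  for (i in 2:(nbr - 1)) for (j in 2:(nbc - 1)) {
    nb <- blocks[(i - 1):(i + 1), (j - 1):(j + 1)]
    x  <- sum(nb == blocks[i, j]) - 1      # matches among the 8 neighbours
    interest[i, j] <- (x <= 3 || x >= 5)   # the rule proposed above
  }
  interest
}

img <- matrix(0, 64, 64); img[17:48, 17:48] <- 1   # bright square on a dark background
which(block_interest(img), arr.ind = TRUE)         # blocks the rule flags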
I'm sorry to say that I don't think this is likely to be very good. Your algorithm looks a bit like a simplistic version of Moravec's algorithm, which is itself one of the simplest corner detection algorithms. The hardcoded limits you test against effectively make your edge test a stepped function, unlike an approach such as summed square differences. This will almost certainly give you discontinuities in your detection function (corners that don't match when they should have), for some values.
You also have the same problem as Moravec, namely that if the edge lies at an angle to the direction of neighbours being considered, then it won't be detected.
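For comparison, a minimal sketch of a Moravec-style response in R (a smooth windowed-SSD measure rather than a hard threshold; the window size and toy image are made up):
moravec <- function(img, w = 2) {
  shifts <- rbind(c(1, 0), c(0, 1), c(1, 1), c(1, -1))
  nr <- nrow(img); nc <- ncol(img)
  resp <- matrix(0, nr, nc)
  for (i in (w + 2):(nr - w - 1)) {
    for (j in (w + 2):(nc - w - 1)) {
      ssd <- apply(shifts, 1, function(s) {
        win   <- img[(i - w):(i + w), (j - w):(j + w)]
        moved <- img[(i - w + s[1]):(i + w + s[1]), (j - w + s[2]):(j + w + s[2])]
        sum((win - moved)^2)
      })
      resp[i, j] <- min(ssd)   # high only if every shift changes the patch
    }
  }
  resp
}

img <- matrix(0, 40, 40); img[10:30, 10:30] <- 1   # toy image with four corners
r <- moravec(img)
which(r == max(r), arr.ind = TRUE)                 # responses peak near the corners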
Developing algorithms is fun, and if this isn't a business-critical project, then by all means, carry on tinkering and experimenting (and don't be put off by my comments!). But the fact is, for almost any practical problem, a better algorithm for the task you want to solve almost certainly already exists. The real challenge is identifying how you can best model your problem in such a way that you can solve it using an existing, well-understood approach, designed by experts.
In particular, robust identification and analysis of edge-cases and worst-case runtimes is a tricky business; unless you are a professional algorist, you are likely to find the going difficult. But I certainly encourage you to discover this for yourself by trying. nlucaroni mentions some excellent questions to use as starting points for your analysis.
Why not try it and see if it works the way you expect? It sounds like it should. How does the performance compare with other methods? What is the complexity of the algorithm? Is it efficient compared to others? Where can it be improved? What kind of false positives and false negatives are expected? Are they within reason based on the data I plan to use this on? What threshold should be used to compare surrounding squares? ....
This is stuff you should be doing, not us.
I would suggest you look at the SIFT algorithm. It's the de facto standard for points of interest in an image. Unfortunately, it's also patented, because it's so good.
If you are interested in a real-time version of SIFT, you can get it to run on a GPU, but it's highly experimental at this point. Note that if you are developing a commercial application you'd have to first purchase a license for using SIFT or get approval from David Lowe.
