Drawbacks of K-Medoid (PAM) Algorithm

I have read that the K-medoid algorithm (PAM) is a partition-based clustering algorithm and a variant of the K-means algorithm. It addresses two problems of K-means: producing empty clusters and sensitivity to outliers/noise.
However, the time complexity of K-medoid is O(n^2), unlike K-means (Lloyd's algorithm), which has a time complexity of O(n). I would like to ask whether there are other drawbacks of the K-medoid algorithm aside from its time complexity.

The main disadvantage of K-medoid algorithms (whether PAM, CLARA, or CLARANS) is that they are not suitable for clustering non-spherical (arbitrarily shaped) groups of objects.
This is because they rely on minimizing the distances between the non-medoid objects and the medoid (the cluster center); briefly, they use compactness as the clustering criterion instead of connectivity.
Another disadvantage of PAM is that it may give different results on different runs over the same dataset, because the initial k medoids are chosen randomly.
In addition to the aforementioned disadvantages, you must also specify the value of k (the number of clusters) in advance.
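To make the quadratic cost and the initialization sensitivity concrete, here is a minimal, non-optimized sketch of the k-medoids idea in Python/NumPy. Note this is the simpler alternating ("Voronoi iteration") variant rather than PAM's exact swap search, and the function name is illustrative, not from any library. The full pairwise distance matrix alone is O(n^2) in time and memory, and the random choice of initial medoids is the source of run-to-run variation:

    import numpy as np

    def k_medoids_sketch(X, k, n_iter=100, seed=None):
        """Alternating k-medoids sketch: O(n^2) distances, random initial medoids."""
        rng = np.random.default_rng(seed)
        n = len(X)
        # O(n^2) pairwise distance matrix -- the dominant up-front cost.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        medoids = rng.choice(n, size=k, replace=False)  # random start => run-to-run variance
        for _ in range(n_iter):
            labels = np.argmin(D[:, medoids], axis=1)   # assign points to nearest medoid
            new_medoids = medoids.copy()
            for j in range(k):
                members = np.where(labels == j)[0]
                # pick the member that minimizes total within-cluster distance
                costs = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[j] = members[np.argmin(costs)]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        return medoids, labels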

Related

Best time complexity for sequentially constructing a 3D Voronoi-Delaunay diagram

According to R. A. Dwyer (Algorithmica 2(1-4), 1987, pp. 137-151), the Delaunay triangulation of a uniform distribution of N points in the unit square can be constructed in O(N ln ln N) time. I was wondering what the currently fastest known sequential algorithm is for constructing a Delaunay diagram of a uniform distribution in a cubic cell.
TL;DR: I'd expect quasi-O(n) behaviour for a set of "well-distributed" points in a cube.
Good incremental algorithms for the construction of Delaunay tessellations / Voronoi complexes in R^3 have worst-case run-times of O(n^2) (where n is the number of points). These worst cases rarely occur in practice, though, and most "real" cases are expected to exhibit quasi-O(n) scaling.
Documentation for the Triangulation_3 class available in CGAL includes a discussion of such behaviour, as well as links to several papers that provide bounds on asymptotic complexity for certain distributions of points.
In short, the running time of an incremental Delaunay algorithm depends on a few different factors: (a) the complexity of the kernel used to insert individual points (through the modification of local topology), (b) the complexity of the search algorithm used to locate the subset of the tessellation to be modified, and (c) the "structure" of the distribution of points to be added, and the order in which they're processed.
"Fast" algorithms for (a) and (b) (e.g. Bowyer-Watson) are known, and (c) is amenable to "biased" quasi-random ordering strategies. The combination of these techniques typically leads to O(n) behaviour in most practical cases.
This only leaves a set of rather pathological cases for which lesser performance is observed. To me, these cases generally seem so specific that they'd essentially need to be constructed by hand.
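To check the scaling empirically, a quick illustrative timing experiment with SciPy's Delaunay wrapper (which calls Qhull under the hood) might look like this; the sizes and seed are arbitrary:

    import time
    import numpy as np
    from scipy.spatial import Delaunay

    rng = np.random.default_rng(42)
    for n in (10_000, 20_000, 40_000, 80_000):
        pts = rng.random((n, 3))      # uniform points in the unit cube
        t0 = time.perf_counter()
        tri = Delaunay(pts)           # 3D Delaunay tessellation via Qhull
        dt = time.perf_counter() - t0
        print(f"n={n:>6}  total={dt:.3f}s  per-point={dt / n * 1e6:.2f} us")

    # For well-distributed points the per-point cost should stay roughly flat,
    # consistent with the quasi-O(n) behaviour described above.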

Comparison of A* heuristics for solving an N-puzzle

I am trying to solve the N-puzzle using the A* algorithm with 3 different heuristic functions. I want to know how to compare the heuristics in terms of time complexity. The heuristics I am using are: Manhattan distance, Manhattan distance + linear conflict, and N-max swap. I am specifically interested in the 8-puzzle and the 15-puzzle.
Finding the shortest solution to the N-puzzle is NP-hard in general, so no matter what heuristic you use, it's unlikely you'll be able to find any difference in complexity between them, since you won't be able to prove the tightness of any bound.
If you restrict yourself to the 8-puzzle or 15-puzzle, an A* algorithm with any admissible heuristic will run in O(1) time, since there is a finite (albeit large) number of board positions.
As @Harold said in his comment, the time complexity of heuristic functions is typically compared by experimental tests. In your case, generate a set of n random problems for the 8-puzzle and the 15-puzzle and solve them using the different heuristic functions. Things to be aware of:
The comparison will always depend on several factors, such as hardware specs, the programming language, and your skill in implementing the algorithm.
Generally speaking, a more informed heuristic will expand fewer nodes than a less informed one, and will probably be faster.
And finally, to compare the three heuristics on each problem set, I would suggest a plot of average running times (repeat each problem, say, 5 times) where:
The problems are on the x-axis, sorted by difficulty.
The running times for each heuristic function are on the y-axis (perhaps on a logarithmic scale if the difference between the alternatives cannot be easily seen).
And a similar plot with the number of explored states.
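For reference, here are minimal sketches of the first two heuristics from the question, for a board stored as a flat tuple with 0 as the blank; the board encoding and function names are illustrative assumptions, and the linear-conflict term uses the common simplified pairwise-inversion count:

    def manhattan(state, goal, width):
        """Sum of |dr| + |dc| over all tiles (0 = blank, ignored)."""
        pos = {tile: divmod(i, width) for i, tile in enumerate(goal)}
        total = 0
        for i, tile in enumerate(state):
            if tile == 0:
                continue
            r, c = divmod(i, width)
            gr, gc = pos[tile]
            total += abs(r - gr) + abs(c - gc)
        return total

    def linear_conflict(state, goal, width):
        """Manhattan distance plus 2 per pair of tiles that sit in their goal
        row (or column) but in reversed order; each such pair forces 2 extra moves."""
        pos = {tile: divmod(i, width) for i, tile in enumerate(goal)}
        conflicts = 0
        for r in range(width):  # row conflicts
            cols = [pos[t][1] for t in (state[r * width + c] for c in range(width))
                    if t != 0 and pos[t][0] == r]
            conflicts += sum(1 for a in range(len(cols)) for b in range(a + 1, len(cols))
                             if cols[a] > cols[b])
        for c in range(width):  # column conflicts
            rows = [pos[t][0] for t in (state[r * width + c] for r in range(width))
                    if t != 0 and pos[t][1] == c]
            conflicts += sum(1 for a in range(len(rows)) for b in range(a + 1, len(rows))
                             if rows[a] > rows[b])
        return manhattan(state, goal, width) + 2 * conflicts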

Minimizing clustering classification error

Suppose we have some labeled data X with N data points. Using some clustering algorithm, say k-means, we partition X into k clusters C_1, ..., C_k. Let S_1, ..., S_k be the true partitioning sets, and define the clustering classification error as the fraction of points assigned to the wrong set, i.e. err = (1/N) * sum_i |C_i \ S_m(i)| for a matching m of clusters to true sets.
I then want to find the optimal "match" of the clusters to the true clusters by minimizing this error. So for k=3 the optimal permutation might be {(C_1 and S_2), (C_2 and S_3), (C_3 and S_1)}. The obvious way to find the optimal permutation would be to look at all k! permutations and the resulting error of each, and pick the one giving the smallest error. This, however, requires O(k!) time, so my question is: would it be possible to design an algorithm that does this more efficiently?
There are good and well-tested algorithms for finding the best matching, such as the
Hungarian algorithm.
But usually it is not a good idea to map clusters to classes.
A good clustering is one that tells you something new about your data, so it must be substantially different from your known classes.
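As a concrete illustration of the matching step: build the k x k contingency table between predicted clusters and true sets, negate it, and hand it to an assignment solver, which finds the optimal matching in O(k^3) rather than O(k!). This sketch assumes SciPy's linear_sum_assignment (an implementation of the Hungarian-style assignment algorithm):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def best_cluster_matching(pred, true, k):
        """Return the cluster->set matching that maximizes agreement, plus the error."""
        # contingency[i, j] = number of points in cluster C_i and true set S_j
        contingency = np.zeros((k, k), dtype=int)
        for p, t in zip(pred, true):
            contingency[p, t] += 1
        # The solver minimizes cost, so negate the counts to maximize agreement.
        rows, cols = linear_sum_assignment(-contingency)
        matched = contingency[rows, cols].sum()
        return dict(zip(rows.tolist(), cols.tolist())), 1.0 - matched / len(pred)

    # Example: labels in {0, ..., k-1}
    mapping, err = best_cluster_matching([0, 0, 1, 2, 2], [1, 1, 2, 0, 0], k=3)
    print(mapping, err)  # {0: 1, 1: 2, 2: 0} 0.0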

Should we use k-means++ instead of k-means?

The k-means++ algorithm helps with the two following weaknesses of the original k-means algorithm:
The original k-means algorithm has a worst-case running time that is super-polynomial in the input size, while k-means++ is claimed to be O(log k).
The approximation found can be quite unsatisfactory with respect to the objective function compared to the optimal clustering.
But are there any drawbacks of k-means++? Should we always use it instead of k-means from now on?
Nobody claims k-means++ runs in O(lg k) time; its solution quality is O(lg k)-competitive with the optimal solution. Both k-means++ and the common method, called Lloyd's algorithm, are approximations to an NP-hard optimization problem.
I'm not sure what the worst-case running time of k-means++ is; note that in Arthur & Vassilvitskii's original description, steps 2-4 of the algorithm refer to Lloyd's algorithm. They do claim that it works both better and faster in practice because it starts from a better position.
The drawbacks of k-means++ are thus:
It too can find a suboptimal solution (it's still an approximation).
It's not consistently faster than Lloyd's algorithm (see Arthur & Vassilvitskii's tables).
It's more complicated than Lloyd's algorithm.
It's relatively new, while Lloyd's algorithm has proven its worth for over 50 years.
Better algorithms may exist for specific metric spaces.
That said, if your k-means library supports k-means++, then by all means try it out.
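For example, scikit-learn's KMeans uses k-means++ seeding by default, so trying it against plain random seeding is a one-line change:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(1000, 2)
    # init="k-means++" is the default; init="random" gives plain random seeding.
    km_pp = KMeans(n_clusters=20, init="k-means++", n_init=10).fit(X)
    km_rand = KMeans(n_clusters=20, init="random", n_init=10).fit(X)
    print(km_pp.inertia_, km_rand.inertia_)  # compare within-cluster sums of squares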
Not your question, but an easy speedup for any k-means method with large N:
1) first run k-means on a random sample of, say, sqrt(N) of the points,
2) then run full k-means from those centres.
I've found this 5-10 times faster than kmeans++ for N ~ 10000, k ~ 20, with similar results.
How well it works for you will depend on how well a sqrt(N) sample
approximates the whole, as well as on N, dim, k, ninit, delta ...
What are your N (number of data points), dim (number of features), and k?
The huge range in users' N, dim, k, data noise, metrics ...
not to mention the lack of public benchmarks, makes it tough to compare methods.
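A minimal sketch of that two-stage idea, using scikit-learn's KMeans (the sampling scheme itself is the trick described above, not a library feature):

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_sample_then_full(X, k, seed=0):
        """1) k-means on a sqrt(N) random sample, 2) full k-means from those centres."""
        rng = np.random.default_rng(seed)
        sample = X[rng.choice(len(X), size=int(np.sqrt(len(X))), replace=False)]
        centres = KMeans(n_clusters=k, n_init=10).fit(sample).cluster_centers_
        # Warm-started full run: a single init from the sample's centres.
        return KMeans(n_clusters=k, init=centres, n_init=1).fit(X)

    model = kmeans_sample_then_full(np.random.rand(10_000, 8), k=20)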
Added: Python code for kmeans() and kmeanssample() is
here on SO; comments are welcome.

How to test an algorithm for perfect optimization?

Is there any way to test an algorithm for perfect optimization?
There is no easy way to prove that a given algorithm is asymptotically optimal.
Proving optimality, if it happens at all, sometimes follows years or decades after the algorithm has been written. A classic example is the union-find/disjoint-set data structure.
Disjoint-set forests are a data structure where each set is represented by a tree data structure, in which each node holds a reference to its parent node. They were first described by Bernard A. Galler and Michael J. Fischer in 1964, although their precise analysis took years.
[...] These two techniques complement each other; applied together, the amortized time per operation is only O(α(n)), where α(n) is the inverse of the function f(n) = A(n,n), and A is the extremely quickly-growing Ackermann function.
[...] In fact, this is asymptotically optimal: Fredman and Saks showed in 1989 that Ω(α(n)) words must be accessed by any disjoint-set data structure per operation on average.
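For reference, a minimal disjoint-set forest combining those two techniques (union by rank and path compression, here the path-halving variant) might look like this sketch:

    class DisjointSet:
        """Disjoint-set forest with union by rank and path compression.
        Amortized cost per operation is O(alpha(n))."""

        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path halving: point every other visited node at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            # Union by rank: attach the shallower tree under the deeper one.
            if self.rank[ra] < self.rank[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            if self.rank[ra] == self.rank[rb]:
                self.rank[ra] += 1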
For some algorithms optimality can be proven after very careful analysis, but generally speaking, there's no easy way to tell whether an algorithm is optimal once it's written. In fact, it's not always easy to prove that an algorithm is even correct.
See also
Wikipedia/Matrix multiplication
The naive algorithm is O(N^3), Strassen's is roughly O(N^2.807), Coppersmith-Winograd is O(N^2.376), and we still don't know what is optimal.
Wikipedia/Asymptotically optimal
it is an open problem whether many of the most well-known algorithms today are asymptotically optimal or not. For example, there is an O(nα(n)) algorithm for finding minimum spanning trees. Whether this algorithm is asymptotically optimal is unknown, and would be likely to be hailed as a significant result if it were resolved either way.
Practical considerations
Note that sometimes asymptotically "worse" algorithms are better in practice due to many factors (e.g. ease of implementation, actually better performance for the given input parameter range, etc).
A typical example is quicksort with a simple pivot selection that may exhibit quadratic worst-case performance, but is still favored in many scenarios over a more complicated variant and/or other asymptotically optimal sorting algorithms.
For those of us mere mortals who merely want to know whether an algorithm:
reasonably works as expected;
is faster than others;
there is an easy step called 'benchmarking'.
Pick the best contenders in the area and compare them with your algorithm.
If your algorithm wins, then it better matches your needs (the ones defined by
your benchmarks).
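A minimal benchmarking sketch in Python, using timeit; the function under test is a placeholder for your own algorithm:

    import timeit
    import random

    def my_algorithm(xs):      # placeholder for the algorithm being tested
        return sorted(xs)      # stand-in implementation

    data = [random.random() for _ in range(100_000)]

    for name, fn in [("my_algorithm", my_algorithm), ("built-in sorted", sorted)]:
        t = timeit.timeit(lambda: fn(list(data)), number=20)
        print(f"{name:>16}: {t / 20:.4f} s per run")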
