Can a one-vs-all k-NN classifier have different values of k for each class?

I have three classes and I am using k-NN to classify them, following a one-vs-all strategy. My question is: can the three binary classifiers have different values of k?
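To make the question concrete, here is a minimal sketch of what such a classifier could look like in scikit-learn; the per-class k values below are arbitrary placeholders, and the combination rule (most confident positive-class probability wins) is one of several possible choices.

```python
# One-vs-rest k-NN where each binary classifier gets its own k.
# The k values here are arbitrary placeholders, not recommendations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

ks = {0: 3, 1: 5, 2: 7}  # a different k per class (assumed values)
models = {}
for c, k in ks.items():
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, (y == c).astype(int))  # class c vs. everything else
    models[c] = clf

def predict(x):
    # Score each binary model by its positive-class probability,
    # then pick the class whose model is most confident.
    scores = {c: m.predict_proba(x.reshape(1, -1))[0, 1]
              for c, m in models.items()}
    return max(scores, key=scores.get)

print(predict(X[0]))
```

One caveat with this scheme: vote fractions from models with different k are not perfectly calibrated against each other, so the confidence comparison across classes is only approximate.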

Related

Is it possible to perform clustering analysis on an object-features matrix?

I'm currently trying to study different clustering analysis methods; however, in the examples I'm looking at, I can only find clustering analysis on matrices with numerical variables. I was wondering whether I could apply some of the best-known clustering methods, such as k-means clustering or hierarchical clustering, to a matrix containing non-numerical values, for example a matrix where each row is an object and each column holds a categorical attribute.
How would one perform clustering analysis on this kind of matrix?
Thank you
Of course.
Just use an appropriate distance measure for your data and you can use HAC (hierarchical agglomerative clustering).
Or choose another algorithm. There are dozens of clustering algorithms, not just two.
The main problem with such data is that the results are mostly meaningless. By what notion of significance could a result be better than random on this data?
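For example, here is a minimal sketch of HAC on categorical data, assuming the Hamming distance (fraction of mismatching attributes) is an appropriate measure for the data at hand; the toy matrix is made up for illustration.

```python
# HAC on categorical data: encode the categories as integers, use
# Hamming distance as the metric, and run average-linkage clustering.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy object-features matrix with categorical values
data = np.array([
    ["red",  "round",  "small"],
    ["red",  "round",  "large"],
    ["blue", "square", "small"],
    ["blue", "square", "large"],
])

# Map each category to an integer code, column by column
codes = np.empty(data.shape, dtype=int)
for j in range(data.shape[1]):
    _, codes[:, j] = np.unique(data[:, j], return_inverse=True)

dist = pdist(codes, metric="hamming")   # pairwise mismatch fractions
tree = linkage(dist, method="average")  # HAC on precomputed distances
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The distance choice is the whole game here; Hamming treats every attribute mismatch equally, which may or may not be meaningful for your data.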

How can I do multi-class classification using a naive Bayes classifier?

I am developing a disease classification system based on symptoms. I know training data is needed, but I don't have any; I only have the probabilities of symptoms for each disease. Is it possible to develop such a system?
There are two ways of extending simple classifiers to do multi-class classification (source: Wikipedia):
The first is the one-vs.-rest (OvR) strategy. It involves training a single classifier per class, with the samples of that class as positive samples and all other samples as negatives. This strategy requires the base classifiers to produce a real-valued confidence score for their decisions. During inference, you give a sample to each model, retrieve the probability of belonging to the positive class, and choose the class whose classifier is most confident.
The second is the one-vs.-one (OvO) reduction: one trains K(K − 1)/2 binary classifiers for a K-way multiclass problem; each receives the samples of a pair of classes from the original training set and must learn to distinguish these two classes. At prediction time, a voting scheme is applied: all K(K − 1)/2 classifiers are applied to an unseen sample, and the class that receives the highest number of "+1" predictions is predicted by the combined classifier. This approach can lead to ambiguity in some cases.
I would recommend using one-vs.-rest. It is already implemented in some packages, such as scikit-learn:
http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
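A minimal usage sketch of that wrapper (note that scikit-learn's naive Bayes estimators also handle multiclass natively; the wrapper just makes the one-vs-rest reduction explicit):

```python
# One-vs-rest naive Bayes on a 3-class toy problem.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)        # 3-class toy data
clf = OneVsRestClassifier(GaussianNB())  # one binary GaussianNB per class
clf.fit(X, y)
print(clf.predict(X[:5]))                # most-confident class wins
```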

Algorithm for Hasse diagram showing subset inclusion

I am looking for an algorithm to construct the Hasse diagram representing a collection of integer sets, ordered by subset inclusion. I have found many discussions and publications on frequent itemset analysis (Norris, Ganter, Bordat, ...), but they don't seem to give pointers to work on the more fundamental task of just building a lattice representing some given itemsets.
I can think of a naive method where one first compares the sets all against all, then uses one of the known algorithms for transitive reduction.
Can one improve on this and avoid explicit comparison between all sets?
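For reference, here is a short sketch of that naive method in Python: O(n²) subset tests to get the full inclusion relation, followed by a transitive reduction that keeps only the covering edges. The example sets are made up for illustration.

```python
# Naive Hasse diagram construction: pairwise subset tests, then
# transitive reduction to keep only the covering relations.
sets = [frozenset(s) for s in ([1], [2], [1, 2], [1, 3], [1, 2, 3])]
n = len(sets)

# All strict inclusions sets[i] < sets[j]
below = {(i, j) for i in range(n) for j in range(n) if sets[i] < sets[j]}

# Drop (i, k) whenever some j sits strictly between: i < j < k
hasse = {(i, k) for (i, k) in below
         if not any((i, j) in below and (j, k) in below for j in range(n))}

for i, k in sorted(hasse):
    print(sorted(sets[i]), "->", sorted(sets[k]))
```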

Data mining algorithm selection for 3 classes with negative and positive values

I am trying to handle a data set in Matlab with 3 classes and both negative and positive attribute values. I tried a naive Bayes classifier, but Matlab says that naive Bayes can't handle negative values. The SVM algorithm also can't handle this problem, because there are 3 classes. So I am asking you: which algorithm should I choose?
Thank you in advance!
The simplest solution that comes to mind is a k-NN classifier using majority voting. Say you want to classify a point and you use the 10 nearest neighbours. If six out of 10 are class 1, two are class 2, and the remaining two are class 3, then you would classify your point as class 1.
If you want to include nonlinearity (as in the case of SVM), you can use nonlinear kernels in k-NN too, which basically means modifying the distance calculation.
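A minimal sketch of this suggestion with scikit-learn, on toy data with three classes and both negative and positive feature values (the data set is made up for illustration):

```python
# k-NN with majority voting: 3 classes and negative values are no problem.
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X -= X.mean(axis=0)  # centre so features take negative and positive values

clf = KNeighborsClassifier(n_neighbors=10)  # 10 neighbours, as in the example
clf.fit(X, y)
print(clf.predict(X[:5]))
print(clf.predict_proba(X[:1]))  # vote fractions, e.g. 6/10, 2/10, 2/10
```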
Citing Wikipedia:
Multiclass SVM aims to assign labels to instances by using support vector machines, where the labels are drawn from a finite set of several elements.
The dominant approach for doing so is to reduce the single multiclass problem into multiple binary classification problems.[8] Common methods for such reduction include:[8][9]
Building binary classifiers which distinguish between (i) one of the labels and the rest (one-versus-all) or (ii) between every pair of classes (one-versus-one). Classification of new instances for the one-versus-all case is done by a winner-takes-all strategy, in which the classifier with the highest output function assigns the class (it is important that the output functions be calibrated to produce comparable scores). For the one-versus-one approach, classification is done by a max-wins voting strategy, in which every classifier assigns the instance to one of the two classes, then the vote for the assigned class is increased by one vote, and finally the class with the most votes determines the instance classification.
Directed Acyclic Graph SVM (DAGSVM)[10]
error-correcting output codes[11]
Crammer and Singer proposed a multiclass SVM method which casts the multiclass classification problem into a single optimization problem, rather than decomposing it into multiple binary classification problems.[12] See also Lee, Lin and Wahba.[13][14]
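In practice these reductions are already built in: scikit-learn's SVC uses one-vs-one internally and LinearSVC uses one-vs-rest, so a 3-class problem with negative feature values needs no extra work. A minimal sketch on toy data:

```python
# Multiclass SVM via the built-in binary reductions in scikit-learn.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

ovo = SVC(kernel="rbf").fit(X, y)  # one-vs-one under the hood
ovr = LinearSVC().fit(X, y)        # one-vs-rest under the hood
print(ovo.predict(X[:5]), ovr.predict(X[:5]))
```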

Group detection in data sets

Assume a group of data points, such as the one plotted here (this graph isn't specific to my problem, but is just used as a suitable example):
Inspecting the scatter graph visually, it's fairly obvious the data points form two 'groups', with some random points that do not obviously belong to either.
I'm looking for an algorithm that would allow me to:
start with a data set of two or more dimensions;
detect such groups from the data set without prior knowledge of how many (or whether any) might be there;
once the groups have been detected, 'ask' the model whether a new sample point seems to fit any of the groups.
There are many choices, but if you are interested in the probability that a new data point belongs to a particular mixture, I would use a probabilistic approach such as Gaussian mixture modeling, estimated either by maximum likelihood or by Bayesian inference.
Maximum likelihood estimation of mixture models is implemented in Matlab.
Your requirement that the number of components is unknown makes your model more complex. The dominant probabilistic approach is to place a Dirichlet process prior on the mixture distribution and estimate by some Bayesian method. For instance, see this paper on infinite Gaussian mixture models. The DP mixture model will give you inference over the number of components and the component each element belongs to, which is exactly what you want. Alternatively, you could perform model selection on the number of components, but this is generally less elegant.
There are many implementations of DP mixture models, but they may not be as convenient. For instance, here's a Matlab implementation.
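If Python is an option, the same ideas are available in scikit-learn; here is a minimal sketch (toy data made up for illustration):

```python
# Gaussian mixture (EM) plus a truncated Dirichlet-process variant.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# Maximum-likelihood mixture; predict_proba answers the "does this new
# point fit one of the groups?" question directly.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(np.array([[0.0, 0.0]])))

# Unknown number of components: a DP prior prunes unneeded components.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)
print(dpgmm.weights_.round(2))  # weights near zero = effectively unused
```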
Your graph suggests you are an R user. In that case, if you are looking for prepacked solutions, the answer to your question lies on this Task View for cluster analysis.
I think you are looking for something along the lines of a k-means clustering algorithm.
You should be able to find adequate implementations in most general purpose languages.
You need one of the clustering algorithms. All of them can be divided into two groups:
those where you specify the number of groups (clusters) - 2 clusters in your example
those where the algorithm tries to guess the correct number of clusters by itself
If you want an algorithm of the first type, then k-means is what you need.
If you want an algorithm of the second type, then you probably need one of the hierarchical clustering algorithms. I haven't implemented any of them myself, but there is an easy way to extend k-means so that specifying the number of clusters becomes unnecessary (one common approach is sketched below).
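For what it's worth, one common way to avoid fixing the number of clusters in advance (a sketch of one possible approach, not necessarily the improvement hinted at above) is to run k-means for several values of k and keep the one with the best silhouette score:

```python
# Choose k for k-means by maximising the silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

best_k, best_score = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # higher = better-separated clusters
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))
```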
