Is kNN a statistical classifier?

I'm currently working on a Machine Learning project for my Artificial Intelligence exam. The goal is to choose two classification algorithms to compare in WEKA, bearing in mind that the two algorithms must be different enough for the comparison to be meaningful. In addition, the algorithms must handle both nominal and numeric data (I suppose this is required for the comparison to be possible).
My professor suggested choosing a statistical classifier and a decision tree classifier, for example, or comparing a bottom-up classifier with a top-down one.
Since I have very little experience in the Machine Learning field, I am doing some research on the various algorithms WEKA offers, and I stumbled upon kNN, the k-nearest neighbors algorithm.
Is it statistical? And could it be compared with a Decision Stump algorithm, for example?
Otherwise, can you suggest a couple of algorithms that match the requirements I have pointed out above?
P.S.: The data to be handled must be both numerical and nominal. In WEKA there are numerical/nominal features and numerical/nominal classes. Do I have to choose algorithms that handle both numerical/nominal features AND classes, or just one of them?
I would really appreciate any help, guys. Thanks for your patience!

Based on your professor's description, I would not consider k-Nearest Neighbors (kNN) a statistical classifier. In most contexts, a statistical classifier is one that generalizes via statistics of the training data (either by using statistics directly or by transforming them). An example of this is the Naïve Bayes Classifier.
By contrast, kNN is an example of Instance-Based Learning. It doesn't use statistics of the training data; rather, it compares new observations directly to the training instances to perform classification.
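To make the instance-based idea concrete, here is a minimal from-scratch sketch of kNN (the toy data is made up for illustration). Note that no statistics of the training data are ever computed: the stored instances themselves are the "model", and prediction is just a distance computation plus a majority vote.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every stored instance
    nearest = np.argsort(dists)[:k]                  # indices of the k closest instances
    return Counter(y_train[nearest]).most_common(1)[0][0]

# tiny illustrative example: two well-separated groups
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([4.8, 5.1])))  # -> "b"
```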
With regard to comparison: yes, you can compare the performance of kNN with a Decision Stump (or any other classifier). Since any two supervised classifiers will yield classification accuracies with respect to your training/testing data, you can compare their performance.
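If you want to sanity-check such a comparison outside WEKA (inside WEKA these would be IBk and DecisionStump), here is a quick scikit-learn sketch; the Iris data and 10-fold cross-validation are just illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classifiers = {
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "decision stump": DecisionTreeClassifier(max_depth=1),  # a depth-1 tree is a stump
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```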

Related

Do all evolutionary algorithms encode the population in binary terms?

I am new to heuristic optimization methods and am learning about the different optimization algorithms available in this space, like Genetic Algorithms (GA), PSO, DE, CMA-ES, etc. The general flow of all of these algorithms seems to be: initialize a population; update it via selection, crossover, and mutation; evaluate; and repeat the cycle. The initial population-creation step in a genetic algorithm encodes each member of the population as a chromosome, which is a bitstring of 0s and 1s, and all the other operations are then performed on these bitstrings. GA has simple update methods for the population, like mutation and crossover, but the update methods differ in the other algorithms.
My question is: do all the other heuristic algorithms also initialize the population as bitstrings of 0s and 1s, or do they use plain (real or natural) numbers?
The representation of individuals in evolutionary algorithms (EAs) depends on the representation of a candidate solution. If you are solving a combinatorial problem, e.g. the knapsack problem, the final solution is a string of 0s and 1s, so it makes sense to use a binary representation in the EA. However, if you are solving a continuous black-box optimisation problem, then it makes sense to use a representation with continuous decision variables.
In the old days, GA and other algorithms used only a binary representation, even for solving continuous problems. But nowadays, all the algorithms you mentioned have binary, continuous, and other variants of their own. For example, PSO is known as a continuous problem solver, but in the binary variant there are mapping strategies, such as S-shaped or V-shaped transfer functions, that update the binary individuals (particles) for the next iteration.
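As a small illustration (a sketch with made-up problem sizes, not a full optimizer), here is how the two representations are typically initialized, plus the S-shaped transfer function binary PSO uses to turn a continuous velocity into bit probabilities:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# binary representation, e.g. a 20-item knapsack: one bit per item, 50 individuals
binary_pop = rng.integers(0, 2, size=(50, 20))

# continuous representation, e.g. black-box optimisation in [-5, 5]^10
real_pop = rng.uniform(-5.0, 5.0, size=(50, 10))

def s_shape(v):
    """S-shaped transfer function: maps a PSO velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

# binary PSO update: each bit is set to 1 with probability s_shape(velocity)
velocity = rng.normal(size=20)
new_particle = (rng.random(20) < s_shape(velocity)).astype(int)
```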
My two cents: the choice of algorithm depends on the type of problem, and I personally would not recommend a binary PSO as a first attempt at solving a problem. There may be hidden benefits, but they need investigation.
Please feel free to extend your question.

Which clustering algorithm is best for clustering one-dimensional features?

Which clustering machine learning algorithm is best for clustering one-dimensional numerical features (scalar values)?
Is it Birch, Spectral clustering, k-means, DBSCAN...or something else?
All of these methods are aimed at multivariate data. Except for k-means, which historically was also used on one-dimensional data, they were all designed with the multivariate problem in mind, and none of them is well optimized for the particular case of 1-dimensional data.
For one-dimensional data, use kernel density estimation. KDE is a nice technique in 1d, has a strong statistical support, and becomes hard to use for clustering in multiple dimensions.
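As a sketch of the KDE route in 1d (the two-Gaussian sample is made up for illustration): estimate the density, then cut the data at the local minima of the density, which act as natural cluster boundaries.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde

rng = np.random.default_rng(seed=0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])

kde = gaussian_kde(x)                        # kernel density estimate of the 1d data
grid = np.linspace(x.min(), x.max(), 1000)
density = kde(grid)

boundaries = grid[argrelextrema(density, np.less)[0]]  # local minima of the density
labels = np.digitize(x, boundaries)          # cluster id = segment between boundaries
print(f"{len(boundaries)} boundary point(s), {labels.max() + 1} clusters")
```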
Take a look at the k-means clustering algorithm. It works quite well for clustering one-dimensional feature vectors. However, k-means doesn't work very well when there are outliers in your training dataset, in which case you can turn to more advanced algorithms.
I'd suggest that before implementing a machine learning algorithm (classification, clustering, etc.) for your dataset and problem statement, you use the Weka Toolkit to check which algorithm best fits your problem. Weka is a collection of a large number of machine learning and data mining algorithms that can easily be applied to a given question. Once you have identified which algorithm works best for your problem, you can modify or write your own implementation of it; by tweaking it, you may achieve even better accuracy. You can download Weka from its website.

Fourier Transform Algorithms

Please do bear with me if you find my query a little stupid. I am currently doing a high school research project on how the Fourier transform can be used in recognizing human speech (similar to how Shazam works). I need two different Fast Fourier Transform algorithms for this project. One of the algorithms I am using will definitely be the Cooley-Tukey FFT algorithm; however, I am unsure which other FFT algorithm to use. What would be a good choice, and is there any pseudocode/source code for that particular algorithm? I have only been able to find algorithms for Cooley-Tukey so far.
Thanks!
If you don't need speed (due to performance constraints), then a plain DFT (a straight matrix multiply) should produce very similar results (differing only by rounding noise) using a very different algorithm.
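For concreteness, here is the straight matrix-multiply DFT next to a library FFT (NumPy is used purely for illustration); the two should agree up to rounding noise:

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT: multiply the signal by the matrix W[k, j] = exp(-2*pi*i*k*j/N)."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ x

signal = np.random.randn(256)
assert np.allclose(dft(signal), np.fft.fft(signal))  # same result, very different algorithm
```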

Best data mining algorithm to detect the profile of students with excellent grades

I have a dataset of student profiles (age, sex, address, etc.) together with a performance grade (1 the worst, 5 the best).
I would like to know what the best data mining algorithm would be to determine the profile of those students with a performance greater than 4.
So far, I have thought of clustering algorithms (k-means, ...), but these are unsupervised, so it is difficult to obtain a cluster that contains only students with the desired performance. Do you have any suggestions? Is there a better algorithm to achieve this objective? Thanks!!
This does not sound like a clustering problem to me.
Instead, you are looking for a decision tree on the target variable "grade > 4".
Decision trees, neural networks, and SVMs can be applied to characterize high-performing students. There are no guarantees of perfect classification; you can assess the quality of the model using accuracy measures.
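A minimal sketch of the decision-tree route (the file name students.csv and the column names are hypothetical, and scikit-learn is just one possible tool): binarize the target to "grade > 4", one-hot encode the nominal attributes, and read the learned rules as the profile.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("students.csv")                     # hypothetical dataset
y = (df["performance"] > 4).astype(int)              # target: excellent student or not
X = pd.get_dummies(df.drop(columns="performance"))   # one-hot encode nominal attributes

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # rules describing the profile
```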

Statistically optimizing a genetic algorithm's selection operator

I am familiar with selection methods for genetic algorithms such as stochastic universal sampling, roulette wheel, tournament, and others. However, I realize that these methods are close to the random sampling used in statistics. I would like to know whether there are implementations closer to statistical clustering, based on some feature of the individuals in the population, without having to first check every individual for that specific feature before doing the sampling. Essentially, I would like to reduce the randomness of the other sampling methods while maintaining enough diversity in each population.
For genetic algorithms generally, look for niching/crowding strategies. They try to preserve a diverse population by, e.g., keeping unique or very diverse solutions and instead replacing solutions in very densely populated regions. This is especially useful in multiobjective optimization, where the "solution" is a population of non-dominated individuals.
If you don't do multiobjective optimization and you do not need to maintain a diverse population over the whole run, then you could also use the Offspring Selection Genetic Algorithm (OSGA). It compares children to their parents and only considers them for the next population if they have surpassed their parents in quality. This has been shown to a) work even with unbiased random parent selection and b) maintain diversity until very late in the search, at which point the population converges to a single solution.
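A heavily simplified sketch of the offspring-selection idea (the real OSGA also uses a success ratio and a comparison factor; the crossover, mutate, and fitness functions here are placeholders you would supply):

```python
import random

def offspring_selection_step(population, fitness, crossover, mutate, max_trials=10_000):
    """Build the next generation from children that beat the better of their two parents."""
    next_pop, trials = [], 0
    while len(next_pop) < len(population) and trials < max_trials:
        trials += 1
        p1, p2 = random.sample(population, 2)        # unbiased random parent selection
        child = mutate(crossover(p1, p2))
        if fitness(child) > max(fitness(p1), fitness(p2)):
            next_pop.append(child)                   # child surpassed both parents
    # simplification: if too few children succeed, top up with the current best
    while len(next_pop) < len(population):
        next_pop.append(max(population, key=fitness))
    return next_pop
```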
You can for example use our software HeuristicLab, try different configurations of genetic algorithms and analyze their behavior. The software is GPL and runs on Windows.
