Literature on many-vs-many classifier - algorithm

In the context of the Multi-Class Classification (MCC) problem,
a common approach is to build the final solution from multiple binary classifiers.
The two composition strategies typically mentioned are one-vs-all and one-vs-one.
To distinguish the two approaches,
it is clearest to look at what each binary classifier attempts to do.
One-vs-all's primitive classifier attempts to separate just one class from the rest,
whereas one-vs-one's primitive attempts to separate one class against a single other class.
One-vs-one is also, quite confusingly, called all-vs-all and all-pairs.
I want to investigate the rather simple idea of building an
MCC classifier by composing binary classifiers in a
binary-decision-tree-like manner.
For an illustrative example:
            has wings?
           /          \
       quack?         nyan?
       /    \         /    \
    duck    bird    cat    dog
As you can see, the "has wings?" node does a 2-vs-2 classification,
so I am calling the approach many-vs-many.
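To make the idea concrete, here is a minimal Python sketch of such a tree of binary classifiers (all names are hypothetical; scikit-learn's LogisticRegression merely stands in for any binary base learner). The structure is hand-wired to the example above, whereas an MLEA would have to search over the structure as well:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class MvMNode:
        """One internal node: a binary classifier separating two groups of classes."""
        def __init__(self, left, right):
            # left/right are either a class label (leaf) or another MvMNode
            self.left, self.right = left, right
            self.clf = LogisticRegression()  # any binary classifier would do

        def side_classes(self, side):
            return side.all_classes() if isinstance(side, MvMNode) else {side}

        def all_classes(self):
            return self.side_classes(self.left) | self.side_classes(self.right)

        def fit(self, X, y):
            # Train this node's 2-group split: 0 = left subtree, 1 = right subtree.
            right_set = self.side_classes(self.right)
            self.clf.fit(X, np.array([label in right_set for label in y], dtype=int))
            for side in (self.left, self.right):  # recurse into the subtrees
                if isinstance(side, MvMNode):
                    mask = np.array([label in side.all_classes() for label in y])
                    side.fit(X[mask], y[mask])
            return self

        def predict_one(self, x):
            side = self.right if self.clf.predict(x.reshape(1, -1))[0] else self.left
            return side.predict_one(x) if isinstance(side, MvMNode) else side

    # Mirrors the wings/quack/nyan example: the root performs a 2-vs-2 split.
    tree = MvMNode(MvMNode("duck", "bird"), MvMNode("cat", "dog"))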
The problem is, I don't know where to start reading.
Is there a good paper you would recommend?
To give a bit more context,
I'm considering using a multilevel evolutionary algorithm (MLEA) to build the tree.
So if there is an even more direct answer, it would be most welcome.
Edit: For more context (and perhaps you might find it useful),
I read this paper, which is one of the GECCO 2011 best paper winners;
it uses an MLEA to compose an MCC classifier in the one-vs-all manner.
This is what inspired me to look for a way to modify it into a decision-tree builder.

What you want looks very much like Decision Trees.
From wiki:
Decision tree learning, used in statistics, data mining and machine learning, uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are classification trees or regression trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

Sailesh's answer is correct in that what you intend to build is a decision tree. There are already many algorithms for learning such trees, as well as ensemble variants such as Random Forests. You could try Weka, for example, and see what is available there.
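For a quick first experiment, scikit-learn is a scriptable alternative to Weka; a minimal sketch, with the iris data standing in for your own multi-class problem:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)  # a standard 3-class toy problem
    for model in (DecisionTreeClassifier(), RandomForestClassifier(n_estimators=100)):
        print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean().round(3))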
If you're more interested in evolutionary algorithms, I want to mention Genetic Programming. You can try, for example, our implementation in HeuristicLab. It can deal with numeric classes and attempts to find a formula (tree) that maps each row to its respective class, using e.g. mean squared error (MSE) as the fitness function.
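HeuristicLab itself is a GUI application, but the same GP-as-regression idea can be sketched in Python with the gplearn library (my substitution, not part of HeuristicLab; treat the exact parameters as assumptions):

    import numpy as np
    from gplearn.genetic import SymbolicRegressor
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)  # class labels 0/1/2 treated as numbers
    gp = SymbolicRegressor(population_size=500, generations=20,
                           metric='mse', random_state=0)  # MSE as the fitness function
    gp.fit(X, y.astype(float))
    print(gp._program)  # the evolved formula (a tree)
    pred = np.clip(np.round(gp.predict(X)), 0, 2).astype(int)  # round back to a class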
There are also instance-based classification methods like nearest neighbor, and kernel-based methods like support vector machines. Instance-based methods support multiple classes natively, but with kernel methods you have to use one of the composition approaches you mentioned.
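For example, scikit-learn can wrap a binary SVM in the one-vs-all (there called one-vs-rest) scheme explicitly; a small sketch:

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    ovr = OneVsRestClassifier(SVC(kernel='rbf')).fit(X, y)
    print(ovr.predict(X[:5]))
    # SVC on its own handles multiple classes via the one-vs-one scheme internally.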

Related

some confusions in machine learning

I have two confusions when using machine learning algorithms. First, I should say that I am simply a user of them.
1. There are two categories, A and B. If I want to pick as many A as possible from their mixture, what kind of algorithm should I use (no need to consider the number of samples)? At first I thought it should be a classification algorithm, and I used, for example, the boosted decision tree (BDT) in the TMVA package, but someone told me that a BDT is really a regression algorithm.
2. I find that when I have raw data, if I analyze it (do some combinations, ...) before feeding it to the BDT, the result is better than feeding the raw data to the BDT directly. Since the raw data contains all the information, why do I need to analyze it myself?
If anything is unclear, please just add a comment. I hope you can give me some advice.
For 2, you have to manipulate the data before feeding it in to get better performance, because such analysis is not built into the algorithm: it only looks at the data and classifies. The problem of "analysis", as you put it, is called feature selection or feature engineering, and it has to be done by hand (unless, of course, you are using some kind of technique that learns features, e.g. deep learning). In machine learning it has been observed many times that manipulated/engineered features perform better than raw features.
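A toy illustration of the point (my own construction, not from the question's data): the label depends on the ratio of two raw columns, and a shallow boosted ensemble does better when handed that ratio as an engineered feature:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X_raw = rng.uniform(1, 10, size=(2000, 2))
    y = (X_raw[:, 0] / X_raw[:, 1] > 1.5).astype(int)            # hidden rule: a ratio
    X_eng = np.column_stack([X_raw, X_raw[:, 0] / X_raw[:, 1]])  # add the ratio feature

    for name, X in [("raw", X_raw), ("engineered", X_eng)]:
        model = GradientBoostingClassifier(max_depth=2, n_estimators=20)
        print(name, cross_val_score(model, X, y, cv=5).mean().round(3))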
For 1, I think a BDT can be used for regression as well as classification. This looks like a classification problem (to choose or not to choose), hence you should use a classification algorithm.
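A sketch of that setup with scikit-learn standing in for TMVA (synthetic data, hypothetical threshold): classify, then rank by the predicted probability of class A and tune the cut to trade purity against yield:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=1000, random_state=0)  # label 1 plays the role of "A"
    bdt = GradientBoostingClassifier().fit(X, y)
    p_A = bdt.predict_proba(X)[:, 1]
    picked = p_A > 0.5  # lower the threshold to pick more A at the cost of purity
    print(picked.sum(), "items picked as A")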
Are you sure ML is the right approach for your problem? In case it is, some classification algorithms would be:
logistic regression, neural networks, support vector machines, decision trees, just to name a few.

Human-interpretable supervised machine learning algorithm

I'm looking for a supervised machine learning algorithm that would produce transparent rules or definitions that can be easily interpreted by a human.
Most algorithms that I work with (SVMs, random forests, PLS-DA) are not very transparent. That is, you can hardly summarize the models in a table in a publication aimed at a non-computer scientist audience. What authors usually do is, for example, publish a list of variables that are important based on some criterion (for example, Gini index or mean decrease of accuracy in the case of RF), and sometimes improve this list by indicating how these variables differ between the classes in question.
What I am looking for is a relatively simple output of the style "if (any of the variables V1-V10 > median or any of the variables V11-V20 < 1st quartile) and any of the variables V21-V30 > 3rd quartile, then class A".
Is there any such thing around?
Just to constrain my question a bit: I am working with highly multidimensional data sets (tens of thousands to hundreds of thousands of often collinear variables). So, for example, regression trees would not be a good idea (I think).
You sound like you are describing decision trees. Why would regression trees not be a good choice? Maybe not optimal, but they work, and those are the most directly interpretable models. Anything that works on continuous values works on ordinal values.
There's a tension between wanting an accurate classifier and wanting a simple, explainable model. You could build a random decision forest model and constrain it in several ways to make it more interpretable (see the sketch after this list):
Small max depth
High minimum information gain
Prune the tree
Only train on "understandable" features
Quantize/round decision thresholds
The model won't be as good, necessarily.
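A sketch of those constraints in scikit-learn (the parameter values are arbitrary examples): a shallow, pruned tree whose printed form is close to the if/then rules the question asks for:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_breast_cancer()
    tree = DecisionTreeClassifier(max_depth=3,                 # small max depth
                                  min_impurity_decrease=0.01,  # high minimum gain
                                  ccp_alpha=0.005)             # cost-complexity pruning
    tree.fit(data.data, data.target)
    print(export_text(tree, feature_names=list(data.feature_names)))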
You can find interesting research on making AI methods understandable in the work of Been Kim at Google Brain.

How can we classify tree data structures?

There are various types of trees that I know of. For example, binary trees can be classified as binary search trees, two trees, etc.
Can anyone give me a complete classification of all the trees in computer science?
Please provide me with reliable references or web links.
It's virtually impossible to answer this question since there are essentially arbitrarily many different ways of using trees. The issue is that a tree is a structure - it's a way of showing how various pieces of data are linked to one another - and what you're asking for is every possible way of interpreting the meaning of that structure. This would be similar, for example, to asking for all uses of calculus in engineering; calculus is a tool with which you can solve an enormous class of problems, but there's no concise way to explain all possible uses of the integral because in each application it is used a different way.
In the case of trees, I've found that there are thousands of research papers describing different tree structures and ways of using trees to solve problems. They arise in string processing, genomics, computational geometry, theory of computation, artificial intelligence, optimization, operating systems, networking, compilers, and a whole host of other areas. In each of these domains they're used to encode specific structures that are domain-specific and difficult to understand without specialized knowledge of the field. No one reference can cover all these areas in any reasonable depth.
In short, you seem to already know the structure of a tree, and this general notion is transferable to any of the above domains. But to try to learn every possible way of using this structure or all its applications would be a Herculean undertaking that no one, not even the legendary Don Knuth, could ever hope to achieve in a lifetime.
Wikipedia has a nice compilation of the various trees at the bottom of the page
Dictionary of Algorithms and Data Structures has more information
What specifics are you looking for?

Algorithms behind Algorithmic Tree or Plant Growing

What are all the algorithms involved in the Farmville game? Specifically, I am interested in drawing trees that bear fruit based on the user's activities.
I am working on a project which has a specific need to draw a tree-type image in SVG. I am not sure how to go about the algorithms to define the tree, such that based on certain business rules the leaves in the tree grow, etc.; I think you get the idea. Farmville is just an example I took to explain.
Any help is greatly appreciated.
The comments above show the case for a simple sprite-based tree. This is what most systems will use. I fail to see how business rules apply; perhaps you also need a factory interface factory ;).
If you are actually interested in programmatically generating natural systems, I suggest looking at L-systems. The Algorithmic Beauty of Plants is also a fantastic reference book (made available as a PDF since it's out of print).
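A minimal L-system sketch in Python (the particular grammar is a common plant-like example, not anything Farmville-specific); your business rules could simply pick the rule set or the number of iterations as the plant "grows":

    # Rewrite every symbol by its production rule, in parallel, each iteration.
    rules = {"X": "F[+X][-X]FX", "F": "FF"}  # a classic bracketed plant grammar

    def lsystem(axiom, rules, iterations):
        s = axiom
        for _ in range(iterations):
            s = "".join(rules.get(ch, ch) for ch in s)
        return s

    print(lsystem("X", rules, 3))
    # Interpret the result as turtle graphics: F = draw forward, +/- = turn,
    # [ ] = push/pop the turtle state; render the path as an SVG polyline.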

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm?

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not?
Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another.
No, they're not quite the same. Statistical classification is intended to separate items into categories as cleanly as possible -- to make a clean decision about whether item X is more like the items in group A or group B, for example.
LSI is intended to show the degree to which items are similar or different and, primarily, to find items that show a degree of similarity to a specified item. While this is similar, it's not quite the same.
LSI/LSA is essentially a technique for dimensionality reduction, and is usually coupled with a nearest-neighbor algorithm to make it into a classification system. Hence, in itself, it's only a way of "indexing" the data in a lower dimension using SVD.
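A small sketch of that pipeline with scikit-learn (toy documents, arbitrary parameters): TruncatedSVD over tf-idf vectors is essentially LSA, and the nearest-neighbor step is what turns it into a classifier:

    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    docs = ["the cat sat", "dogs bark loudly", "cats purr", "a dog barked"]
    labels = ["cat", "dog", "cat", "dog"]
    lsi_knn = make_pipeline(TfidfVectorizer(),
                            TruncatedSVD(n_components=2),  # the "indexing" (SVD) step
                            KNeighborsClassifier(n_neighbors=1))
    lsi_knn.fit(docs, labels)
    print(lsi_knn.predict(["my cat purred"]))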
Have you read about LSI on Wikipedia? It says it uses matrix factorization (SVD), which in turn is sometimes used in classification.
The primary distinction in machine learning is between "supervised" and "unsupervised" modeling.
Usually the words "statistical classification" refer to supervised models, but not always.
With supervised methods the training set contains a "ground-truth" label that you build a model to predict. When you evaluate the model, the goal is to predict the best guess at (or probability distribution of) the true label, which you will not have at time of evaluation. Often there's a performance metric and it's quite clear what the right vs wrong answer is.
Unsupervised classification methods attempt to cluster a large number of data points, which may appear to vary in complicated ways, into a smaller number of "similar" categories. Data in each category ought to be similar in some kind of 'interesting' or 'deep' way. Since there is no "ground truth" you can't evaluate 'right or wrong', only 'more' vs 'less' interesting or useful.
Similarly, at evaluation time you can place new examples into one of the clusters (crisp classification), or give some kind of weighting quantifying how similar or different a new example looks compared to the "archetype" of each cluster.
So in some ways both supervised and unsupervised models can yield a "prediction" (of a class or cluster label), but they are intrinsically different.
Often the goal of an unsupervised model is to provide more intelligent and powerfully compact inputs for a subsequent supervised model.
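A minimal sketch of that hand-off, assuming scikit-learn (KMeans cluster distances standing in for the compact inputs):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    X, y = load_digits(return_X_y=True)  # 64 raw pixel features
    # KMeans.transform() yields each sample's distances to the 10 centroids,
    # so the supervised step sees 10 engineered features instead of 64.
    model = make_pipeline(KMeans(n_clusters=10, n_init=10, random_state=0),
                          LogisticRegression(max_iter=1000))
    print(round(model.fit(X, y).score(X, y), 3))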
