Machine learning classifying algorithm with "unknown" class - algorithm

I understand that if I train an ML classification algorithm on sample pictures of apples, pears, and bananas, it will be able to classify new pictures into one of those three categories. But if I provide a picture of a car, it will also be classified into one of those three classes because it has nowhere else to go.
But is there an ML classification algorithm that would be able to tell that an item/picture does not really belong to any of the classes it was trained on? I know I could create an "unknown" class and train it on all sorts of pictures that are neither apples, pears, nor bananas, but I assume the training set would need to be huge. That does not sound very practical.

One way to do this can be found in this paper - https://arxiv.org/pdf/1511.06233.pdf
The paper also compares the results of simply thresholding the final scores against the (OpenMax) technique proposed by the authors.

You should look at one-class classification. This is the problem of learning membership in a class, as opposed to distinguishing between two classes. It is interesting when there are too few examples of a second class ("not-in-class", let's say), or when the "not-in-class" class is not well defined.
Where this popped up for me once was classifying Wikipedia articles as flawed in some way: since it was not clear that an article not flagged as flawed was really unflawed, one approach was one-class classification. I have to add, though, that for my problem this did not perform well, so you should compare its performance with other solutions.
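As a concrete illustration, here is a minimal one-class sketch using scikit-learn's OneClassSVM. The data and parameters are made up for this example; for real features you would tune `nu` and the kernel:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Train only on "in-class" samples (think: feature vectors of apples).
in_class = rng.normal(loc=5.0, scale=0.5, size=(200, 2))

# nu is roughly the fraction of training points allowed to fall outside the class.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(in_class)

# predict() returns +1 for points judged in-class and -1 for outliers.
print(model.predict(np.array([[5.0, 5.0]])))    # near the training cloud
print(model.predict(np.array([[20.0, -3.0]])))  # far from anything seen
```

Training one such model per known class (apple, pear, banana) gives you an "unknown" verdict for free: anything every model rejects belongs to none of them.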

EDIT 02/2019:
I agree with the comments below that the following answer in its original form is not correct. You will absolutely need negative samples to provide some balance in your training dataset; otherwise your model may not learn useful discriminators between positive and negative samples.
That being said, you do not need to train on every possible negative class, only those which may be present when you are performing inference. This is getting more into how you set the problem up and how you plan to use your trained model.
ORIGINAL ANSWER:
Most classification algorithms will output a classification along with a score/certainty measure which indicates how confident the algorithm is that the returned label is correct (based on some internal figuring; this is not an external accuracy evaluation).
If the score is below a certain threshold, you can have it output unknown rather than one of the known classes. There is no need to train with negative examples.
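The thresholding idea above can be sketched as follows; the classifier, the synthetic data, and the 0.9 threshold are all placeholder choices you would tune for your own problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy 3-class training data standing in for apple/pear/banana feature vectors.
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)

def classify_with_unknown(x, threshold=0.9):
    """Return the winning class, or 'unknown' if its probability is below threshold."""
    probs = clf.predict_proba([x])[0]
    best = probs.argmax()
    return int(best) if probs[best] >= threshold else "unknown"

print(classify_with_unknown([0.0, 0.0]))  # deep inside class 0's cluster
print(classify_with_unknown([1.5, 1.5]))  # halfway between two clusters
```

Note that softmax-style confidences can still be overconfident on inputs far from the training data, which is exactly the failure mode the OpenMax paper linked in another answer addresses.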

It certainly helps to have a class of random pictures (without objects of the other classes you want to detect) labeled as an UNKNOWN class. This will prevent lots of false positives, and it is also best practice. Read here to see it used with AutoML: https://cloud.google.com/vision/automl/docs/prepare

Related

Negative Training Image Examples for CNN

I am using the Caffe framework for CNN training. My aim is to perform simple object recognition for a few basic object categories. Since pretrained networks are not an option for my intended usage, I prepared my own training and test set with about 1000 images for each of two classes (say chairs and cars).
The results are quite good. If I present a yet-unseen image of a chair, it is likely classified as such, and the same goes for a car image. My problem is that the results on miscellaneous images that do not show either of these classes often show a very high confidence (=1) for one random class (which is not surprising given the one-sided training data, but a problem for my application). I thought about different solutions:
1) Adding a third class with about 1000 negative examples showing any objects except a chair or a car.
2) Adding more object categories in general, just to let the network classify other objects as such and no longer as a chair or car (of course this would require much effort). Maybe the broader prediction results would also show a more uniform distribution on negative images, allowing the presence of the target objects to be evaluated based on a threshold?
Because it was not very time-consuming to grab random images from the internet as negative examples, I have already tested my first solution with about 1200 negative examples. It helped, but the problem remains; perhaps there were just too few? My concern is that if I increase the number of negative examples, the imbalance in the number of examples per class will lead to less accurate detection of the original classes.
After some research I found one person with a similar problem, but there was no solution:
Convolutional Neural Networks with Caffe and NEGATIVE IMAGES
My question is: Has anyone had the same problem and knows how to deal with it? What way would you recommend, adding more negative examples or more object categories or do you have any other recommendation?
The problem is not unique to Caffe or ConvNets. Any machine learning technique runs this risk. In the end, all classifiers take a vector in some input space (usually very high-dimensional), which means they partition that input space. You've given examples of two partitions, which helps to estimate the boundary between the two, but only that boundary. Both partitions have very, very large boundaries, precisely because the input space is so high-dimensional.
ConvNets do try to tackle the high dimensionality of image data by having fairly small convolution kernels. Realistic negative data helps in training those, and the label wouldn't really matter. You could even use the input image as the target (i.e. train it as an autoencoder) when training the convolution kernels.
One general reason why you don't want to lump all counterexamples together is that they may be too varied. If you have a class A with some feature value in the range [-1,+1] on some scale, with counterexamples B in [-2,-1] and C in [+1,+2], lumping B and C together creates a range [-2,+2] for counterexamples which overlaps the real range. Given enough data and powerful enough classifiers this is not fatal, but an SVM, for instance, can fail badly on this.
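The [-2,+2] example can be demonstrated numerically. Below, class A occupies the middle of a 1-D feature axis and the lumped counterexamples B and C sit on either side, so no single linear threshold can isolate A, while a kernel that can carve out an interval handles the same labels. The data is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
a = rng.uniform(-0.8, 0.8, 200)   # class A: the middle of the scale
b = rng.uniform(-2.0, -1.2, 100)  # counterexamples B, below A
c = rng.uniform(1.2, 2.0, 100)    # counterexamples C, above A

X = np.concatenate([a, b, c]).reshape(-1, 1)
y = np.concatenate([np.ones(200), np.zeros(200)])  # B and C lumped as "not A"

# A linear SVM in 1-D is a single threshold, which cannot put the middle
# interval on one side, so the lumped problem defeats it...
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# ...while an RBF kernel can carve out the interval around A.
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear SVM accuracy: {linear_acc:.2f}, RBF SVM accuracy: {rbf_acc:.2f}")
```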

algorithm to combine data for linear fit?

I'm not sure if this is the best place to ask this, but you guys have been helpful with plenty of my CS homework in the past so I figure I'll give it a shot.
I'm looking for an algorithm to blindly combine several dependent variables into an index that produces the best linear fit with an external variable. Basically, it would combine the dependent variables using different mathematical operators, including or excluding each one, etc., until an index is developed that best correlates with my external variable.
Has anyone seen/heard of something like this before? Even if you could point me in the right direction or to the right place to ask, I would appreciate it. Thanks.
Sounds like you're trying to do multivariate linear regression, or multiple regression. The simplest (read: least accurate) method is to individually compute the linear regression line for each of the component variables and then take a weighted average of the lines. Beyond that, I'm afraid I will be of little help.
This appears to be linear regression with multiple explanatory variables. Since the implication here is that you are taking a computational approach, you could do something as simple as fitting a linear model to your data for every possible combination of the explanatory variables you have (whether you want to include interaction effects is your choice), choosing a goodness-of-fit measure (R^2 being just one example), and using that to rank each model you fit. The quality of a model is also somewhat subjective in many fields: you might reject a model containing 15 variables if it only moderately improves the fit over a far simpler model containing just 3 variables. If you have not read it already, I don't doubt you will find many useful suggestions in the following text:
Draper, N.R. and Smith, H. (1998). Applied Regression Analysis. Wiley Series in Probability and Statistics.
You might also try googling the LASSO method of model selection.
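The brute-force all-subsets approach described above can be sketched with NumPy. The data here is synthetic, with only the first and third predictors actually driving the response:

```python
from itertools import combinations

import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares (with intercept) and return R^2."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 4))   # four candidate predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Fit a model for every non-empty subset of predictors and rank by R^2.
results = []
for k in range(1, X.shape[1] + 1):
    for cols in combinations(range(X.shape[1]), k):
        results.append((r_squared(X[:, cols], y), cols))
results.sort(reverse=True)

print("best subsets:", [(round(r2, 3), cols) for r2, cols in results[:3]])
```

Note that R^2 never decreases when you add variables, so the top-ranked subset tends to be the full model; an adjusted measure (adjusted R^2, AIC) or the LASSO mentioned above is how you trade fit against simplicity.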
The thing you're asking for is essentially the entirety of regression analysis.
This is what linear regression does, and it is a good portion of what "machine learning" does (machine learning is basically just a name for more complicated regression and classification algorithms). There are hundreds or thousands of different approaches with various tradeoffs, but the basic ones frequently work quite well.
If you want to learn more, the Coursera course on machine learning is a great place to get a deeper understanding of this.

Does fuzzy logic really improve simple machine learning algorithms?

I'm reading about fuzzy logic and I just don't see how it would possibly improve machine learning algorithms in most instances (which it seems to be applied to relatively often).
Take, for example, k-nearest neighbors. If you have a bunch of attributes like color: [red, blue, green, orange], temperature: [real number], shape: [round, square, triangle], you can't really fuzzify any of these except the real-numbered attribute (please correct me if I'm wrong), and I don't see how this can improve anything more than bucketing things together.
How can fuzzy logic be used to improve machine learning? The toy examples you'll find on most websites don't seem to be all that applicable most of the time.
Fuzzy logic is advisable when the variables have a natural shape interpretation. For example, [very few, few, many, very many] have a nice overlapping trapezoidal interpretation of values.
Variables like color might not. Fuzzy variables denote degree of membership; that's when they become useful.
Regarding machine learning, it depends on what stage of the algorithm you want to apply fuzzy logic to. In my opinion, it would be better applied after the clusters are found (using traditional learning techniques) to determine the degree of membership of a certain point in the search space in each cluster. That doesn't improve learning per se, but rather classification after learning.
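For reference, a trapezoidal membership function like the one mentioned for [very few, few, many, very many] is only a few lines; the set boundaries below are invented for illustration:

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership in a trapezoidal fuzzy set:
    0 below a, rising on [a, b], 1 on [b, c], falling on [c, d], 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Overlapping sets for "few" and "many" on a 0-100 count scale (ranges invented);
# the outer shoulders extend past the scale ends so the extremes get full membership.
few  = lambda x: trapezoid(x, -1, 0, 20, 40)
many = lambda x: trapezoid(x, 30, 50, 100, 101)

print(few(10), many(10))  # clearly "few"
print(few(35), many(35))  # partly "few" and partly "many" at the same time
```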
[round, square, triangle] are mostly ideal categories, which exist primarily in geometry (i.e. in theory). In the real world, some shapes might be almost square or more or less round (circular). There are many shades of red, and some colors are closer to each other than others (ask a woman to explain turquoise, for example). Hence, while abstract categories and specific values are useful as references, real-world objects and values are not necessarily equal to them.
Fuzzy membership allows you to measure how far specific objects are from an ideal. Using this measure lets one avoid a flat "no, it's not circular" (which might lead to information loss) and instead use the degree to which the given object is (or is not) circular.
In my view, fuzzy logic is not a practically viable approach to anything unless you are building a purpose-built fuzzified controller or some rule-based structure, e.g. for compliance/policies. Fuzzy logic does deal with everything between and including 0 and 1, but I find it flawed when you approach more complicated problems where you need to apply fuzzy-logic aspects in three-dimensional spaces; you can still approach multivariate problems without resorting to fuzzy logic. Unfortunately, having studied fuzzy logic, I found myself disagreeing with the principles of fuzzy sets in large-dimensional spaces: they seem infeasible, impractical, and not very logically sound. The natural-language base you would apply in your fuzzy-set solution is also very ad hoc: what exactly is [very, few, many]? It is all whatever you define in your application.
In a lot of machine learning work you will find that you don't even have to go so far as to build natural-language underpinnings into your model. In fact, you can often achieve even better results without applying fuzzy logic to any aspect of your model.
Just to irritate you a bit by forcibly adding fuzziness to this: if instead of the "shape" attribute you had a "number of sides" attribute, it could be further divided into "less", "medium", "many", and "uncountable". The square could then have been part of both "less" and "medium", given appropriate membership functions. In place of the "color" attribute, if you had a "red" attribute, a membership function could be built using the RGB code. As my experience in data mining says: every method can be applied to every dataset, and what works, works.
Couldn't one just convert discrete sets into continuous ones and get the same effect as fuzziness, while being able to use all the techniques of probability theory?
For instance, size ['small', 'medium', 'big'] ==> [0, 1]
It's not clear to me what you're trying to accomplish in the example you give (shapes, colors, etc.). Fuzzy logic has been used successfully with machine learning, but personally I think it is probably more often useful in constructing policies. Rather than go on about it, I refer you to an article I published in the Mar/Apr-2002 issue of "PC AI" magazine, which hopefully makes the idea clear:
Putting Fuzzy Logic to Work: An Introduction to Fuzzy Rules

What are techniques and practices on measuring data quality?

If I have a large set of data that describes physical 'things', how could I go about measuring how well that data fits the 'things' that it is supposed to represent?
An example would be: if I have a crate holding 12 widgets and I know each widget weighs 1 lb, there should be some data quality 'check' making sure the crate weighs about 13 lbs, say (12 lbs of widgets plus the crate itself).
Another example would be that if I have a lamp and an image representing that lamp, it should look like a lamp. Perhaps the image dimensions should have the same ratio of the lamp dimensions.
With the exception of images, my data is 99% text (which includes height, width, color...).
I've studied AI in school, but have done very little outside of that.
Are standard AI techniques the way to go? If so, how do I map a problem to an algorithm?
Are some languages easier at this than others? Do they have better libraries?
thanks.
Your question is somewhat open-ended, but it sounds like what you want is what is known as a "classifier" in the field of machine learning.
In general, a classifier takes a piece of input and "classifies" it, i.e. determines a category for the object. Many classifiers provide a probability with this determination, and some may even return multiple categories with a probability for each.
Some examples of classifiers are Bayes nets, neural nets, decision lists, and decision trees. Bayes nets are often used for spam classification: emails are classified as either "spam" or "not spam" with a probability.
For your question, you'd want to classify your objects as "high quality" or "not high quality".
The first thing you'll need is a bunch of training data. That is, a set of objects where you already know the correct classification. One way to obtain this could be to get a bunch of objects and classify them by hand. If there are too many objects for one person to classify you could feed them to Mechanical Turk.
Once you have your training data you'd then build your classifier. You'll need to figure out what attributes are important to your classification. You'll probably need to do some experimentation to see what works well. You then have your classifier learn from your training data.
One approach that's often used for testing is to split your training data into two sets. Train your classifier using one of the subsets, and then see how well it classifies the other (usually smaller) subset.
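That train/evaluate split can be sketched with scikit-learn; the features and labels below are synthetic stand-ins for your hand-classified objects:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Synthetic stand-in: two features; an object is "high quality" (1)
# when their sum is positive.
X = rng.normal(size=(300, 2))
y = (X.sum(axis=1) > 0).astype(int)

# Hold out a (usually smaller) subset purely for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```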
AI is one path, natural intelligence is another.
Your challenge is a perfect match to Amazon's Mechanical Turk. Divvy your data space up into extremely small verifiable atoms and assign them as HITs on Mechanical Turk. Have some overlap to give yourself a sense of HIT answer consistency.
There was a shop with a boatload of component CAD drawings that needed to be grouped by similarity. They broke it up and set it loose on Mechanical Turk to very satisfying results. I could google for hours and not find that link again.
See here for a related forum post.
This is a tough one. For example, what defines a lamp? I could Google Images a picture of some crazy-looking lamps, or even look up the definition of a lamp (http://dictionary.reference.com/dic?q=lamp). There are no physical requirements for what a lamp must look like. That's the crux of the AI problem.
As for the data, you could set up unit testing on the project to ensure that 12 widget()s weigh less than 13 lbs in the widgetBox(). Regardless, you need to have the data at hand to be able to test things like that.
I hope I was able to answer your question somewhat. It's a bit vague, and my answers are broad, but hopefully it'll at least send you in a good direction.

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm?

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not?
Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another.
No, they're not quite the same. Statistical classification is intended to separate items into categories as cleanly as possible -- to make a clean decision about whether item X is more like the items in group A or group B, for example.
LSI is intended to show the degree to which items are similar or different and, primarily, to find items that show a degree of similarity to a specified item. While related, it's not quite the same.
LSI/LSA is ultimately a technique for dimensionality reduction, and it is usually coupled with a nearest-neighbor algorithm to make it into a classification system. In itself, it is only a way of "indexing" the data in a lower dimension using SVD.
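A minimal sketch of that pipeline, truncated SVD for the indexing plus nearest neighbor to turn it into a classifier, using a made-up term-document matrix:

```python
import numpy as np

# Tiny term-document matrix (rows = terms, columns = documents); counts invented.
#                d0   d1   d2   d3
A = np.array([[2.0, 1.0, 0.0, 0.0],   # "apple"
              [1.0, 2.0, 0.0, 0.0],   # "fruit"
              [0.0, 0.0, 2.0, 1.0],   # "car"
              [0.0, 0.0, 1.0, 2.0]])  # "engine"

# LSI: a truncated SVD projects documents into a k-dimensional concept space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T   # one row per document, in concept space

def nearest_doc(term_vec):
    """Fold a query into concept space and return the nearest document's index;
    this 1-NN step is what turns the indexing into a classifier."""
    q = U[:, :k].T @ term_vec
    return int(np.linalg.norm(docs - q, axis=1).argmin())

# A query mentioning only "apple" should land nearest a fruit document (d0 or d1).
print(nearest_doc(np.array([1.0, 0.0, 0.0, 0.0])))
```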
Have you read about LSI on Wikipedia? It says it uses matrix factorization (SVD), which in turn is sometimes used in classification.
The primary distinction in machine learning is between "supervised" and "unsupervised" modeling.
Usually the words "statistical classification" refer to supervised models, but not always.
With supervised methods the training set contains a "ground-truth" label that you build a model to predict. When you evaluate the model, the goal is to predict the best guess at (or probability distribution of) the true label, which you will not have at time of evaluation. Often there's a performance metric and it's quite clear what the right vs wrong answer is.
Unsupervised classification methods attempt to cluster a large number of data points which may appear to vary in complicated ways into a smaller number of "similar" categories. Data in each category ought to be similar in some kind of 'interesting' or 'deep' way. Since there is no "ground truth" you can't evaluate 'right or wrong', but 'more' vs 'less' interesting or useful.
Similarly, at evaluation time you can place new examples into (potentially) one of the clusters (crisp classification), or give some kind of weighting quantifying how similar or different an example looks from the "archetype" of each cluster.
So in some ways both supervised and unsupervised models can yield a "prediction" (a class or cluster label), but they are intrinsically different.
Often the goal of an unsupervised model is to provide more intelligent and powerfully compact inputs for a subsequent supervised model.
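That last pattern, an unsupervised step producing compact inputs for a subsequent supervised model, can be sketched like this (synthetic blobs, and the particular choice of k-means distances as features is just one illustrative option):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Synthetic data: three blobs; the (hidden) label is the blob a point came from.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(scale=0.4, size=(300, 2))

# Unsupervised step: cluster without looking at the labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Supervised step: distance-to-each-centroid becomes a compact 3-feature input.
features = km.transform(X)
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(f"accuracy on the cluster-distance features: {clf.score(features, labels):.2f}")
```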
