Can anyone help me understand the evaluation summary produced when testing against a trained model? What are the label confusion matrix and the root label confusion matrix?
See http://en.wikipedia.org/wiki/Confusion_matrix. We track (mis-)classifications for each label / root label and use this table structure to report the results.
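If it helps to see one concretely, here is a minimal sketch (using scikit-learn; the labels and predictions are made up for illustration) of how such a matrix is built and read:

```python
# Minimal sketch: building and reading a confusion matrix with scikit-learn.
# The label names and predictions below are invented for illustration.
from sklearn.metrics import confusion_matrix

labels = ["negative", "neutral", "positive"]
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral",  "neutral", "negative", "negative"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Rows are true labels, columns are predicted labels:
# cm[i][j] counts items whose true label is labels[i] but were predicted as labels[j].
print(cm)
```

The root label matrix is the same structure, just restricted to the label assigned at the root of each tree (i.e. the whole-sentence prediction).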
I have some questions from trying to read and understand the Transformer paper "Attention Is All You Need":
Which parameters exactly does the Transformer model learn during training, given that the attention weight matrix is computed on the fly as softmax(QKᵀ/√d_k)? The only trained parameters I know of are the linear transformations applied to the input before it enters Multi-Head Attention and the weights inside the FFN. Are there any other parameters? I would appreciate a clear and unambiguous summary.
What is the role of the FFN in this model? How does it process the data, and why do we need it? I would appreciate a simple and direct explanation.
Please forgive my grammar mistakes since English is not my native language. Thank you so much.
The parameters are the weights of the linear layers; refer to this question.
Take a look at this answer.
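To make the list concrete, here is a minimal sketch of one Transformer block in PyTorch (the names and sizes are my own, not taken from the paper's reference code). Everything registered as a parameter below is learned during training; the attention matrix softmax(QKᵀ/√d_k) is recomputed from these projections on every forward pass, so it is an activation, not a parameter:

```python
# Minimal sketch of the learnable parameters in one Transformer block (PyTorch).
# Names and sizes are illustrative, not taken from the paper's reference code.
import torch.nn as nn

d_model, d_ff = 512, 2048

class TransformerBlockSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Learned: the Q/K/V projections and the output projection of multi-head attention.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        # Learned: the two FFN layers.
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        # Learned: the gain and bias of the layer norms.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # NOT learned: the attention weights softmax(QK^T / sqrt(d_k)) are recomputed
        # from w_q / w_k / w_v on every forward pass.

block = TransformerBlockSketch()
for name, p in block.named_parameters():
    print(name, tuple(p.shape))
```

Outside the blocks themselves, the token embedding matrices and the final output projection before the softmax are also learned; the sinusoidal positional encodings used in the paper are not.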
I am new to deep learning and I hope you guys can help me.
The following site uses CNN features for multi-class classification:
https://www.mathworks.com/help/deeplearning/examples/feature-extraction-using-alexnet.html
This example extracts features from a fully connected layer, and the extracted features are fed to an ECOC classifier.
In this example, the whole dataset has 15 samples in each category, and the training dataset has 11 samples in each category.
My questions are related to the dataset size: if I want to use CNN features for ECOC classification as in the example above, is it required that each category have the same number of samples?
If so, could you explain why?
If not, could you point me to reference papers that have used different numbers?
Thank you.
You may want to have a balanced dataset to prevent your model from learning a skewed class distribution. If a category represents 95% of your dataset, a model that classifies everything as that category will already have an accuracy of 95%.
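As a quick illustration of that 95% trap, here is a small sketch with synthetic data, using scikit-learn's DummyClassifier to stand in for the "predict the majority class" model:

```python
# On a 95/5 split, a "classifier" that always predicts the majority class
# already scores 95% accuracy. Data is synthetic, purely for illustration.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.array([0] * 950 + [1] * 50)       # 95% class 0, 5% class 1

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)
print(accuracy_score(y, pred))           # ~0.95, despite learning nothing
print(balanced_accuracy_score(y, pred))  # ~0.50, which exposes the problem
```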
I have a list of labelled texts. Some have one label, others have two, and some even have three. Do you treat this as a multi-class classification problem?
The type of classification problem to solve depends on what your goal is. I don't know exactly what problem you are trying to solve, but from the form of the data I presume you are talking about a multi-label classification problem.
In any case, let's make some clarifications:
Multi-class classification:
you can have many classes (dog, cat, bear, ...), but each sample can be assigned to only one class; a dog cannot be a cat.
Multi-label classification:
the goal of this approach is assigning a set of labels to each sample; in a text classification scenario, for example, the sentence "Today the weather is sunny" might be assigned the set of labels ["weather", "good"].
So, if you need to assign each sample to exactly one class, based on some criterion that may for example be derived from the labels, you should use a multi-class algorithm;
but if your goal is predicting the set of labels most appropriate for each sample (text tagging, for example), then we are talking about a multi-label classification problem, as in the sketch below.
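Here is a minimal multi-label sketch with scikit-learn; the texts and labels are invented. Each text is mapped to a binary indicator vector over all labels, and a one-vs-rest classifier is trained for each label:

```python
# Minimal multi-label text classification sketch (scikit-learn); data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "Today the weather is sunny",
    "Heavy rain expected tomorrow",
    "Great weather for a picnic",
    "Terrible storm ruined the day",
]
labels = [["weather", "good"], ["weather"], ["weather", "good"], ["weather", "bad"]]

mlb = MultiLabelBinarizer()                   # each sample -> 0/1 vector over all labels
Y = mlb.fit_transform(labels)
X = TfidfVectorizer().fit_transform(texts)

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)  # one binary classifier per label
pred = clf.predict(X)
print(mlb.inverse_transform(pred))            # predicted label sets, possibly several per text
```

In the multi-class case you would instead keep a single label per sample and fit one classifier over mutually exclusive classes.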
In Stanford sentiment analysis, is there an option for marking specific/custom words as positive (based on our requirements)?
Analysing tweets gives a negative trend because of the business terms used. Can we neutralize the negative output caused by these words by adding our own custom dictionary?
The cleanest way to do this would be to retrain the sentiment model. Acquire the sentiment training data and manually modify the labels for the words you are concerned about. There are very basic instructions for training on another Stanford Sentiment page. Then use this trained model as you wish!
A very dirty but possibly faster solution would be to modify the trees you get from the standard model after the fact. For example, you'd search an analyzed tree for words of interest and manually modify their sentiment label. Then apply some heuristic in order to propagate this modification up the tree and possibly alter the sentiment of the whole sentence.
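As a rough illustration of that post-hoc approach (this is not the CoreNLP API itself), here is a sketch that assumes you have exported the sentiment-annotated tree as a bracketed string with integer class labels 0-4 at each node, and simply relabels the leaves for words in a custom dictionary; propagating the change toward the root is the heuristic step mentioned above and is not shown:

```python
# Rough sketch only: relabel the sentiment class of chosen words in a bracketed
# sentiment tree such as "(1 (2 quarterly) (1 (1 loss) (2 report)))".
# It assumes leaves look like "(<class> <word>)" with classes 0 (very negative)
# to 4 (very positive); this is an assumption about the exported tree text,
# not an official API.
import re

custom_sentiment = {"loss": 2, "debt": 2}   # hypothetical business terms to neutralize

def relabel_leaves(tree_str, overrides):
    def repl(match):
        cls, word = match.group(1), match.group(2)
        new_cls = overrides.get(word.lower(), cls)
        return f"({new_cls} {word})"
    # Leaves are the innermost "(class word)" pairs.
    return re.sub(r"\((\d)\s+([^()\s]+)\)", repl, tree_str)

print(relabel_leaves("(1 (2 quarterly) (1 (1 loss) (2 report)))", custom_sentiment))
# A further heuristic (e.g. re-scoring parents from their children) would then be
# needed to propagate the change up the tree, as described above.
```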
Hey, here is my problem:
Given a set of documents I need to assign each document to a predefined category.
I was going to use an n-gram approach to represent the text content of each document and then train an SVM classifier on the training data that I have.
Please correct me if I misunderstood something.
The problem now is that the categories should be dynamic, meaning my classifier should handle new training data with a new category.
So, for example, if I trained a classifier to classify a given document as category A, category B, or category C, and then I was given new training data with category D, I should be able to incrementally train my classifier by providing it with the new training data for category D.
To summarize, I do NOT want to combine the old training data (with 3 categories) and the new training data (with the new/unseen category) and retrain my classifier from scratch. I want to train my classifier on the fly.
Is this possible to implement with an SVM? If not, could you recommend some classification algorithms, or any book/paper that can help me?
Thanks in advance.
Naive Bayes is a relatively fast incremental classification algorithm.
KNN is also incremental by nature, and even simpler to implement and understand.
Both algorithms are implemented in the open-source project Weka, as NaiveBayes and as IBk for KNN.
However, from personal experience, they are both vulnerable to a large number of non-informative features (which is usually the case with text classification), so some kind of feature selection is usually used to squeeze better performance out of these algorithms, and that can be problematic to implement incrementally.
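As a sketch of what "incremental" means for the scenario in the question (new documents, and even a brand-new category D, arriving later), here is a toy multinomial-style Naive Bayes written from scratch. It is illustrative only; the point is that the per-class token counts are the whole model, so updating with new data or a new class is just adding to those counts:

```python
# Toy incremental Naive Bayes: the model is just per-class word counts, so it can
# absorb new documents, and entirely new classes, without retraining from scratch.
# Illustrative sketch only (add-one smoothing, no feature selection).
import math
from collections import Counter, defaultdict

class IncrementalNB:
    def __init__(self):
        self.class_docs = Counter()               # documents seen per class
        self.word_counts = defaultdict(Counter)   # per-class word counts
        self.vocab = set()

    def update(self, tokens, label):              # one (document, label) at a time
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)                     # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokens:                                          # smoothed log likelihoods
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

nb = IncrementalNB()
nb.update("stocks market earnings".split(), "finance")
nb.update("match goal league".split(), "sports")
print(nb.predict("earnings report".split()))                 # -> finance
nb.update("election parliament vote".split(), "politics")    # a new category, added on the fly
print(nb.predict("parliament vote today".split()))           # -> politics
```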
This blog post by Edwin Chen describes infinite mixture models to do clustering. I think this method supports automatically determining the number of clusters, but I am still trying to wrap my head all the way around it.
The class of algorithms that matches your criteria is called "incremental algorithms". There are incremental versions of almost any method. The easiest to implement is Naive Bayes.