I am working on entity extraction using a custom model. I trained my CRF-based model on a large dataset with
java -Xmx16g -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop ner.prop
using the features below.
Property file (ner.prop):
trainFile = training_data_IOB.tsv
#serializeTo = ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
qnSize=10
entitySubclassification=IOB1
retainEntitySubclassification=true
mergeTags=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
useGazettes=true
gazette=gazetter.txt
sloppyGazette=true
Training file (training_data_IOB.tsv):
Thousands O
of O
demonstrators O
have O
marched O
through O
London B-LOC
to O
protest O
the O
war O
in O
Iraq B-LOC
... ...
Gazette file (gazetter.txt):
B-LOC Iraq
B-LOC Afghanistan
B-ORG Congressional
B-LOC Bangladesh
B-LOC Canada
B-ORG ...
The new model is created as ner-model.ser.gz and is working quite well.
Now my question is: how can I calculate its accuracy (as a percentage) on unseen (new) data, without any manual counting and calculation?
I'm new to this field, so kindly post a detailed, descriptive answer. Thanks for your time.
If you create a CoNLL file with the gold tags for a test set, you can use this command and it will output scores (this example runs our model; replace it with your custom model):
java -Xmx2g edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz -testFile testData.conll
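For the custom model trained above, the equivalent command would be along these lines (a sketch: it assumes the model was serialized to ner-model.ser.gz and that testData.conll is a two-column token/gold-tag file in the same IOB format as the training data):
java -Xmx2g -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier \
    -loadClassifier ner-model.ser.gz -testFile testData.conll
At the end of the run, the classifier prints per-entity precision, recall, and F1, which gives you the accuracy figures without any manual counting.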
Related
I am using bert_model.save_pretrained to save the model at the end, as this is the command that saves the model with all its configurations and weights, but it cannot be used in the model.fit command: the callbacks that save the model at each epoch do not save it with save_pretrained. Can anybody help me save the BERT model at each epoch, since I cannot train the whole BERT model in one go?
Edit
Code for loading the pre-trained BERT model:
bert_model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)
Code for compiling the BERT model:
from tensorflow.keras import optimizers

bert_model.compile(loss='categorical_crossentropy',
                   optimizer=optimizers.Adam(learning_rate=0.00005),
                   metrics=['accuracy'])
bert_model.summary()
Code for training and saving the BERT model:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint_filepath_1 = 'callbacks_models/BERT1.{epoch:02d}-{val_loss:.2f}.h5'
checkpoint_filepath_2 = 'callbacks_models/complete_best_BERT_model_1.h5'

# Save the full model every epoch.
callbacks_1 = ModelCheckpoint(
    filepath=checkpoint_filepath_1,
    monitor='val_loss',
    mode='min',
    save_best_only=False,
    save_weights_only=False,
    save_freq='epoch')

# Keep only the best model by validation loss.
callbacks_2 = ModelCheckpoint(
    filepath=checkpoint_filepath_2,
    monitor='val_loss',
    mode='min',
    save_best_only=True)

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)

hist = bert_model.fit([train1_input_ids, train1_attention_masks],
                      y_train1, batch_size=16, epochs=1,
                      validation_data=([val_input_ids, val_attention_masks], y_val),
                      callbacks=[es, callbacks_1, callbacks_2, history_logger])
                      # history_logger is defined elsewhere

min_val_score = min(hist.history['val_loss'])
print("\nMinimum validation loss = ", min_val_score)

bert_model.save_pretrained("callbacks_models/Complete_BERT_model_1.h5")
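For illustration, a minimal sketch of a custom Keras callback that calls save_pretrained once per epoch might look like this (assuming a Hugging Face TF model as loaded above; SavePretrainedCallback and output_dir are made-up names, not an existing API):
import os
import tensorflow as tf

class SavePretrainedCallback(tf.keras.callbacks.Callback):
    # Illustrative helper: writes the model in the Hugging Face format
    # (config + weights) at the end of every epoch, one directory per epoch.
    def __init__(self, output_dir):
        super().__init__()
        self.output_dir = output_dir

    def on_epoch_end(self, epoch, logs=None):
        path = os.path.join(self.output_dir, f"epoch_{epoch + 1:02d}")
        os.makedirs(path, exist_ok=True)
        # self.model is the model passed to fit(); for a Hugging Face TF
        # model, save_pretrained lets it be reloaded with from_pretrained.
        self.model.save_pretrained(path)
Such a callback could then be appended to the callbacks list passed to bert_model.fit, e.g. callbacks=[es, callbacks_1, callbacks_2, history_logger, SavePretrainedCallback('callbacks_models/bert_epochs')].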
I am training my model to detect normal vs. pneumonia chest X-ray classes. This is my dataset, as listed below:
train_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=train_path, target_size=(224, 224),
                         classes=['NORMAL', 'PNEUMONIA'], batch_size=32, class_mode='categorical')
val_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=val_path, target_size=(224, 224),
                         classes=['NORMAL', 'PNEUMONIA'], batch_size=32, class_mode='categorical')
test_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
    .flow_from_directory(directory=test_path, target_size=(224, 224),
                         classes=['NORMAL', 'PNEUMONIA'], batch_size=16, class_mode='categorical',
                         shuffle=False)
Found 3616 images belonging to 2 classes. #training
Found 1616 images belonging to 2 classes. #validation
Found 624 images belonging to 2 classes. #test
My model consists of 5 CNN layers, with input images of shape (224, 224, 3), 16 feature maps in the first layer and then 32, 64, 128, and 256. Batch normalization, max pooling, and dropout are added after every CNN layer. The last dense layer is as follows:
model.add(Dense(units=2, activation='softmax'))
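For reference, the full stack described above might look like this sketch (the filter counts follow the description; kernel sizes and the dropout rate are illustrative assumptions):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Dropout, Flatten, Dense)

model = Sequential()
for i, filters in enumerate([16, 32, 64, 128, 256]):
    if i == 0:
        # First conv block also defines the (224, 224, 3) input shape.
        model.add(Conv2D(filters, (3, 3), activation='relu',
                         padding='same', input_shape=(224, 224, 3)))
    else:
        model.add(Conv2D(filters, (3, 3), activation='relu', padding='same'))
    # Batch norm, max pooling, and dropout after every conv layer.
    model.add(BatchNormalization())
    model.add(MaxPooling2D())
    model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(units=2, activation='softmax'))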
optim = Adam(lr=0.001)
model.compile(optimizer=optim , loss= 'categorical_crossentropy' , metrics= ['accuracy'])
history = model.fit_generator(train_batch,
                              steps_per_epoch=113,  # 3616/32 = 113
                              epochs=25,
                              validation_data=val_batch,
                              validation_steps=51   # 1616/32 = 51
                              # verbose=2
                              # callbacks=callbacks  # removed to check
                              )
As can be seen in the graph, my training and validation accuracy and loss are good, but when I plot the confusion matrix it does not seem good. Why?
prediction = model.predict_generator(test_batch, steps=stepss)  # , verbose=0)
prediction1 = np.argmax(prediction, axis=1)
cm = confusion_matrix(test_batch.classes, prediction1)
print(cm)
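(`stepss` is not defined in the snippet above; a sketch of a typical definition, assuming the test generator shown earlier, is below. Setting it this way makes the predictions line up one-to-one with test_batch.classes.)
import math

# Cover every test image exactly once: ceil(624 / 16) = 39 batches.
stepss = math.ceil(test_batch.samples / test_batch.batch_size)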
This is my confusion matrix, shown below.
You can see my graphs below as well.
After that, I fine-tuned my model with VGG16 by replacing the last dense layer with my own dense layer with two outputs; here are the graph and confusion matrix:
I do not understand why my testing is not going well even with the VGG16 model, as you can see from the results, so please give me your valuable suggestions. Thanks!
I'm using StanfordNLP to do text classification. I have a training set with two labels, YES and NO. Both labels have roughly the same number of examples (~120K per label).
The problem is that StanfordNLP is misclassifying some text, and I'm not able to identify why. How do I debug it?
My training file looks like:
YES guarda-roupa/roupeiro 2 portas de correr
YES guarda-roupa/roupeiro 3 portas
YES guarda roupa , roupeiro 3 portas
YES guarda-roupa 4 portas
YES guarda roupa 6p mdf
YES guardaroupas 3 portas
YES jogo de quarto com guarda-roupa 2 portas + cômoda + berço
YES guarda roupa 4pts
NO base para guarda-sol
NO guarda-sol alumínio
NO guarda chuva transparente
NO coifa guarda po alavanca cambio
NO lancheira guarda do leao vermelha
NO hard boiled: queima roupa
NO roupa nova do imperador
NO suporte para passar roupa
The YES label identifies "guarda roupa" (wardrobe) and NO identifies things that aren't "guarda roupa" but have one or more words in common (such as "guarda chuva" -- umbrella, or "roupa" -- clothes).
I don't know why, but my model insists on classifying "guarda roupa" (and its variations such as "guardaroupa", "guarda-roupas", etc.) as NO...
How do I debug it? I already double-checked my training file to see if I had mislabeled something, introducing an error, but I could not find anything...
Any advice is welcome.
UPDATE 1
I'm using the following properties to control feature creation:
useClassFeature=false
featureMinimumSupport=2
lowercase=true
1.useNGrams=false
1.usePrefixSuffixNGrams=false
1.splitWordsRegexp=\\s+
1.useSplitWordNGrams=true
1.minWordNGramLeng=2
1.maxWordNGramLeng=5
1.useAllSplitWordPairs=true
1.useAllSplitWordTriples=true
goldAnswerColumn=0
displayedColumn=1
intern=true
sigma=1
useQN=true
QNsize=10
tolerance=1e-4
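(For reference, a properties file like this can also be run directly from the command line, which makes it easier to iterate on feature settings against a held-out test file; a sketch, with illustrative file names:)
java -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier \
    -prop classifier.prop -trainFile train.tsv -testFile test.tsv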
UPDATE 2
Searching the API, I discovered that ColumnDataClassifier has a method getClassifier() that gives access to the underlying LinearClassifier, which has a dump() method. The dump produces output that looks like the sample below. From the API docs: "Print all features in the classifier and the weight that they assign to each class."
                                      YES    NO
1-SW#-guarda-roupa-roupeiro-2portas   0,01  -0,01
1-ASWT-guarda-roupa-roupeiro          0,19  -0,19
1-SW#-guarda-roupa-roupeiro           0,19  -0,19
If I call toString() on the LinearClassifier, it prints:
[-0.7, -0.7+0.1): 427.0 [(1-SW#-guarda-roupa-roupeiro-2portas,NO), ...]
[0.6, 0.6+0.1): 427.0 [(1-SW#-guarda-roupa-roupeiro-2portas,YES), ...]
doc = '''Andrew Yan-Tak Ng is a Chinese American computer scientist. He is the former chief scientist at Baidu, where he led the company's
Artificial Intelligence Group. He is an adjunct professor (formerly associate professor) at Stanford University. Ng is also the co-founder
and chairman at Coursera, an online education platform. Andrew was born in the UK on 27th Sep 2.30pm 1976. His parents were both from Hong Kong.'''
import nltk

# tokenize doc
tokenized_doc = nltk.word_tokenize(doc)

# tag tokens and use nltk's Named Entity Chunker
tagged_sentences = nltk.pos_tag(tokenized_doc)
ne_chunked_sents = nltk.ne_chunk(tagged_sentences)
When I process and extract the chunks, I see we only get:
[('Andrew', 'PERSON'), ('Chinese', 'GPE'), ('American', 'GPE'), ('Baidu', 'ORGANIZATION'), ("company's Artificial Intelligence Group", 'ORGANIZATION'), ('Stanford University', 'ORGANIZATION'), ('Coursera', 'ORGANIZATION'), ('Andrew', 'PERSON'), ('UK', 'ORGANIZATION'), ('Hong Kong', 'GPE')]
I need to get the time and date too.
Please suggest...
Thank you.
You need a more sophisticated tagger, like Stanford's Named Entity Tagger. Once you have it installed and configured, you can run it:
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize
stanfordClassifier = '/path/to/classifier/classifiers/english.muc.7class.distsim.crf.ser.gz'
stanfordNerPath = '/path/to/jar/stanford-ner/stanford-ner.jar'
st = StanfordNERTagger(stanfordClassifier, stanfordNerPath, encoding='utf8')
doc = '''Andrew Yan-Tak Ng is a Chinese American computer scientist. He is the former chief scientist at Baidu, where he led the company's Artificial Intelligence Group. He is an adjunct professor (formerly associate professor) at Stanford University. Ng is also the co-founder and chairman at Coursera, an online education platform. Andrew was born in the UK on 27th Sep 2.30pm 1976. His parents were both from Hong Kong.'''
result = st.tag(word_tokenize(doc))
date_word_tags = [wt for wt in result if wt[1] == 'DATE' or wt[1] == 'ORGANIZATION']
print(date_word_tags)
Where the output would be:
[(u'Artificial', u'ORGANIZATION'), (u'Intelligence', u'ORGANIZATION'), (u'Group', u'ORGANIZATION'), (u'Stanford', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'Coursera', u'ORGANIZATION'), (u'27th', u'DATE'), (u'Sep', u'DATE'), (u'2.30pm', u'DATE'), (u'1976', u'DATE')]
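If you need whole date/time expressions rather than token-level tags, one simple post-processing sketch (assuming the result list above) is to group consecutive tokens that share a tag:
from itertools import groupby

# Group consecutive tokens with the same tag; drop 'O' (non-entity) tokens.
spans = [(' '.join(word for word, _ in group), tag)
         for tag, group in groupby(result, key=lambda wt: wt[1])
         if tag != 'O']

print(spans)  # e.g. [..., ('27th Sep 2.30pm 1976', 'DATE'), ...]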
You will probably run into some issues when trying to install and set up everything, but I think it's worth the hassle.
Let me know if it helps.
I built a GLM model using H2O (ver. 3.14) in R. Please note that the training data contains integers and also many NAs, which I handle with MeanImputation.
glm <- h2o.glm(
  training_frame = train.truth,
  x = getColNames(train.truth),
  y = "isFemale",
  family = "binomial",
  missing_values_handling = "MeanImputation",
  seed = 1000000)
I then used a validation data set to look at the performance, and the precision looks good to me:
h2o.performance(glm, newdata=valid.truth)%>% h2o.confusionMatrix()
Confusion Matrix (vertical: actual; across: predicted) for max f1 @ threshold = 0.529384526696015:
         0      1      Error     Rate
0        41962  300    0.007099  =300/42262
1        863    13460  0.060253  =863/14323
Totals   42825  13760  0.020553  =1163/56585
I then saved the model as a MOJO:
h2o.download_mojo(glm, path="models/mojo", get_genmodel_jar=TRUE)
I exported the validation DF to a CSV file:
dt.valid <- data.table(as.data.frame(valid.truth))
write.table(dt.valid, row.names = F, na="", file="models/test.csv")
I then tried to use the saved MOJO to make the same predictions by running this in my Linux shell:
java -cp h2o-genmodel.jar hex.genmodel.tools.PredictCsv \
    --mojo GLM_model_R_1511161743608_15.zip \
    --input ../test.csv --output output.csv --decimal
However, the result is terrible: all the records were predicted as 0, which is very different from what I got when I ran the model in R.
I have been stuck on this for a day, but I couldn't figure out what went wrong. Can anyone shed some light on this?