Unable to tag gazette entities using own CRF model

I followed this question: Entities on my gazette are not recognized
Even after adding the minimal training example, with "Damiano" as a gazette entity, I am not able to get John or Andrea recognized as PERSON.
I tried the same with large training data and a large gazette, but I am still not able to tag any gazette entity. Why?
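For background, in Stanford's CRFClassifier a gazette contributes features at training time, not hard rules at tag time: the gazette has to be passed with the training properties, and its entries bias the CRF rather than override it. Below is a minimal sketch of how a gazette is usually wired in; all file names and the exact feature flags are illustrative assumptions, not the asker's actual setup.

# Hedged sketch: train a Stanford NER CRF with gazette features enabled.
# Requires stanford-ner.jar on disk and a TSV training file
# (token <TAB> label per line); all paths here are placeholders.
import subprocess

# Gazette format: "CLASS phrase", one entry per line.
with open("gazette.txt", "w") as f:
    f.write("PERSON Damiano\nPERSON John\nPERSON Andrea\n")

props = """\
trainFile = train.tsv
serializeTo = my-ner-model.ser.gz
map = word=0,answer=1
# Gazette features only help if enabled during TRAINING:
useGazettes = true
cleanGazette = true
gazette = gazette.txt
useWord = true
useNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
wordShape = chris2useLC
"""
with open("gazette.prop", "w") as f:
    f.write(props)

subprocess.run(["java", "-cp", "stanford-ner.jar",
                "edu.stanford.nlp.ie.crf.CRFClassifier",
                "-prop", "gazette.prop"], check=True)

A common failure mode matching the symptom above: with a tiny training set, the gazette features never fire on enough PERSON-labeled tokens for the model to learn any weight for them, so gazette names go untagged even though they are in the file.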

Related

Is there a best way to train a custom domain-specific text summarization model?

I tried some pretrained summarization models from Hugging Face, such as BERT, T5, and BART, but the summaries miss some important content from the original data. I need abstractive summaries that still pull the relevant information out of the original content.
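For context, the usual zero-shot baseline with a pretrained checkpoint looks like the sketch below; the checkpoint name and generation parameters are illustrative only, and fine-tuning such a model on in-domain (document, summary) pairs is the standard next step when zero-shot output misses key content.

# Hedged sketch: zero-shot abstractive summarization with transformers.
# "facebook/bart-large-cnn" is just one commonly used checkpoint,
# not a recommendation for any particular domain.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = "..."  # your domain-specific source text
result = summarizer(document, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])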

How to export a Google AutoML Text Classification model?

I just finished training my AutoML Text Classification model (single-label).
I was planning to run a Batch Prediction using the console, but I just found out how expensive that will be because I have over 300,000 text records to analyze.
So now I want to export the model to my local machine and run the predictions there.
I found instructions here to export "AutoML Tabular Models" and "AutoML Edge Models". But there is nothing available for text classification models.
I tried following the "AutoML Tabular Model" instructions because that looked like the closest thing to a text classification model, but I could not find the "Export" button that was supposed to exist on the model detail page.
So I have some questions regarding this:
How do I export an AutoML Text Classification model?
Is an AutoML Text Classification model the same thing as an AutoML Tabular model? They seem very similar, because my text classification model used a tabular CSV to assign labels and train the model.
If I cannot export an AutoML Text Classification model (urgh!), can I train a new "Tabular" model to do the same thing?
Currently, there is no feature to export an AutoML text classification model. A feature request already exists; you can follow its progress on this issue tracker.
Both the models are quite similar. A tabular data classification model analyzes your tabular data and returns a list of categories that describe the data. A text data classification model analyzes text data and returns a list of categories that apply to the text found in the data. Refer to this doc for more information about AutoML model types.
Yes, you can do the same thing in an AutoML tabular data classification model if your training data is in tabular CSV file format. Refer to this doc for more information about how to prepare tabular training data.
If your AutoML tabular data classification model trained successfully, you can find an Export option at the top of the model detail page. Refer to this doc for more information about how to export tabular classification models.
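If you go the tabular route, the export can also be scripted. The sketch below uses the Vertex AI Python SDK; the project, region, model ID, destination bucket, and export format are all placeholders that should be checked against what the model actually supports.

# Hedged sketch: export a Vertex AI (AutoML) tabular model with the
# google-cloud-aiplatform SDK; all resource names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(model_name="1234567890")  # your model ID
print(model.supported_export_formats)  # verify what this model supports

# "tf-saved-model" is an assumed format id; use one listed above.
model.export_model(
    export_format_id="tf-saved-model",
    artifact_destination="gs://my-bucket/automl-export/",
)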

Training LUIS to predict entities without Phrase List

I am trying to train LUIS to recognize an entity from a few utterances. I initially trained with a few utterances containing different entity values. The entity values are made up of two or more words, for example 'customer engagement', 'empower your teams', etc.
I am not able to get LUIS to identify the entity correctly because of the variation in the number of words.
I cannot use a Phrase List, because the values are dynamic.
How can I train LUIS to recognize the multiple-word values in the utterance and identify the entity effectively?
This still requires you to provide some training data in the form of canonical values and synonyms, but another way to approach this would be to use a list entity inside of a composite entity (a sketch follows below). Other than that, you'll currently have to provide a larger amount of training data/phrase list data, as LUIS doesn't look at the definition of a word.
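As a sketch of the list-entity route, list (closed-list) entities can be created programmatically through the v2.0 authoring REST API; the region, app ID, version, key, and entity values below are all placeholders, and the endpoint shape should be double-checked against your authoring resource.

# Hedged sketch: create a list entity via the LUIS v2.0 authoring REST API.
import requests

region = "westus"
app_id = "YOUR_APP_ID"
version = "0.1"
key = "YOUR_AUTHORING_KEY"

url = (f"https://{region}.api.cognitive.microsoft.com"
       f"/luis/api/v2.0/apps/{app_id}/versions/{version}/closedlists")
body = {
    "name": "MarketingPhrase",
    "subLists": [
        {"canonicalForm": "customer engagement", "list": ["engaging customers"]},
        {"canonicalForm": "empower your teams", "list": ["team empowerment"]},
    ],
}
resp = requests.post(url, json=body,
                     headers={"Ocp-Apim-Subscription-Key": key})
resp.raise_for_status()
print(resp.json())  # id of the new list entity

Because the values are dynamic, the same endpoint can be called again to add sublists as new canonical values appear, which partially works around the Phrase List limitation.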

Training an existing CoreNLP model

I want to train Stanford CoreNLP's existing english-left3words-distsim.bin model with some more data that fits my use case. I want to assign custom tags to certain words, e.g., run would be tagged COMMAND.
Where can I get the training data set? I could then follow something like model training.
For the most part it is sections 0-18 of the WSJ Penn Treebank.
Link: https://catalog.ldc.upenn.edu/ldc99t42
We also have some extra data sets, which we don't distribute, that we add on to the WSJ data.
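Note that the shipped .bin model cannot be incrementally updated; the usual approach is to retrain from scratch on the WSJ data plus your own tagged sentences. A hedged sketch follows; all file paths, the multi-file trainFile syntax, and the arch value should be checked against your tagger version's documentation.

# Hedged sketch: retrain the Stanford POS tagger on WSJ data plus
# custom tagged sentences; requires stanford-postagger.jar and the
# LDC WSJ data (paths here are placeholders).
import subprocess

# Custom tagged sentences: token_TAG, one sentence per line.
with open("extra.tagged", "w") as f:
    f.write("run_COMMAND the_DT daily_JJ report_NN\n")

props = """\
model = my-left3words.tagger
arch = left3words
tagSeparator = _
trainFile = wsj-00-18.tagged;extra.tagged
"""
with open("train.props", "w") as f:
    f.write(props)

subprocess.run(["java", "-Xmx4g", "-cp", "stanford-postagger.jar",
                "edu.stanford.nlp.tagger.maxent.MaxentTagger",
                "-props", "train.props"], check=True)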

Stanford OpenIE using customized NER model

I am trying to use Stanford's OpenIE (version 3.6.0) to extract relation triples based on an NER model I trained in the chemistry domain. However, I couldn't get OpenIE to extract relation triples based on my own NER model; it seems OpenIE extracts relation triples based only on the default NER models provided in the package.
Below are what I've done to train and deploy my NER model:
Train the NER model based on http://nlp.stanford.edu/software/crf-faq.html#a.
Deploy the NER model in the CoreNLP server and then restart the server. I modified the props attribute in corenlpserver.sh; it now looks like this (a query sketch follows after these steps):
props="-Dner.model=$scriptdir/my_own_chemistry.ser.gz,edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
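For reference, the same model list can also be passed per-request to a running CoreNLP server instead of baking it into the startup script. A hedged sketch, assuming the server is on the default port 9000 and reusing the model paths from the props line above (the sample text is a placeholder):

# Hedged sketch: query a CoreNLP server with a custom NER model plus the
# default model, then print any OpenIE triples from the JSON response.
import json
import requests

properties = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,natlog,openie",
    "ner.model": ("my_own_chemistry.ser.gz,"
                  "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"),
    "outputFormat": "json",
}
text = "Cl and Br were detected at Windjana."
resp = requests.post("http://localhost:9000/",
                     params={"properties": json.dumps(properties)},
                     data=text.encode("utf-8"))
doc = resp.json()
for sentence in doc["sentences"]:
    for triple in sentence.get("openie", []):
        print(triple["subject"], triple["relation"], triple["object"])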
Please take a look at an example of NER + OpenIE results here. In this example, I expect OpenIE to build relation triples on the entities (such as Cl, Br, and Windjana) recognized by my NER model, but it doesn't. Is it possible to have OpenIE extract relation triples based on a self-trained NER model? If so, would you please give me some brief instructions on how?
Thanks in advance!
I contacted the author of OpenIE, and the author confirmed that OpenIE more or less ignores NER altogether. I hope this helps others who have the same question.
