I am trying to use Stanford's OpenIE (version 3.6.0) to extract relation triples based on an NER model I trained for the chemistry domain. However, I couldn't get OpenIE to extract relation triples based on my own NER model. It seems OpenIE extracts relation triples based only on the default NER models provided in the package.
Below are what I've done to train and deploy my NER model:
Train the NER model based on http://nlp.stanford.edu/software/crf-faq.html#a.
Deploy the NER model in the CoreNLP server and restart the server. I modified the props attribute in corenlpserver.sh, so it now looks like this:
props="-Dner.model=$scriptdir/my_own_chemistry.ser.gz,edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
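For reference, the same comma-separated ner.model setting can also be supplied programmatically when embedding CoreNLP rather than running the server. Below is a minimal sketch using java.util.Properties; the annotator list and the local model path are illustrative assumptions, and in a real run the Properties object would be passed to new StanfordCoreNLP(props):

```java
import java.util.Properties;

public class CustomNerProps {
    public static Properties build() {
        Properties props = new Properties();
        // OpenIE's prerequisite annotators, plus NER with two models;
        // CoreNLP applies the models in order, so entities get the label
        // from the first model that tags them.
        props.setProperty("annotators",
            "tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie");
        props.setProperty("ner.model",
            "my_own_chemistry.ser.gz,"
            + "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
        return props;
    }

    public static void main(String[] args) {
        // In a real pipeline: StanfordCoreNLP pipeline = new StanfordCoreNLP(build());
        System.out.println(build().getProperty("ner.model"));
    }
}
```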
Please take a look at an example of the NER + OpenIE results here. In this example, I expect OpenIE to build relation triples on the entities (such as Cl, Br, and Windjana) recognized by my NER model, but it doesn't. Is it possible to have OpenIE extract relation triples based on a self-trained NER model? If so, could you please give me some brief instructions on how?
Thanks in advance!
I contacted the author of OpenIE, who confirmed that OpenIE more or less ignores NER altogether. I hope this helps others who have the same question.
I want to fine-tune a BERT NER model and remove or add labels.
For example,
I have these labels:
LOCATION MONEY ORGANIZATION PERSON PRODUCT TIME TVSHOW.
I want to add more labels, or remove some, while fine-tuning. Is this possible? If it is not, what are the other solutions?
I could not find a solution.
BERT enables you to do this, but you cannot start from an already fine-tuned model. For example, we tried this with a fine-tuned BERTurk model, but its architecture (the size of the classification head) did not match our label set, so we switched to the original BERTurk base model, and that worked. I think BERT can be trained for a downstream task such as NER only when it has not already been fine-tuned for one.
I followed this: Entities on my gazette are not recognized.
Even after adding a minimal training example with "Damiano" as a gazette entity, I am not able to get "John" or "Andrea" recognized as PERSON.
I also tried this with larger training data and a larger gazette, but I still can't tag any gazette entity. Why?
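One thing worth checking is that gazette features have to be enabled explicitly when training the CRF. A sketch of the relevant training properties (property names are from the CRFClassifier/NERFeatureFactory flags; the file paths are placeholders, and as I understand it each gazette line is a class label followed by the phrase, e.g. `PERSON Damiano`):

```properties
# train.prop -- minimal sketch; paths are placeholders
trainFile = my_training_data.tsv
serializeTo = my_ner_model.ser.gz
map = word=0,answer=1

# enable gazette features
useGazettes = true
gazette = my_gazette.txt
# sloppyGazette fires the feature on any token appearing in a gazette entry;
# cleanGazette requires the whole multi-word entry to match
sloppyGazette = true
```

Also note that gazette entries are only extra features, not hard rules: the trained CRF is free to ignore them, which is often why names listed in a gazette still come out untagged when the rest of the training signal is weak.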
I want to train Stanford CoreNLP's existing english-left3words-distsim.bin model with some more data that fits my use case. I want to assign custom tags to certain words, e.g. run would be tagged COMMAND.
Where can I get the training data set? I could follow something like this: model training.
For the most part it is sections 0-18 of the WSJ Penn Treebank.
Link: https://catalog.ldc.upenn.edu/ldc99t42
We also have some extra data sets that we don't distribute, which we add on to the WSJ data.
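If you build your own training file in that style, tagger training is driven by a properties file passed to edu.stanford.nlp.tagger.maxent.MaxentTagger with -props. A minimal sketch follows; the paths are placeholders and the arch value is just one plausible feature set, and as far as I know you retrain from scratch on old + new data rather than incrementally updating the shipped model:

```properties
# pos.props -- minimal sketch; paths are placeholders
model = my-custom-tagger.model
# classic training format: word_tag word_tag ... per line
trainFile = my_training_data.txt
tagSeparator = _
arch = left3words,naacl2003unknowns,wordshapes(-1,1)
encoding = UTF-8
```

This would then be run with something like: java -cp stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -props pos.props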
I'm using Stanford NER and I have some results with the entity "MISC" in the 4-class model (Location, Person, Organization, Misc), but I don't know what this entity really represents. Does anyone know what it is?
Thanks
MISC is a category from the CoNLL 2003 evaluation data, which is typically used to develop NER models. Honestly, I don't think there is any definition of MISC beyond "is a named entity" and "isn't PERSON, ORG, or LOC".
I found this description in the spaCy documentation, for models recognizing PER, LOC, ORG, and MISC:
"MISC: Miscellaneous entities, e.g., events, nationalities, products, or works of art."
I've been creating sentiment analysis models to use with Stanford CoreNLP, and I've been using the one with the highest F1 score in my Java code, like so:
props.put("sentiment.model", "/path/to/model-0014-93.73.ser.gz");
But if I remove this line, what does CoreNLP use to score the data? Is there a default CoreNLP model that's used if the user does not specify one?
If no model is given, it will use the default model included in the release, which was trained on the Stanford Sentiment Treebank: http://nlp.stanford.edu/sentiment/treebank.html
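To make the default-vs-custom distinction concrete, here is a minimal sketch using java.util.Properties; the annotator list is an assumption, the model path is the hypothetical one from the question, and in a real run the Properties would be passed to new StanfordCoreNLP(props):

```java
import java.util.Properties;

public class SentimentProps {
    public static Properties build(String customModelPath) {
        Properties props = new Properties();
        // sentiment requires the parse annotator for its tree input
        props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
        if (customModelPath != null) {
            // Override the default model shipped in the release
            props.setProperty("sentiment.model", customModelPath);
        }
        // else: no sentiment.model key is set, so CoreNLP falls back to
        // its bundled default sentiment model
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("/path/to/model-0014-93.73.ser.gz")
            .getProperty("sentiment.model"));
        System.out.println(build(null).getProperty("sentiment.model"));
    }
}
```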