Stanford NER - MISC Entity?

I'm using Stanford NER with the 4-class model (Location, Person, Organization, Misc), and some of my results are tagged with the entity "MISC", but I don't know what this entity actually represents. Does anyone know what it is?
Thanks

MISC is a category from the CoNLL-2003 evaluation data, which is typically used to develop NER models. Honestly, I don't think there is any definition of MISC beyond "is a named entity" and "isn't PERSON, ORG, or LOC".

I found this description from spaCy, for models recognizing PER, LOC, ORG, and MISC:
"MISC: Miscellaneous entities, e.g., events, nationalities, products, or works of art."
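To make the catch-all nature of MISC concrete, here is a minimal Java sketch that groups per-token labels into entity spans. The labels are hand-written for illustration, assuming the 4-class model's PERSON, LOCATION, ORGANIZATION, MISC and O labels, with nationality adjectives tagged MISC as in CoNLL-2003 annotation. The `Token`, `Entity` and `MiscDemo` names are hypothetical and not part of Stanford NER's API.

```java
import java.util.ArrayList;
import java.util.List;

// One token with the label an NER tagger assigned to it (hypothetical type).
record Token(String word, String label) {}

// A run of consecutive tokens sharing one entity label (hypothetical type).
record Entity(String text, String type) {}

class MiscDemo {
    // Merges consecutive tokens with the same non-"O" label into one Entity.
    // Plain labels carry no B-/I- boundary markers, so two adjacent entities
    // of the same type would be merged into a single span.
    static List<Entity> groupEntities(List<Token> tokens) {
        List<Entity> entities = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String type = "O";
        for (Token t : tokens) {
            if (!t.label().equals(type)) {
                // Label changed: close the current span if it was an entity.
                if (!type.equals("O")) {
                    entities.add(new Entity(text.toString(), type));
                }
                text.setLength(0);
                type = t.label();
            }
            if (!type.equals("O")) {
                if (text.length() > 0) text.append(' ');
                text.append(t.word());
            }
        }
        if (!type.equals("O")) {
            entities.add(new Entity(text.toString(), type));
        }
        return entities;
    }

    public static void main(String[] args) {
        // Hand-labeled example: the nationality adjectives land in MISC because
        // they are named entities that are not PERSON, ORGANIZATION or LOCATION.
        List<Token> sentence = List.of(
            new Token("EU", "ORGANIZATION"), new Token("rejects", "O"),
            new Token("German", "MISC"), new Token("call", "O"),
            new Token("to", "O"), new Token("boycott", "O"),
            new Token("British", "MISC"), new Token("lamb", "O"),
            new Token(".", "O"));
        for (Entity e : groupEntities(sentence)) {
            System.out.println(e.type() + "\t" + e.text());
        }
    }
}
```

Running `main` prints three tab-separated lines: `ORGANIZATION EU`, `MISC German` and `MISC British`.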

Related

Unable to tag gazette entities using own CRF model

I followed the question "Entities on my gazette are not recognized".
Even after adding a minimal training example ("Damiano") as a gazette entity, I am not able to recognize John or Andrea as PERSON.
I also tried this with a large training set and gazette, but I am still not able to tag any gazette entity. Why?

How to model this relation in Spring Data Neo4j?

Given that I have two entities, Person and Company, there are multiple relationships between them:
Person - Company:
The person can be an employee of the company
The person can be a shareholder of the company
The person can be the legal person of the company
Company - Company:
A company can be the legal person of another company
A company can be a shareholder of another company
So how do I model this in Spring Data Neo4j?
What I tried was to create three relationship types, EMPLOY, INVEST, and LEGAL, each with the Company as the StartNode and the Person as the EndNode. Then, in both Company and Person, I kept these relationships with the "UNDIRECTED" direction, just as in the diagram, but I always get a StackOverflowError when saving and searching.
I've now put the solution on GitHub. All the classes are in the sample.spring.data.neo4j package, and the corresponding test is sample.spring.data.neo4j.repositories.CompanyRepositoryTest.
The big issue at the beginning was that it always threw a StackOverflowError, which was caused by the Lombok annotations. After removing all the Lombok annotations and using plain getters/setters, everything works well.
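The answer doesn't show the failing code, but the likely mechanism is easy to sketch. Lombok's @Data generates equals(), hashCode() and toString() over all fields, so when the relationship is stored on both sides, Person.hashCode() calls Company.hashCode(), which calls Person.hashCode() again, and so on. The plain-Java sketch below uses hypothetical classes with no Spring Data Neo4j annotations. It reproduces that cycle by hand and shows one alternative to dropping Lombok entirely: basing equals()/hashCode() on the id only (an assumption about your model, not the author's fix).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Roughly what an all-fields hashCode() (as Lombok's @Data would generate)
// looks like when both sides keep the relationship. equals() is omitted here.
class NaivePerson {
    final String name;
    final Set<NaiveCompany> employers = new HashSet<>();
    NaivePerson(String name) { this.name = name; }
    @Override public int hashCode() { return Objects.hash(name, employers); }
}

class NaiveCompany {
    final String name;
    final Set<NaivePerson> employees = new HashSet<>();
    NaiveCompany(String name) { this.name = name; }
    @Override public int hashCode() { return Objects.hash(name, employees); }
}

// Identity based on the id only, so hashing never follows relationships.
class Person {
    final Long id;
    final String name;
    final Set<Company> employers = new HashSet<>();
    Person(Long id, String name) { this.id = id; this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Person other && Objects.equals(id, other.id);
    }
    @Override public int hashCode() { return Objects.hashCode(id); }
}

class Company {
    final Long id;
    final String name;
    final Set<Person> employees = new HashSet<>();
    Company(Long id, String name) { this.id = id; this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Company other && Objects.equals(id, other.id);
    }
    @Override public int hashCode() { return Objects.hashCode(id); }
}

class RecursionDemo {
    static boolean naiveHashCodeOverflows() {
        NaivePerson alice = new NaivePerson("Alice");
        NaiveCompany acme = new NaiveCompany("Acme");
        alice.employers.add(acme);  // hashes acme while its employees set is empty
        acme.employees.add(alice);  // hashes alice -> acme -> still-empty set
        try {
            // The cycle is now closed: Person -> Company -> Person -> ...
            int h = alice.hashCode();
            System.out.println("unexpected hash " + h);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    static int safeHashCode() {
        Person alice = new Person(1L, "Alice");
        Company acme = new Company(10L, "Acme");
        alice.employers.add(acme);
        acme.employees.add(alice);
        return alice.hashCode();  // depends only on the id, so no recursion
    }

    public static void main(String[] args) {
        System.out.println("naive overflows: " + naiveHashCodeOverflows());
        System.out.println("safe hashCode: " + safeHashCode());
    }
}
```

An all-fields toString() recurses in the same way, which is consistent with the error disappearing once the Lombok annotations were replaced by plain getters and setters.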

Stanford OpenIE using customized NER model

I am trying to use Stanford's OpenIE (version 3.6.0) to extract relation triples based on an NER model I trained in the chemistry domain. However, I couldn't get OpenIE to extract relation triples based on my own NER model. It seems OpenIE extracts relation triples based only on the default NER models provided in the package.
Below is what I've done to train and deploy my NER model:
Trained the NER model following http://nlp.stanford.edu/software/crf-faq.html#a.
Deployed the NER model in the CoreNLP server and then restarted the server. I modified the props attribute in corenlpserver.sh so that it now looks like this:
props="-Dner.model=$scriptdir/my_own_chemistry.ser.gz,edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
Please take a look at an example of the NER + OpenIE results here. In this example, I expect OpenIE to build relation triples on the entities recognized by my NER model (such as Cl, Br, and Windjana), but it doesn't. Is it possible to have OpenIE extract relation triples based on a self-trained NER model? If so, could you please give me some brief instructions on how?
Thanks in advance!
I contacted the author of OpenIE, who confirmed that OpenIE more or less ignores NER altogether. I hope this helps others who have the same question.
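Given that answer, one possible workaround is to run your custom NER model and OpenIE separately, then keep only the triples whose subject or object mentions one of your entities. This is an assumption, not something the OpenIE author suggested. The sketch below shows only that filtering step in plain Java. `Triple` and `TripleFilter` are hypothetical stand-ins rather than CoreNLP classes, and the input triples are made up. Filtering can only select among triples OpenIE already produced; it cannot make OpenIE extract new ones.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one OpenIE (subject, relation, object) triple.
record Triple(String subject, String relation, String object) {}

class TripleFilter {
    // Keeps the triples whose subject or object mentions a custom entity.
    static List<Triple> keepTriplesWithEntities(List<Triple> triples, Set<String> entities) {
        return triples.stream()
            .filter(t -> mentionsAny(t.subject(), entities) || mentionsAny(t.object(), entities))
            .toList();
    }

    // Whole-token match, so "Cl" does not match "Clay". For simplicity this
    // assumes single-token entity mentions such as "Cl", "Br" or "Windjana".
    static boolean mentionsAny(String phrase, Set<String> entities) {
        for (String token : phrase.split("\\s+")) {
            if (entities.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Entity mentions from a separate custom NER pass (made-up input).
        Set<String> entities = Set.of("Cl", "Br", "Windjana");
        List<Triple> triples = List.of(
            new Triple("the sample", "contains", "Cl"),
            new Triple("Br", "was detected in", "the rock"),
            new Triple("the team", "published", "results"));
        keepTriplesWithEntities(triples, entities).forEach(System.out::println);
    }
}
```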

How to classify documents with Stanford NLP

I want to classify news documents based on the type of content they contain, for example Sports, Politics, Entertainment, etc. How can I do this using Stanford NLP? If possible, please share an example.
This link should be of interest:
http://nlp.stanford.edu/software/classifier.shtml
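The linked Stanford Classifier trains a maximum-entropy (softmax/logistic regression) model and is usually driven through its ColumnDataClassifier with tab-separated label/text data; see that page for its actual properties. To illustrate the underlying idea of topic classification without the library, here is a minimal multinomial Naive Bayes sketch in Java. This is a different, simpler technique than the one the Stanford Classifier uses. The `NaiveBayes` class is hypothetical and the training sentences are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
class NaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercases and splits on anything that is not a letter a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Returns the label maximizing log P(label) + sum of log P(word | label).
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            double denominator = totalWords.getOrDefault(label, 0) + vocabulary.size();
            for (String w : tokens) {
                score += Math.log((counts.getOrDefault(w, 0) + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayes nb = new NaiveBayes();
        nb.train("Sports", "The team won the football match in the final minute");
        nb.train("Sports", "The striker scored two goals and the coach praised the team");
        nb.train("Politics", "The parliament passed the election law after a long debate");
        nb.train("Politics", "The minister announced a new government policy on taxes");
        nb.train("Entertainment", "The film won an award and the actor thanked the director");
        nb.train("Entertainment", "The singer released a new album and started a concert tour");
        System.out.println(nb.classify("The coach said the team played a great match"));
    }
}
```

With only six toy training sentences this is a demonstration, not a usable news classifier; a real setup needs a labeled corpus of actual articles.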

HippoCMS translated documents with shared fields

I am evaluating HippoCMS and am trying to model a schema of Venues. I want to model a document that has non-translatable features such as telephoneNumber and emailAddress, plus translatable features such as description.
How do I model this in HippoCMS? How do I ensure that the non-translatable fields are shared between the different translations, so that each translated document doesn't have its own copy of a value? Obviously, no matter which language you are reading a site in, the telephoneNumber shouldn't change.
The only way I have found for the moment is to create a document called Venue and another document called VenueTranslation. Venue would contain the telephoneNumber and VenueTranslation would contain its description and a link back to the Venue document. There would then be VenueTranslation documents for each language.
Is this the correct approach?
That could work, but you will run into usability issues. I'd say it depends on how many venues you plan to enter into the system, how many languages you are targeting, and, in the end, how willing your CMS users are to pick the right Venue document for every VenueTranslation in each language. I can see this quickly becoming error-prone and cumbersome, but I don't have the numbers.
As for your final question, it's neither correct nor incorrect: since translations in Hippo are granular at the document level rather than the field level, you have to do it this way. Your model makes sense but is not well supported in the CMS. This use case is trivial in a CMS that supports the notion of translatable fields.
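The Venue/VenueTranslation split from the question can be sketched as a plain-Java data model. This only illustrates the data shape under assumed names (`Venue`, `VenueTranslation`, `VenueDemo` and the `venue` link are hypothetical); it is not a HippoCMS document type definition or its API. It shows the property the question is after: every translation points at one shared Venue, so the telephoneNumber is stored once.

```java
import java.util.List;

// Shared, non-translatable fields, stored exactly once per venue.
class Venue {
    String telephoneNumber;
    String emailAddress;

    Venue(String telephoneNumber, String emailAddress) {
        this.telephoneNumber = telephoneNumber;
        this.emailAddress = emailAddress;
    }
}

// One document per language: translatable text plus a link to the Venue.
record VenueTranslation(String locale, String description, Venue venue) {
    // Reads the shared field through the link instead of keeping a copy.
    String telephoneNumber() {
        return venue.telephoneNumber;
    }
}

class VenueDemo {
    public static void main(String[] args) {
        Venue venue = new Venue("+31 20 000 0000", "info@example.com");
        List<VenueTranslation> translations = List.of(
            new VenueTranslation("en", "A concert hall by the river.", venue),
            new VenueTranslation("nl", "Een concertzaal aan de rivier.", venue));

        // Updating the shared field once is visible from every translation.
        venue.telephoneNumber = "+31 20 111 1111";
        for (VenueTranslation t : translations) {
            System.out.println(t.locale() + ": " + t.telephoneNumber());
        }
    }
}
```

The usability cost the answer warns about shows up here too: nothing in the model stops an editor from linking a translation to the wrong Venue, because that link is set by hand for every VenueTranslation.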
