New entities discovery from text - stanford-nlp

I'm working on discovering new entities in text and was wondering whether Stanford NLP can be used for this purpose.
As far as I know, Stanford requires trained classifiers to recognize entities, but if I'm not wrong, it will only detect already-known entities. For example, if your model contains "stanford is a good university" and stanford is already a known entity, then if I try "fooo is a good university", it won't recognize fooo as a new entity.

This project should be of interest to you:
http://nlp.stanford.edu/software/patternslearning.shtml
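To illustrate the general idea behind pattern-based learning, here is a toy bootstrapping loop. It starts from seed entities, collects the word contexts they appear in, and then treats any unknown token seen in one of those contexts as a new entity, which is how "fooo" can be found from "stanford". This is a minimal sketch under heavy assumptions (lower-cased whitespace tokens, single-token entities, exact context matching, no pattern scoring). It is not the algorithm implemented by the Stanford tool linked above.

```python
def context(tokens, i, width=3):
    """Return the (left, right) word windows around position i."""
    left = tuple(tokens[max(0, i - width):i])
    right = tuple(tokens[i + 1:i + 1 + width])
    return left, right


def bootstrap(sentences, seeds, iterations=2, width=3):
    """Toy bootstrapping: seeds -> context patterns -> new single-token entities."""
    tokenized = [s.lower().split() for s in sentences]
    known = {s.lower() for s in seeds}
    for _ in range(iterations):
        # 1. Learn patterns: the contexts in which known entities occur.
        patterns = {context(toks, i, width)
                    for toks in tokenized
                    for i, tok in enumerate(toks) if tok in known}
        # 2. Apply patterns: unknown tokens seen in a learned context become entities.
        new = {tok
               for toks in tokenized
               for i, tok in enumerate(toks)
               if tok not in known and context(toks, i, width) in patterns}
        if not new:
            break
        known |= new
    return known - {s.lower() for s in seeds}


corpus = [
    "stanford is a good university",
    "fooo is a good university",
    "fooo has a large campus",
    "barr has a large campus",
]
print(sorted(bootstrap(corpus, {"stanford"})))  # → ['barr', 'fooo']
```

Note that "barr" is only found in the second iteration, through a pattern learned from "fooo". This is the strength of bootstrapping, but also how it can drift away from the intended entity type, which is why real systems score and filter patterns.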

OK, if JavaScript is fine for you (Node.js/browser), please see: http://github.com/redaktor/nlp_compromise/
This is a "no training" solution. Over the last few days I have worked especially on NER (named entity recognition) and just described it here: Named entity recognition with a small data set (corpus)
Feel free to ask me about it in the GitHub issues, because I have not documented the new methods yet (no time, and I am still working on it).

Related

LUIS prebuilt entity personName: unexpected behaviour, missing basic name in utterance

In summary: why does LUIS not label the prebuilt entity personName in some cases? Often the second name is not labeled, for no discernible reason.
This behaviour does not occur with, say, the geography prebuilt entity on the same kinds of utterances.
If anyone can explain why this happens and how best to address it, I'd greatly appreciate it.
This simply does not make sense to me. I would love to understand more.
(Screenshot: LUIS not labelling all personName entities correctly.)
(Screenshot: geography example, without the same issues as above.)
Thanks. K.
The scenario you described is currently expected behavior (it may extract some or all of the names). We are working on improving the built-in personName entity (it is currently on our roadmap). In the meantime, we recommend using a machine-learned entity: label the instances of names in your dataset and use the personName entity as a feature to help build your own name entity. Sorry for the inconvenience, but I hope this helps!

What is the difference between a machine-learned entity with a list entity constraint and a list entity by itself when using LUIS NLU entities?

In the v3 API for building LUIS apps, I notice an emphasis on machine-learned entities. Working with them, I noticed something that concerns me, and I was hoping to get more insight into the matter.
The idea is that when using a machine-learned entity, you can bind it to descriptors (phrase lists, other entities, or list entities) as a constraint on that machine-learned entity. Why not just extract the list entity by itself? What is the purpose of wrapping it in a machine-learned entity?
I ask because I have always had great success with lists. They are very controllable, although you need to watch for spelling mistakes and variations to ensure accuracy. However, when I use machine-learned entities, I notice you have to be more careful with word order: if there is a variation, it may not pick up the machine-learned entity.
Training would fix this, but in practice, if I know I have the intent I want and just need entities from it, what does the machine-learned entity really provide?
It seems you need to be more careful with it.
I have a suspicion: does the answer lie in the fact that a machine-learned entity would improve intent detection, whereas a list entity would only improve entity detection? If that is the answer, I think I can see the solution to what I am looking for.
EDITED:
I haven't been keeping up with LUIS ever since I went on maternity leave, and lo and behold, it's moving from V2 to V3!
The following is from an email conversation with a writer on the LUIS documentation team.
LUIS is moving away from different types of entities toward a single ML entity to encapsulate a concept. The ML entity can have children which are ML entities themselves. An ML entity can have a feature directly connected to it, instead of acting as a global feature.
This feature can be a phrase list, or it can be another model such as a prebuilt entity, regex entity, or list entity.
So a year ago a customer might have built a composite entity and thrown features into the app. Now they should create an ML entity with children, and these children should have features.
Currently (before the Microsoft //Build conference), any non-phrase-list feature can be a constraint (required), so a child entity with a constrained regex entity won't fire until the regex matches.
At and after //Build, this concept has been reworked in the UI as a required feature: same idea, different terminology.
...
This is about understanding a whole concept that has parts, so an address is a typical example. An address has a street number, street name, street type (street/court/blvd), city, state/province, country, and postal code.
Each of the subparts is a feature (strong indicator) that an address is in the utterance.
If you used a list entity, but not as a required feature of the address, yes, it would trigger, but that wouldn't help the address entity, which is what you are really trying to get.
If, however, you really just want to match a list, go ahead. But when the customer says the app isn't predicting as well as they thought, the team will come back to this concept of the ML entity parent and its parts and suggest changes to the entities.
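To make the "required feature" idea concrete, here is a toy model of an address whose streetType child has a list entity as a required feature: a streetType prediction is discarded unless its text also matches the list. This only illustrates the gating semantics described in the email and is not how LUIS is implemented. The candidate predictions and the STREET_TYPES list are hypothetical stand-ins for what a trained ML entity and a real list entity would supply.

```python
from dataclasses import dataclass

# Hypothetical list entity: canonical value -> accepted synonyms.
STREET_TYPES = {
    "street": {"street", "st"},
    "court": {"court", "ct"},
    "boulevard": {"boulevard", "blvd"},
}


@dataclass
class Span:
    label: str  # child entity name, e.g. "streetType"
    text: str   # text the (hypothetical) ML model predicted for that child


def list_match(text, list_entity):
    """Return the canonical value if text is one of the list's synonyms, else None."""
    normalized = text.lower().rstrip(".")
    for canonical, synonyms in list_entity.items():
        if normalized in synonyms:
            return canonical
    return None


def apply_required_features(candidates, required):
    """Keep a child prediction only if its required feature (when it has one) fires."""
    return [span for span in candidates
            if span.label not in required or required[span.label](span.text)]


# Made-up raw child predictions for an utterance like "221 Baker St. Apt 4".
candidates = [
    Span("streetNumber", "221"),
    Span("streetName", "Baker"),
    Span("streetType", "St."),
    Span("streetType", "Apt"),  # false positive the list constraint should remove
]
required = {"streetType": lambda text: list_match(text, STREET_TYPES) is not None}
kept = apply_required_features(candidates, required)
print([(s.label, s.text) for s in kept])
# → [('streetNumber', '221'), ('streetName', 'Baker'), ('streetType', 'St.')]
```

Matching the list alone would also find "St.", but only as a standalone match. Used as a required feature, it filters the children of the address parent, which is the entity you actually want.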

With the Simple API in Stanford CoreNLP, is there a way to get multi-token entity mentions?

This question is very similar to my question; however, due to the way SO works, I think it is better to ask a new question rather than just continue a thread.
CoreNLP has the Simple API, which allows quicker access to various components of the NLP pipeline. The way to get named entities appears to be:
Form a document annotation from the text
Get the sentences from the document object
Use nerTags() on the sentence objects to get the token-by-token NER labels.
Via other mechanisms, as discussed in the question linked above, one can retrieve full multi-token entity mentions such as George Washington, which is an entity mention composed of 2 tokens. Is there a way to get these multi-token entity mentions using the Simple API?
Yes, though it gives you less information than the full API, returning only the String spans of the mention. See Sentence#mentions(String) and Sentence#mentions().
If you want to get more information about a mention, you'll have to either use the regular API, or re-implement the logic in these functions. You can also try mucking around in the raw Proto, which will certainly have all the information you could possibly want, but in a less-than-pleasant proto interface. The proto definition is here.
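If you do end up re-implementing this logic on top of nerTags(), one common approach is to merge runs of consecutive tokens that share the same non-"O" tag. The sketch below does that in plain Python over (token, tag) lists, with an optional filter analogous to mentions(String). It assumes that simple grouping rule, which may differ in detail from what Sentence#mentions() actually does (for example, two adjacent entities of the same type would be merged here). The tokens and tags are made-up example data, not CoreNLP output.

```python
from itertools import groupby


def mentions(tokens, ner_tags, wanted=None):
    """Group consecutive tokens with the same non-'O' tag into (text, tag) spans.

    If `wanted` is given, only spans with that tag are returned.
    Tokens are re-joined with single spaces, so original spacing is not preserved.
    """
    if len(tokens) != len(ner_tags):
        raise ValueError("tokens and ner_tags must have the same length")
    spans = []
    for tag, group in groupby(zip(tokens, ner_tags), key=lambda pair: pair[1]):
        if tag == "O" or (wanted is not None and tag != wanted):
            continue
        spans.append((" ".join(tok for tok, _ in group), tag))
    return spans


tokens = ["George", "Washington", "visited", "Mount", "Vernon", "in", "1790", "."]
tags = ["PERSON", "PERSON", "O", "LOCATION", "LOCATION", "O", "DATE", "O"]
print(mentions(tokens, tags))
# → [('George Washington', 'PERSON'), ('Mount Vernon', 'LOCATION'), ('1790', 'DATE')]
print(mentions(tokens, tags, wanted="PERSON"))
# → [('George Washington', 'PERSON')]
```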

Stanford NER: build my own model or use RegexNER?

I would like some advice on Stanford NER. I'm wondering what the best way to detect new entities is:
Use RegexNER to detect new entities?
Train my own NER model with the new entities?
Thank you in advance.
If you can easily generate a large list of the type of entity you want to tag, I would suggest using RegexNER. For instance if you were trying to tag sports teams, it would probably be easier to just compile a large list of sports team names and directly match. Building a large training set can take a lot of effort.
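As a rough picture of what "compile a large list and directly match" means, the sketch below tags tokens by greedy longest match against a gazetteer. It is plain Python, not RegexNER itself. RegexNER is configured with a tab-separated mapping file (a pattern and an NER class per line, plus optional columns) and supports token regexes, overwrite rules, and priorities that this sketch ignores. The team names and the SPORTS_TEAM label are made-up examples.

```python
def build_gazetteer(entries):
    """Map each entry's lower-cased token tuple to its NER label."""
    return {tuple(name.lower().split()): label for name, label in entries}


def tag(tokens, gazetteer):
    """Greedy longest-match tagging against the gazetteer; other tokens get 'O'."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible span starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i:i + n] = [label] * n
                i += n
                break
        else:
            i += 1  # no entry starts here
    return tags


gazetteer = build_gazetteer([
    ("Boston Red Sox", "SPORTS_TEAM"),
    ("Chicago Bulls", "SPORTS_TEAM"),
])
tokens = "Fans of the Boston Red Sox and the Chicago Bulls met".split()
tags = tag(tokens, gazetteer)
print([t for t, label in zip(tokens, tags) if label != "O"])
# → ['Boston', 'Red', 'Sox', 'Chicago', 'Bulls']
```

The trade-off from the answer shows up directly: the list is trivial to extend, but anything not in it (or spelled differently) stays "O", whereas a trained model can generalize from context.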

Using Alloy Models

I'm working on a project about the live upgrade of HA applications in SA Forum middleware.
As part of my research, I need to make a UML profile for my input upgrade campaign file and validate that file against some dependency constraints. Now I want to use Alloy instead of UML in my work, especially since it's more abstract and formal than UML (of course, UML + OCL would also be formal). So my question is: if UML + OCL is formal, what's the benefit of using Alloy?
In general, what are the benefits of using Alloy over UML?
As far as I know, there are no tools that let you check your OCL constraints against the UML model and generate and visualize valid instances, so if you are planning to do formal analysis of your models and specifications, Alloy might be a better choice. Even if you're not planning to do much analysis, Alloy's ability to generate and visualize valid instances is very helpful for making sure you got your model and specification right.
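To give a feel for what "generate valid instances" means, the core idea of the Alloy Analyzer (search every instance up to a small bound and report the ones that satisfy or violate the constraints) can be mimicked by brute force for a toy upgrade-campaign rule. This is only an illustration under made-up assumptions: three hypothetical components A, B, C, a hypothetical DEPENDS_ON relation, and the rule that a component may be upgraded only after everything it depends on. Alloy itself translates the model to SAT and scales far beyond naive enumeration.

```python
from itertools import permutations

# Hypothetical toy model: component -> components it depends on.
DEPENDS_ON = {"A": set(), "B": {"A"}, "C": {"A", "B"}}


def valid_order(order, depends_on):
    """An upgrade order is valid if every dependency is upgraded before its dependent."""
    position = {comp: i for i, comp in enumerate(order)}
    return all(position[dep] < position[comp]
               for comp, deps in depends_on.items() for dep in deps)


def instances(depends_on):
    """Enumerate every upgrade order within scope; split into valid ones and counterexamples."""
    valid, invalid = [], []
    for order in permutations(sorted(depends_on)):
        (valid if valid_order(order, depends_on) else invalid).append(order)
    return valid, invalid


valid, invalid = instances(DEPENDS_ON)
print(valid)         # → [('A', 'B', 'C')]
print(len(invalid))  # → 5
```

Seeing the one valid order and the five counterexamples is the same kind of feedback Alloy's visualizer gives: it quickly shows whether the constraints say what you meant.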
