How to add a POS tag feature to the OpenNLP named entity recognition tool

I am trying to set up the OpenNLP NameFinder in a project with a part-of-speech (POS) tag feature.
I extended my feature class from the FeatureGeneratorAdapter class and overrode the following method. Unfortunately, this method only receives the raw tokens as a parameter. The problem is: how do I pass POS tag information into this method?
public void createFeatures(List<String> features, String[] tokens, int index, String[] previousOutcomes)

Try just passing in the POS tags as part of the tokens, i.e. append the POS tag to each word like this:
bob_nn, went_vv, etc.
The goal of this method in the interface is to fill the "List features" reference that is passed in, so you may as well put the pos_token combinations straight into that list to begin with. I've never tried this before, so I hope this helps.
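A minimal sketch of the answer's idea, assuming every token arrives as word_TAG (e.g. "Bob_NNP") both in the training data and in the input later passed to the name finder. The class name PosSuffixFeatureGenerator and the "w=" / "pos=" feature prefixes are hypothetical. In a real project the class would extend opennlp.tools.util.featuregen.FeatureGeneratorAdapter; it is kept standalone here so it compiles without OpenNLP on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: tokens carry their POS tag as a suffix ("Bob_NNP").
 * In a real project this would extend
 * opennlp.tools.util.featuregen.FeatureGeneratorAdapter.
 */
public class PosSuffixFeatureGenerator {

    // Same parameter list as FeatureGeneratorAdapter.createFeatures.
    public void createFeatures(List<String> features, String[] tokens,
                               int index, String[] previousOutcomes) {
        String token = tokens[index];
        int sep = token.lastIndexOf('_');
        if (sep > 0 && sep < token.length() - 1) {
            // Split "word_TAG" on the last underscore into word and POS parts.
            features.add("w=" + token.substring(0, sep).toLowerCase());
            features.add("pos=" + token.substring(sep + 1));
        } else {
            // No tag attached: fall back to the plain word feature.
            features.add("w=" + token.toLowerCase());
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"Bob_NNP", "went_VBD", "home_NN"};
        List<String> features = new ArrayList<>();
        new PosSuffixFeatureGenerator().createFeatures(features, tokens, 0, new String[0]);
        System.out.println(features); // [w=bob, pos=NNP]
    }
}
```

Splitting on the last underscore keeps words that themselves contain underscores intact, and a token without a tag still gets its word feature.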

Related

Iterate through tokens and find the entity for a token

Problem
After running CoreNLP over some text, I want to reconstruct a sentence adding the POS-tag for each Token and grouping the tokens that form an entity.
This could be easily done if there was a way to see which entity a Token belongs to.
Approach
One option I am considering is iterating through sentence.tokens() and looking up each token's index in a list containing only the tokens from all the CoreEntityMentions of that sentence. Then I could see which CoreEntityMention each token belongs to, so I can group them.
Another option could be to look at the offsets of each token in the sentence and compare them to the offsets of each CoreEntityMention.
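The offset-based option could look roughly like this: a token belongs to a mention when its character span lies inside the mention's span. The Span class is a hypothetical stand-in so the sketch runs without CoreNLP; with CoreNLP the offsets would presumably come from CoreLabel.beginPosition()/endPosition() and CoreEntityMention.charOffsets().

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: assign each token to the mention whose character span contains it. */
public class OffsetMatcher {

    // Hypothetical stand-in for token and mention character offsets.
    static final class Span {
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Returns, for each token, the index of the mention containing it, or -1.
    static int[] mentionIndexPerToken(List<Span> tokens, List<Span> mentions) {
        int[] result = new int[tokens.size()];
        for (int t = 0; t < tokens.size(); t++) {
            result[t] = -1;
            Span tok = tokens.get(t);
            for (int m = 0; m < mentions.size(); m++) {
                Span men = mentions.get(m);
                if (tok.begin >= men.begin && tok.end <= men.end) {
                    result[t] = m;
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Joe Smith lives in Paris": five tokens, two mentions.
        List<Span> tokens = List.of(new Span(0, 3), new Span(4, 9), new Span(10, 15),
                new Span(16, 18), new Span(19, 24));
        List<Span> mentions = List.of(new Span(0, 9), new Span(19, 24));
        System.out.println(Arrays.toString(mentionIndexPerToken(tokens, mentions)));
        // [0, 0, -1, -1, 1]
    }
}
```

The nested loop is quadratic, which is usually fine at sentence scale.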
I think the question is similar to what was asked here, but since that was a while ago, the API may have changed.
This is the setup:
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.pipeline.*;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "Some text with entities goes here";
CoreDocument coreDoc = new CoreDocument(text);
// annotate the document
pipeline.annotate(coreDoc);
for (CoreSentence sentence : coreDoc.sentences()) {
    // Code goes here
    List<CoreEntityMention> em = sentence.entityMentions();
}
Each token in an entity mention contains an index indicating which entity mention in the document it corresponds to. For a token cl (a CoreLabel from sentence.tokens()):
cl.get(CoreAnnotations.EntityMentionIndexAnnotation.class);
I'll make a note to add a convenience method for this in future versions.
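A sketch of how that index could be used to rebuild the sentence with POS tags and bracketed entities. Tok is a hypothetical stand-in for CoreLabel so the example runs without CoreNLP; in real code the three fields would come from cl.word(), cl.tag() and cl.get(CoreAnnotations.EntityMentionIndexAnnotation.class), which is assumed to be null for tokens outside any mention.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: render tokens as word/POS, bracketing tokens that share a mention index. */
public class MentionGrouper {

    // Hypothetical stand-in for CoreLabel.
    static final class Tok {
        final String word, tag;
        final Integer mentionIndex; // null when the token is not in a mention
        Tok(String word, String tag, Integer mentionIndex) {
            this.word = word; this.tag = tag; this.mentionIndex = mentionIndex;
        }
    }

    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        Integer open = null; // mention index of the currently open bracket
        for (Tok t : tokens) {
            if (open != null && !open.equals(t.mentionIndex)) {
                sb.append(']'); // the previous mention ended
                open = null;
            }
            if (sb.length() > 0) sb.append(' ');
            if (t.mentionIndex != null && open == null) {
                sb.append('['); // a new mention starts here
                open = t.mentionIndex;
            }
            sb.append(t.word).append('/').append(t.tag);
        }
        if (open != null) sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("Joe", "NNP", 0));
        tokens.add(new Tok("Smith", "NNP", 0));
        tokens.add(new Tok("lives", "VBZ", null));
        tokens.add(new Tok("in", "IN", null));
        tokens.add(new Tok("Paris", "NNP", 1));
        System.out.println(render(tokens));
        // [Joe/NNP Smith/NNP] lives/VBZ in/IN [Paris/NNP]
    }
}
```

Comparing against the previous token's index (rather than just checking for null) keeps two adjacent but distinct mentions in separate brackets.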

Passing parameters to Liferay's services search function

I need to use Liferay's index to search for users that match a given string, and this is possible using UserLocalServiceImpl#search(long companyId, String keywords, int status, LinkedHashMap<String,Object> params, int start, int end, com.liferay.portal.kernel.search.Sort sort).
Furthermore, I'd like to be able to filter the users by UserGroup.
I would expect to be able to pass the userGroupId to this function via params, but I can't find any documentation about what params should contain.
Looking at the source code, params appears to be added to a SearchContext that is used to build the query, but I can't follow the code down to the point where it is actually used.
Does anyone know what I can put into params for this purpose?
I'm on Liferay CE 7.0.
Please have a look at UserIndexer.java, in the method addContextQueryParams.
It looks like the param key should be usersGroups and that it should have a Long[] value. Some other keys are handled there too, so you can learn how params are processed (a debugger helps here).
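A small sketch of building such a params map, assuming (as the answer reads UserIndexer.addContextQueryParams) that the "usersGroups" key takes a Long[] of user group IDs. The class and method names are hypothetical, and the search call is left as a comment because it needs a running Liferay 7.0 portal; its argument values there are placeholders.

```java
import java.util.LinkedHashMap;

/** Hypothetical helper: params for UserLocalService#search filtered by user groups. */
public class UserGroupSearchParams {

    static LinkedHashMap<String, Object> byUserGroups(long... userGroupIds) {
        Long[] ids = new Long[userGroupIds.length];
        for (int i = 0; i < userGroupIds.length; i++) {
            ids[i] = userGroupIds[i]; // box each ID; the indexer is assumed to expect Long[]
        }
        LinkedHashMap<String, Object> params = new LinkedHashMap<>();
        params.put("usersGroups", ids);
        return params;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Object> params = byUserGroups(12345L); // placeholder ID
        // Inside the portal, roughly (not compiled here):
        // Hits hits = UserLocalServiceUtil.search(companyId, "keywords",
        //     WorkflowConstants.STATUS_APPROVED, params,
        //     QueryUtil.ALL_POS, QueryUtil.ALL_POS, (Sort) null);
        System.out.println(params.keySet()); // [usersGroups]
    }
}
```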

OpenNLP TokenNameFinder for entities other than names

I'm new to OpenNLP. I started training a model (TokenNameFinderTrainer) to identify names. So far so good, but now I want to identify organizations (such as "Microsoft").
My question is: which types of entities does OpenNLP recognize by default (if there are any)?
I see that it can handle <START:person> Daryl Williams <END> .
But is it okay to create something like <START:organization> Metro-Goldwyn-Mayer Studios Inc. <END>, or <START:company> Metro-Goldwyn-Mayer Studios Inc. <END>?
That is: can I label categories as I please, or do I have to use a default category? If so, which are the default ones?
EDIT:
I have found the answers through further reading. I'm now asking for confirmation:
I can label entities as I please, and it is wiser to make one model per entity type, am I right?
For instance: one for locations, one for names, one for companies?
Any ideas on how to proceed when the same company (for instance) is written both as
Microsoft and as microsoft?
Thanks for the help!
You can make a model for any entity type you want; I recommend one model per type.
OpenNLP uses machine learning to find entities, so it will find whatever your model was trained to find. So if you annotate microsoft and Microsoft, or even a misspelling of Microsoft, it will try to find them.
If you have a small list of names with only a few variants each, and you need normalized results, consider using a RegexNameFinder. If you pull the trunk, you can construct the RegexNameFinder with a Map that maps a label to an array of regex patterns.
EDIT: here is a link to the OpenNLP unit tests for RegexNameFinder. This is the 1.6 snapshot:
http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/java/opennlp/tools/namefind/RegexNameFinderTest.java?view=co
If the link doesn't work, here is a basic example:
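import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import opennlp.tools.namefind.RegexNameFinder;
import opennlp.tools.util.Span;

public void test() {
    Pattern testPattern = Pattern.compile("test");
    String[] sentence = new String[]{"a", "test", "b", "c"};
    Pattern[] patterns = new Pattern[]{testPattern};
    // map each label (entity type) to the patterns that detect it
    Map<String, Pattern[]> regexMap = new HashMap<>();
    String type = "testtype";
    regexMap.put(type, patterns);
    RegexNameFinder finder = new RegexNameFinder(regexMap);
    // should yield one Span over token 1 ("test") with type "testtype"
    Span[] result = finder.find(sentence);
}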
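For the Microsoft/microsoft case, one option (an assumption, not something the answer spells out) is a single case-insensitive pattern per canonical name, with the canonical spelling used as the label. The labelTokens loop below is a plain-Java stand-in for the matching step, not RegexNameFinder itself; the class and method names are hypothetical. With OpenNLP on the classpath, the same map could be passed to new RegexNameFinder(...).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch: case-insensitive patterns so spelling variants map to one label. */
public class CaseInsensitiveNames {

    static Map<String, Pattern[]> buildPatterns() {
        Map<String, Pattern[]> regexMap = new LinkedHashMap<>();
        // Hypothetical choice: the canonical spelling doubles as the label.
        regexMap.put("Microsoft", new Pattern[] {
            Pattern.compile("microsoft", Pattern.CASE_INSENSITIVE)
        });
        return regexMap;
    }

    // For each token, the label of the first pattern that fully matches it, or null.
    static List<String> labelTokens(String[] tokens, Map<String, Pattern[]> regexMap) {
        List<String> labels = new ArrayList<>();
        for (String token : tokens) {
            String found = null;
            outer:
            for (Map.Entry<String, Pattern[]> e : regexMap.entrySet()) {
                for (Pattern p : e.getValue()) {
                    if (p.matcher(token).matches()) {
                        found = e.getKey();
                        break outer;
                    }
                }
            }
            labels.add(found);
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] tokens = {"Microsoft", "and", "microsoft", "and", "MICROSOFT"};
        System.out.println(labelTokens(tokens, buildPatterns()));
        // [Microsoft, null, Microsoft, null, Microsoft]
    }
}
```

Keeping the variants inside one pattern means the normalization lives in the label, so downstream code sees a single name regardless of spelling.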

Retrieve the quote details with C#

I'm trying to create a custom workflow (for Dynamics CRM 2011) that must send an email containing information from the quote details of a quote.
I created it in Visual Studio 2010 with the SDK.
The workflow is triggered manually from a quote.
I am able to retrieve the value of customerid, but I am unable to get the attached documents or the quote details of the Quote. When I launch the workflow, I get this exception:
System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at Microsoft.Xrm.Sdk.Entity.get_Item(String attributeName)
at CPageCRM.Workflow.QuoteSendMailNotificationRIP.Execute(CodeActivityContext executionContext)
My code is :
//to get the current Quote
Entity preImageEntity = context.PreEntityImages.Values.FirstOrDefault();
//preImageEntity is a Quote because I trigger the workflow from a Quote
//the next two lines work; I can retrieve the correct values from the Quote
string natureDevis = Utils.GetOptionSetValueLabel(service, preImageEntity, "new_nature", (OptionSetValue)preImageEntity["new_nature"]);
string prospectDevis = ((EntityReference)preImageEntity["customerid"]).Name;
//I get the exception after that :
List<QuoteDetail> listQuoteDetail = new List<QuoteDetail>();
listQuoteDetail = preImageEntity["quote_details"] as List<QuoteDetail>; //I get the exception
I don't understand why quote_details doesn't exist in the dictionary, because when I do:
Quote devis = new Quote();
devis.quote_details //<= (autocompletion works here)
I have the same problem when I try to get the sharepointdocumentlocation.
Does anyone have an explanation? How can I retrieve the quote details and the documents attached to my Quote from code?
Thanks
A comment and a potential answer.
My comment: when retrieving things from the images, I often find it easier to let the compiler infer the proper type by using 'var'.
My answer: quote_details isn't just a field but an actual 1:N relationship (as you can see in the metadata browser). You may need to get the related entities with a separate retrieve.
Edit:
For example, _service.Retrieve("quote", quoteId, new ColumnSet("quote_details"))
will retrieve the quote details from the service. However, you could also check whether you are passing the quote_details attribute in the PreImage.
I succeeded with a LINQ query.
I had to query the quotedetail records linked to the quote:
var queryQuoteDetail = from r in orgServiceContext.CreateQuery("quotedetail")
where ((EntityReference)r["quoteid"]).Id.Equals(context.PrimaryEntityId)
select r;

Qt: Using a default model for selecting my data

I am quite new to Qt and am in a situation where I want to use a model for my needs:
I have a dynamic number of instances of a subclass that need to be handled differently (different UI controls for each one when it is selected). I want a list view where I can add new elements or delete old ones, as well as enable/disable existing ones.
Of course I want to rewrite as little of the code as possible, so I thought of using a list widget and a list model to give the user some controls. But how do I link these (or rather, the items) to instances of my classes?
Do you know of any tutorials on this?
I already looked in QtDemo and on Google, but I don't know the right search terms,
so I had no good results.
Basically, what I think I need is a model item that accepts a Collider* as its data.
But when I pass this to QStandardItem::setData(), the compiler says: error: ‘QVariant::QVariant(void*)’ is private
So I found the solution to this problem.
Since QStandardItem can store QVariants as data, I wanted to store a pointer to my data in a QVariant. To achieve this I had to use Q_DECLARE_METATYPE(MyType*).
With this I was able to do:
MyType *MyInstance = new MyType;
QVariant data;
data.setValue(MyInstance);
QStandardItem *item = new QStandardItem("My Item");
item->setData(data);
standardModel->appendRow(item);
And the best part is that you can add as many types as you want and let QVariant decide whether it contains the type you wanted:
if (v.canConvert<MyType*>()) {
    // Yes, it is MyType
} else if (v.canConvert<MyOtherType*>()) {
    // Oh, it is the other one
}
So in the end this only requires declaring the meta type; you do not have to subclass any items.
You should also read about the limitations of this approach here:
Q_DECLARE_METATYPE
qRegisterMetaType
Does this page answer your question? There's an example of deriving a StringListModel item that you might be able to use as a template.
