What features does Apache OpenNLP use by default when running its named entity recognition (NER) models? - opennlp

I know Apache OpenNLP uses a MaxEnt model for its NER tagger. But what features does Apache OpenNLP use (by default) when running its named entity recognition (NER) models? Also, how can we incorporate or customize new features in OpenNLP (Java implementation)?

Apache OpenNLP NER lets users define features in an XML file. The default XML is this:
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/resources/opennlp/tools/namefind/ner-default-features.xml
If you want to customize it, use the -featuregen option when you train the model:
$ opennlp TokenNameFinderTrainer -featuregen your-features-definition.xml -model my-model.bin ...
You don't need to specify your customized feature XML file when you execute TokenNameFinder, because the model file already includes your feature definitions.
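To give a rough idea of what a windowed token feature definition produces, here is a small library-free sketch. It is not OpenNLP's code: the class name, the window size of 2, the token classes, and the feature strings are all assumptions made for illustration, so check the linked XML for the generators that are actually configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: not OpenNLP code; the feature strings are made up. */
class WindowFeatureSketch {

    /** Coarse token shape for a non-empty token, e.g. "ic" for an initial capital. */
    static String tokenClass(String token) {
        if (token.chars().allMatch(Character::isDigit)) return "num";
        if (token.chars().allMatch(Character::isUpperCase)) return "ac";
        if (Character.isUpperCase(token.charAt(0))) return "ic";
        if (token.chars().allMatch(Character::isLowerCase)) return "lc";
        return "other";
    }

    /**
     * Emits a word feature and a token-class feature for the token at index
     * and for up to prev tokens before it and next tokens after it.
     */
    static List<String> features(String[] tokens, int index, int prev, int next) {
        List<String> out = new ArrayList<>();
        for (int offset = -prev; offset <= next; offset++) {
            int i = index + offset;
            if (i < 0 || i >= tokens.length) continue; // window runs off the sentence
            // Hypothetical position prefix: "p1" = one token back, "n2" = two ahead.
            String position = offset == 0 ? "" : (offset < 0 ? "p" + (-offset) : "n" + offset);
            out.add(position + "w=" + tokens[i].toLowerCase(Locale.ROOT));
            out.add(position + "wc=" + tokenClass(tokens[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sentence = {"Mr", "Smith", "visited", "Paris", "in", "2019"};
        // Features the classifier would see when deciding the tag for "Paris".
        System.out.println(features(sentence, 3, 2, 2));
    }
}
```

Each string in the returned list is one contextual predicate handed to the classifier. A custom feature definition changes which of these predicates get generated.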

Related

How to detect foul/offensive language using NLP? Are there any pretrained OpenNLP models available for that?

I am an undergraduate. I need to detect foul/offensive language in a string. I am using the Java Spring Boot framework and the OpenNLP library. Are there any pretrained models for detecting foul/offensive language, or how do I train a model to detect it?

Type Descriptor File for Stanford CoreNLP for Apache UIMA RUTA

I am trying to annotate a German literary text in Apache UIMA by writing rules in RUTA. I am using DKPro Core as well. I am very new to this and still figuring out how to do things.
I am unable to get a few annotations that are not mentioned in the type descriptor files generated by the German Novels example (https://github.com/apache/uima-ruta/tree/trunk/example-projects/GermanNovels).
For example: the ADJA tag in the part-of-speech tagset that is available in the Stanford NLP POS tagger.
I searched for a type descriptor file for Stanford CoreNLP but couldn't find one online.
How can I generate these files?

Writing SonarQube sensor plugin with support for multiple languages

What is the correct way to add support for multiple programming languages to a plugin?
Should separate rules definition XML files be loaded for each language with different repository keys, and several implementations of ProfileDefinition created?
Is there an example of such a plugin?
I am using a SonarQube 5.2 snapshot.
Rule repositories and Quality profiles are associated to a single language, so multiple instances of ProfileDefinition and RulesDefinition must be provided.

Want to use @Value (reading the properties from property file) in UIMA framework

I have a property file like myProperties.properties. I want to read a property such as MAX_YEARS using the Spring annotation @Value, as shown below, in a class extending UIMA's JCasAnnotator_ImplBase.
@Value("${REQUIRED_COLUMNS}") private String requiredColumns;
Or are there any alternatives for reading properties from a property file in the UIMA framework?
Thanks in advance.
Narasimha.
UIMA does not support value injection via Java annotations (from Spring or any other DI frameworks) at this time. It does support External Configuration Parameter Overrides, though.
uimaFIT offers annotations like @ConfigurationParameter to inject UIMA parameters into fields. These parameter values can come from descriptors automatically generated by uimaFIT using reflection, or they can come from pre-built XML descriptors.
When using pre-built XML descriptors, it should be possible to employ the External Configuration Parameter Overrides mechanism in conjunction with uimaFIT - but I am not sure if this has already been tried by anybody.
It may even be possible to employ the External Configuration Parameter Overrides mechanism with the descriptors internally generated by uimaFIT.
Disclosure: I am a developer on the UIMA project, focussing on uimaFIT.
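As a framework-independent alternative, the properties file can also be read with plain java.util.Properties, for example from the annotator's initialize(UimaContext) method. The following is only a sketch: AnnotatorSettings and its methods are hypothetical helpers, not part of UIMA or uimaFIT, and it assumes myProperties.properties is available on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not a UIMA or uimaFIT class) wrapping java.util.Properties. */
class AnnotatorSettings {
    private final Properties props = new Properties();

    /** Loads key=value pairs from any Reader, e.g. a FileReader or StringReader. */
    static AnnotatorSettings fromReader(Reader reader) {
        AnnotatorSettings settings = new AnnotatorSettings();
        try {
            settings.props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    /** Loads a properties file from the classpath, e.g. "myProperties.properties". */
    static AnnotatorSettings fromClasspath(String resourceName) {
        ClassLoader loader = AnnotatorSettings.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourceName);
            }
            AnnotatorSettings settings = new AnnotatorSettings();
            settings.props.load(in);
            return settings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the trimmed value, failing fast when the key is absent. */
    String require(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    /** Convenience accessor for integer properties such as MAX_YEARS. */
    int requireInt(String key) {
        return Integer.parseInt(require(key));
    }
}
```

Inside the annotator, a call such as AnnotatorSettings.fromClasspath("myProperties.properties").require("REQUIRED_COLUMNS") would then supply the field value that @Value was meant to inject.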

Maven plugin processing xml files

What is the preferred way for a Maven plugin to process XML files (external ones, not the POM) and possibly map them to objects (e.g. perhaps using the same "technique" as for configuration via @Parameter)?
Where can I find some good examples?
Thank you,
Alex
Maven plugins do this indirectly by using Modello, which can of course also be used on its own without any relationship to Maven. Compared to things like JAXB, it takes a different view: it is built around a model and supports different versions of that model.
The Modello documentation will give a good impression of what is possible.
Other plugins, for example the maven-assembly-plugin, use a Modello model to parse external XML files.
