OpenNLP and database of synonyms

We have a database of synonyms for organization names (e.g. BT is British Telecom). We use OpenNLP to extract entities and keywords from text blocks. Is there a way to tell OpenNLP to use our database (e.g. if it finds BT as an organization name, it should return British Telecom)? Some kind of hook? Or do we just have to apply that manually to the OpenNLP results?

In OpenNLP 1.6 there is a new component called "EntityLinker".
The purpose of EntityLinker is to solve the exact problem you have: linking NER results to authoritative databases. In the OpenNLP addons there is an implementation of an EntityLinker that does geocoding by linking NER results to geographic placename gazetteers. OpenNLP 1.6 will be out soon, but you could pull trunk.
In the meantime you could take the approach you alluded to, which is to create a class that takes in your NER results, queries your database, and returns the N best matches.
Assuming your database supports "fuzzy" search, you could generate a score and return a set of scored results.
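A minimal sketch of that interim approach, assuming a JDBC-accessible synonym table; the table and column names (org_synonyms, alias, canonical) and the connection URL are illustrative assumptions, not anything prescribed by OpenNLP:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Post-processes NER output by resolving each extracted name against a
// synonym table, e.g. mapping "BT" to "British Telecom".
public class SynonymResolver {

    private final Connection conn;

    public SynonymResolver(String jdbcUrl) throws Exception {
        // Hypothetical connection URL, e.g. "jdbc:postgresql://localhost/nlp"
        this.conn = DriverManager.getConnection(jdbcUrl);
    }

    // Exact-match lookup; a fuzzy variant could use LIKE, trigram similarity,
    // or whatever scored search your database offers.
    public List<String> resolve(String nerResult) throws Exception {
        List<String> matches = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT canonical FROM org_synonyms WHERE alias = ?")) {
            ps.setString(1, nerResult);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    matches.add(rs.getString("canonical"));
                }
            }
        }
        return matches;
    }
}
```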

This [1] should help. According to the Apache OpenNLP documentation, a custom corpus can be used to train a name finder model.
An alternative is to use Apache Stanbol, which integrates OpenNLP into a coherent high-level platform where you can easily configure custom vocabularies for the purpose of named entity recognition [2].
[1] http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
[2] https://stanbol.apache.org/docs/trunk/customvocabulary.html
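For the first route, here is a minimal training sketch following the 1.5.x API shown in the manual linked at [1]; the training-file and output-file names are assumptions:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainOrgFinder {
    public static void main(String[] args) throws Exception {
        Charset charset = Charset.forName("UTF-8");
        // Training file in the OpenNLP name-finder format: one sentence per
        // line, entities marked as <START:organization> British Telecom <END>
        ObjectStream<String> lineStream = new PlainTextByLineStream(
            new FileInputStream("en-ner-org.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            model = NameFinderME.train("en", "organization", sampleStream,
                TrainingParameters.defaultParams(), null,
                Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        // Persist the model for later use with NameFinderME
        try (FileOutputStream out = new FileOutputStream("en-ner-org.bin")) {
            model.serialize(out);
        }
    }
}
```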

Related

NLP and context-based search using Elasticsearch

I have been using ES to handle regular text/keyword search. Is there a way to use Elasticsearch to handle context-based search, i.e. when a user gives a search text like "articles between 10 august and 24 September" (and similar scenarios), ES should be able to identify what the user is asking and present the right results? I suppose we would have to involve ML to handle such scenarios. If any NLP or ML integrations need to be done, where should I start in order to improve the search experience?
Any insight into this is much appreciated.
This is called semantic parsing: you need to map the sentence to a logical form. This is a challenging task, since the computer needs to understand your sentence. You may build your own semantic parser (e.g., with SEMPRE) to do the translation, or use existing methods to translate human language into Elasticsearch queries.
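As a toy illustration of that last point, here is a minimal hand-rolled sketch (not a real semantic parser) that maps the example phrase onto an Elasticsearch range query; the "published" field name, the hard-coded year, and the one-pattern grammar are all assumptions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRangeParser {

    // Toy grammar: "between <day> <month> and <day> <month>"
    private static final Pattern BETWEEN = Pattern.compile(
        "between\\s+(\\d{1,2}\\s+[a-z]+)\\s+and\\s+(\\d{1,2}\\s+[a-z]+)",
        Pattern.CASE_INSENSITIVE);

    // Case-insensitive parser for dates like "10 august 2024"
    private static final DateTimeFormatter DAY_MONTH_YEAR =
        new DateTimeFormatterBuilder().parseCaseInsensitive()
            .appendPattern("d MMMM yyyy").toFormatter(Locale.ENGLISH);

    // Translates the phrase into a range query on a hypothetical "published" field
    public static String toRangeQuery(String text, int year) {
        Matcher m = BETWEEN.matcher(text);
        if (!m.find()) {
            return null; // phrase not understood; fall back to plain keyword search
        }
        LocalDate from = LocalDate.parse(m.group(1) + " " + year, DAY_MONTH_YEAR);
        LocalDate to = LocalDate.parse(m.group(2) + " " + year, DAY_MONTH_YEAR);
        return String.format(
            "{\"query\":{\"range\":{\"published\":{\"gte\":\"%s\",\"lte\":\"%s\"}}}}",
            from, to);
    }

    public static void main(String[] args) {
        // Prints a range query covering 10 August to 24 September
        System.out.println(toRangeQuery(
            "articles between 10 august and 24 September", 2024));
    }
}
```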

DBpedia indexing for named entity linking (chatbot)

I'm working on a project for a chatbot. The chatbot must answer users' questions using DBpedia, and was initially trained with the IBM Watson Assistant service. However, in this service it is necessary to manually fill in the dictionaries in which the DBpedia entities and their synonyms are defined. The entities defined in the dictionaries are the ones recognized in the user's natural-language questions.
For example, in the question "Who is the director of Spiderman?" the chatbot recognizes dbo:director and the Spiderman entity, because they are defined in the dictionary.
Manually inserting all the DBpedia entities into the dictionaries is limiting, and for the moment the chatbot recognizes only the few entities included in the dictionary.
I therefore want to recognize the DBpedia entities present in the user's natural-language questions by indexing the DBpedia RDF datasets in something like Elasticsearch or Lucene and then using full-text search. I thought of indexing entities using only the literal properties of DBpedia (so that full-text search applies). Before continuing I would like to know whether this is the right approach, and to get some advice on how to set up the indexes and how to exploit full-text search effectively.
Thank you
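As a hedged sketch of the approach described above: each DBpedia entity becomes one document whose analyzed fields hold its literal properties (label, abstract, redirect labels as extra surface forms), and a full-text query over those fields yields linking candidates. A minimal Lucene (8+) sketch; the field names and sample values are illustrative assumptions:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DbpediaEntityIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory, for the sketch only
        StandardAnalyzer analyzer = new StandardAnalyzer();

        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            // One document per entity: URI stored verbatim, literals analyzed
            Document doc = new Document();
            doc.add(new StringField("uri",
                "http://dbpedia.org/resource/Spider-Man", Field.Store.YES));
            doc.add(new TextField("label", "Spider-Man", Field.Store.YES));
            // Redirect labels give extra surface forms users actually type
            doc.add(new TextField("label", "Spiderman", Field.Store.YES));
            doc.add(new TextField("abstract",
                "Spider-Man is a superhero created by Stan Lee and Steve Ditko.",
                Field.Store.YES));
            writer.addDocument(doc);
        }

        // Full-text search over the literal fields produces linking candidates
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new MultiFieldQueryParser(
                new String[] {"label", "abstract"}, analyzer).parse("spiderman");
            for (ScoreDoc hit : searcher.search(q, 10).scoreDocs) {
                Document d = searcher.doc(hit.doc);
                System.out.println(d.get("uri") + "  score=" + hit.score);
            }
        }
    }
}
```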

Natural Language Processing Using Elasticsearch and Google Cloud API

I want to use NLP with Elasticsearch. I have been able to achieve one level by using the OpenNLP plugin, as mentioned in the comments of this question. I am getting entities like person, organization, and location indexed while inserting documents.
I have a doubt about searching the same information, since I need to process the terms entered by the user at query time. Here is what I have thought of (a sketch of steps 1 and 2 follows the list):
1. Process the query entered by the user with Apache OpenNLP, as specified here.
2. Extract person, location, and organization names from the previous step, and then run a query against the entities stored in the index.
3. I am also thinking of using the Google Knowledge Graph Search API to fetch related information about the entities extracted in the previous steps, and then include that in the search query as well. (The reason is that we want to show results for Delhi in case someone searches for "Capital of India".) We are not going with the synonym-search approach here because we want the information to be available dynamically.
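A minimal sketch of steps 1 and 2, assuming a pre-trained OpenNLP person model on disk and an index with a "persons" field; both names are assumptions:

```java
import java.io.FileInputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class QueryTimeNer {
    public static void main(String[] args) throws Exception {
        // Step 1: run the user's raw query through OpenNLP
        String userQuery = "documents about Barack Obama in Delhi";
        String[] tokens = SimpleTokenizer.INSTANCE.tokenize(userQuery);

        TokenNameFinderModel model;
        try (FileInputStream in = new FileInputStream("en-ner-person.bin")) {
            model = new TokenNameFinderModel(in);
        }
        NameFinderME finder = new NameFinderME(model);
        Span[] spans = finder.find(tokens);

        // Step 2: turn each extracted entity into a match clause against
        // the entity fields that were populated at index time
        StringBuilder clauses = new StringBuilder();
        for (String person : Span.spansToStrings(spans, tokens)) {
            if (clauses.length() > 0) clauses.append(",");
            clauses.append("{\"match\":{\"persons\":\"").append(person).append("\"}}");
        }
        String esQuery = "{\"query\":{\"bool\":{\"should\":[" + clauses + "]}}}";
        System.out.println(esQuery);
    }
}
```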
My question is this:
Is there something better we can do to achieve the same result? A lot of processing at query time is going to increase the response time.

Working with NLP tags in Elasticsearch

Working on a large data-oriented search product powered by Elasticsearch. We've built a lot of machine learning functionality on top of this app, but currently we're having some difficulty deciding how to integrate fairly standard NLP-based word tags into our ES index.
Currently we have a tagging service that can annotate a word with a respective type (or types, but one may be useful enough for now). This function could be abstracted to: type = getWordType(word)
I imagine there must be a way to integrate this tagging service into the analysis chain that is applied at index time, where, say, we tell the index what type a particular word belongs to. However, doing this kind of advanced analysis is a bit beyond my Elasticsearch capacity. Does anyone have pointers on this kind of advanced analysis in Elasticsearch?
Thanks!
You might want to take a look at the ingest node functionality introduced in Elasticsearch 5.0. It allows you to preprocess your documents and add fields to the JSON before the document is indexed in Elasticsearch.
I wrote an ingest processor that uses OpenNLP to enrich documents. You could take a look at that one and adapt it to your needs (also, pull requests are very welcome).
Check it out at https://github.com/spinscale/elasticsearch-ingest-opennlp
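To illustrate the ingest mechanism itself, independent of the OpenNLP processor above, here is a minimal sketch using the Elasticsearch low-level REST client and the built-in set processor; the pipeline, index, and field names are assumptions:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class IngestPipelineExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {

            // Define a pipeline; a processor plugin (such as the OpenNLP one
            // linked above) would be referenced here in place of "set"
            Request pipeline = new Request("PUT", "/_ingest/pipeline/tagging");
            pipeline.setJsonEntity(
                "{\"processors\":[{\"set\":{\"field\":\"tagged\",\"value\":true}}]}");
            client.performRequest(pipeline);

            // Index a document through the pipeline; processors run before indexing
            Request doc = new Request("PUT", "/articles/_doc/1?pipeline=tagging");
            doc.setJsonEntity("{\"body\":\"Some text to enrich\"}");
            client.performRequest(doc);
        }
    }
}
```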
This is achieved in Elasticsearch 6.5 with the type annotated_text: https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/mapper-annotated-text-usage.html
Essentially, kind of like synonyms, the tags (or named entity IDs, etc) can exist at the same position as the word you’re tagging.
Needs a plugin installed, the Mapper Annotated Text Plugin.
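A minimal sketch of annotated_text usage via the same low-level REST client; the index name, field name, and sample annotation are assumptions, and a 7.x-style typeless mapping is shown (the 6.5 docs linked above include a type name):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class AnnotatedTextExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {

            // Mapping with the annotated_text type
            // (requires the mapper-annotated-text plugin to be installed)
            Request mapping = new Request("PUT", "/articles");
            mapping.setJsonEntity(
                "{\"mappings\":{\"properties\":{"
                + "\"body\":{\"type\":\"annotated_text\"}}}}");
            client.performRequest(mapping);

            // The markdown-like [text](url-encoded-value) syntax stores the
            // annotation value at the same position as the surface text
            Request doc = new Request("PUT", "/articles/_doc/1");
            doc.setJsonEntity(
                "{\"body\":\"[BT](British+Telecom) reported strong results\"}");
            client.performRequest(doc);
        }
    }
}
```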

Keyword search over a collection of OWL ontologies

I have a collection of OWL ontologies. Each ontology is stored in a dataset of a triple store database (e.g. OWLIM, Stardog, AllegroGraph). Now I need to develop an application which supports searching these ontologies by keyword, i.e., given a keyword, the application should return the ontologies that contain it.
I have checked OWLIM-SE and Stardog; they only provide full-text search over one dataset, not the whole database. I have also considered Solr (Lucene), but in that case the ontologies would be indexed twice (once by Lucene, once by the triple store database).
Is there any other solution for this problem?
Thanks in advance.
Stardog's full-text indexing works over an entire database and can be used transparently from SPARQL, which will allow you to easily access other properties of the concepts matching your search criteria in a single query. This will get you precisely what you're describing.
For some information on administering the search indexes, and Stardog in general, check out these docs.
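As a hedged sketch of what that looks like in practice: the query below combines Stardog's documented textMatch predicate with an ordinary graph pattern, executed here through the generic RDF4J client. The endpoint URL, database name, and search term are assumptions; check the Stardog docs for your version's exact search syntax:

```java
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class StardogSearchExample {
    public static void main(String[] args) {
        // Stardog exposes a standard SPARQL endpoint per database
        SPARQLRepository repo =
            new SPARQLRepository("http://localhost:5820/myDb/query");
        repo.init();

        // Full-text match plus a regular graph pattern in one query:
        // find concepts whose labels match the keyword, and return the labels
        String sparql =
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?s ?label WHERE { " +
            "  ?s rdfs:label ?label . " +
            "  ?label <tag:stardog:api:property:textMatch> 'pizza' . " +
            "}";

        try (RepositoryConnection conn = repo.getConnection();
             TupleQueryResult result = conn.prepareTupleQuery(sparql).evaluate()) {
            while (result.hasNext()) {
                BindingSet bs = result.next();
                System.out.println(bs.getValue("s") + "  " + bs.getValue("label"));
            }
        }
    }
}
```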
