Elasticsearch find search terms matching text - elasticsearch

I have a scenario where I need to map each article to an entity. To do so, we maintain a set of keywords / search phrases (e.g. (icici OR hdfc) AND bank) that may appear in each article. We want to use the power of Elasticsearch to find all the search phrases that match the article being processed.
What I have come across so far is forward search (full-text search and so on), but what I need here is a reverse search: matching the stored search phrases against an article.
I was digging for a solution and hoped someone had already worked this out and could help.

In Elasticsearch it's called percolator.
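As a rough illustration of how the percolator fits the question, here is a minimal sketch that only builds the JSON bodies involved; it does not talk to a cluster. The index name, the field names (query, body) and the helper functions are hypothetical, chosen for this example. It assumes a version of Elasticsearch where percolator is a mapping field type and percolate is a query type.

```python
import json

# Hypothetical index name, for illustration only.
INDEX = "entity-queries"

# Mapping: a field of type "percolator" holds the stored queries, and
# "body" is the article field those queries are evaluated against.
MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}


def stored_query(phrase: str) -> dict:
    """Wrap a search phrase as a document to index into the percolator index."""
    return {"query": {"query_string": {"query": phrase, "default_field": "body"}}}


def percolate_request(article_text: str) -> dict:
    """Build a search body asking which stored queries match the given article."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"body": article_text},
            }
        }
    }


if __name__ == "__main__":
    # 1. Create the index with MAPPING.
    # 2. Index one document per search phrase.
    # 3. Search the index with the percolate body for each incoming article.
    print(json.dumps(MAPPING, indent=2))
    print(json.dumps(stored_query("(icici OR hdfc) AND bank"), indent=2))
    print(json.dumps(percolate_request("HDFC Bank announced new rates"), indent=2))
```

The hits returned by such a search would be the stored phrase documents that match the article, so each phrase document can carry the entity it maps to.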

Related

Elastic Enterprise Search - Is it a best practice to index data of two different json schema in a single index

Hi, I'm trying out Elastic Enterprise Search with Elasticsearch. I have a couple of questions on data indexing.
In the Elasticsearch documentation, I read that there is a limit to the number of fields an Elasticsearch index can have. Since Elastic Enterprise Search is built on Elasticsearch, I assume the same applies here. In that case, let's say I have multiple document types with various fields, for example Person.json and Dog.json, which have different properties. When indexing, I use one search engine in Elastic Enterprise Search for both Person and Dog, so that when I query through the Elastic Enterprise Search API I get results that are both Person and Dog, depending on the search term.
Is this the way to go, or should I specify a separate search engine for each schema type?
I am assuming that your Person.json and Dog.json contain different fields, as your heading suggests. Whether to create a separate index for each entity or keep them in a single index depends on the use cases in your application. You will not find Elasticsearch marking one approach as better than the other; the documentation mainly explains the pros and cons in a particular context (relevance, performance, management, etc.).
Please refer to this SO answer of mine, where I discuss the pros and cons of both approaches, and to the discussion in chat for more context on why the OP chose an approach for his use case after weighing them.

Does Couchbase 5 make ElasticSearch useless for Full Text Search?

Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?
Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as
ElasticSearch or Solr. Couchbase FTS is not intended as a replacement
for third party search software if search is at the core of your
application. It is a simple and lightweight way to add search to your
Couchbase data without deploying additional software and servers. If
you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern, then full-text search may be right for you.
It will depend on your specific use case, but there is a reason why search is a complicated problem and why some products have spent years working on it (and continue to).
Full-text search is NOT EQUAL to a search engine. Couchbase Full Text Search does not support a lot of the functions that ElasticSearch provides. For example, in ElasticSearch you can set the weight of fields in the result set, do geo search, etc. Couchbase full-text search is just a full-text search implementation, i.e. a basic string-matching function on specially indexed fields only.
So, if your task is to do a basic substring search as part of a query, then you don't need ElasticSearch anymore. That makes development quicker and infrastructure cheaper. However, if you are building a system that needs a proper search engine, then you need ElasticSearch as much as before.

Elasticsearch - Autocomplete return word/term/token suggestions instead of whole documents

I am trying to implement a simple auto completion for query terms.
There are many different approaches, but most of them return documents instead of terms, or the authors simply stopped explaining at that point and I am not able to adapt them.
A user is typing in a query, e.g. phil
What I want is to provide a list of term completion suggestions like philipp, philius, philadelphia, ...
I am able to get document matches via (edge)ngrams, phrase_prefix and so on, but I am stuck at retrieving matching terms (completion suggestions).
Can someone give me a hint?
I have documents like this {"title":"...", "description":"...", "content":"..."}
All fields hold long string values, and the content field in particular contains full-text content.
I do not want to suggest the whole title of a document containing e.g. Philadelphia. Just the word "Philadelphia".
Looking for something like that, myself.
In SOLR it was relatively simple to configure (although a pain to build and keep up-to-date) using solr.SpellCheckComponent. Somehow the same underlying Lucene functionality is used differently between SOLR and ElasticSearch, and in ElasticSearch it is geared towards finding whole documents (or whole field values, if you will) or so it seems...
Despite the profusion of "elasticsearch autocomplete" articles, none appears to deal with this particular issue. Like it doesn't exist. Maybe their use case is different and ElasticSearch works for them just fine, who knows?
At this point I think that preparing the exact field values to use with ElasticSearch autocomplete (yes, that's the input field values, not analyzer tokens) may be the only way to solve the problem. Which is terrible, because the performance is going to be very low.
Try term suggester:
The term suggester suggests terms based on edit distance. The provided
suggest text is analyzed before terms are suggested. The suggested
terms are provided per analyzed suggest text token. The term suggester
doesn’t take the query into account that is part of request.
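For reference, a term suggester request could look roughly like the sketch below, which only builds the request body and reads suggestions out of a response. The suggestion name (word-suggestions), the helper functions and the sample response are hypothetical; the content field comes from the question. Because the term suggester works on edit distance, it may return spelling-near terms for "phil" rather than true prefix completions, so treat this as a starting point rather than a full answer to the question.

```python
def term_suggest_request(text: str, field: str = "content", size: int = 5) -> dict:
    """Build a search body containing a single term suggester."""
    return {
        "suggest": {
            "word-suggestions": {
                "text": text,
                "term": {
                    "field": field,             # field whose indexed terms are suggested
                    "size": size,               # max suggestions per input token
                    "suggest_mode": "always",   # suggest even if the token exists in the index
                },
            }
        }
    }


def suggested_terms(response: dict, name: str = "word-suggestions") -> list:
    """Collect the distinct suggested terms from a suggest response, in order."""
    terms = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            if option["text"] not in terms:
                terms.append(option["text"])
    return terms


if __name__ == "__main__":
    # Hand-written response, only to illustrate the expected shape.
    sample = {
        "suggest": {
            "word-suggestions": [
                {
                    "text": "phil",
                    "offset": 0,
                    "length": 4,
                    "options": [
                        {"text": "philip", "score": 0.66, "freq": 3},
                        {"text": "phial", "score": 0.5, "freq": 1},
                    ],
                }
            ]
        }
    }
    print(term_suggest_request("phil"))
    print(suggested_terms(sample))
```

Note that the result is a list of terms, not documents, which is the part the question was missing.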

Can ElasticSearch create/store just the indexes while leaving the source document where it is?

Assuming I already have a set of documents living in some document store, can I have ElasticSearch create its indexes and store them on its various replicated nodes while leaving the documents themselves where they are? In other words, can I use ES just for search and not for storage? (I understand this might not be ideal, but assume there are good reasons I need to keep the documents themselves where they are.)
If I take this approach does it remove any functionality from search, for example showing where in a document the search term was found?
Thanks.
The link Konstantin referenced should show you how to disable _source.
There is another way to store fields (store=true). You are better off using _source and excluding any specific fields you don't want stored as part of _source, though.
Functionality removed:
Viewing fields that are returned from search
Highlighting
Easily rebuilding an index from _source. Probably not an issue, since data is stored elsewhere
There are probably other features I am missing.
The only case I've come across where I really don't need _source is when building an analytics engine where I am only returning aggregates (term and histogram).
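The two options mentioned in this answer can be sketched as index mappings. This is a minimal illustration that only builds the mapping bodies; the field names (title, content) and the helper functions are hypothetical, and the layout assumes a typeless mapping as used in recent Elasticsearch versions.

```python
def mapping_without_source() -> dict:
    """Disable _source entirely; only fields marked store=True can be returned."""
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": {
                # Stored separately, so it can still be fetched with stored_fields.
                "title": {"type": "text", "store": True},
                # Indexed for search only; the original text cannot be returned.
                "content": {"type": "text"},
            },
        }
    }


def mapping_excluding(fields) -> dict:
    """Keep _source but leave the given fields out of it."""
    return {
        "mappings": {
            "_source": {"excludes": list(fields)},
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"},
            },
        }
    }


if __name__ == "__main__":
    print(mapping_without_source())
    print(mapping_excluding(["content"]))
```

With the second variant, the excluded field is still searchable but is not returned with hits, so the application would fetch the full document from the external store by id.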

retaining case in elasticsearch faceted search

Is there a way to do faceted searches using the elasticsearch Search API while maintaining case (as opposed to having the results converted to lowercase)?
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.
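In current Elasticsearch versions, facets have been replaced by aggregations and the multi_field type by the fields mapping parameter, so the advice above would translate to something like the following sketch. It only builds the mapping and request bodies; the field name (category), the sub-field name (raw), the aggregation name and the helper functions are hypothetical.

```python
def case_preserving_mapping(field: str = "category") -> dict:
    """Analyzed text for searching, plus an unanalyzed keyword sub-field for counting."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",                         # lowercased by the default analyzer
                    "fields": {"raw": {"type": "keyword"}},  # original value, case kept
                }
            }
        }
    }


def terms_aggregation(field: str = "category", size: int = 10) -> dict:
    """Count documents per exact value of the keyword sub-field."""
    return {
        "size": 0,  # no hits needed, only the buckets
        "aggs": {
            "by_value": {"terms": {"field": f"{field}.raw", "size": size}}
        },
    }


def bucket_counts(response: dict, name: str = "by_value") -> dict:
    """Turn the aggregation buckets of a response into a {value: count} dict."""
    buckets = response.get("aggregations", {}).get(name, {}).get("buckets", [])
    return {b["key"]: b["doc_count"] for b in buckets}


if __name__ == "__main__":
    print(case_preserving_mapping())
    print(terms_aggregation())
```

Because the keyword sub-field is not analyzed, the bucket keys keep the original casing, while the parent text field can still be searched case-insensitively.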
