Full-text search against string (databaseless)

Is there a way to perform a search over a document that I don't want to store anywhere? I've got some experience with Sphinx and ElasticSearch, and it seems they both operate on a database of some kind. I want to search for a word in a single piece of text, in a string variable.

I ended up using nltk and pymorphy: I just tokenize my text and compare the stems/normalized morphological forms from pymorphy against the search terms. No need for any heavy full-text search weaponry.
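For reference, a minimal sketch of that approach, assuming pymorphy2 (the current incarnation of pymorphy) and a Russian sample text; the function names are illustrative, not from the original code:

```python
from nltk.tokenize import word_tokenize  # needs nltk.download("punkt") once
import pymorphy2

morph = pymorphy2.MorphAnalyzer()

def normal_form(word):
    # Take the most probable normal (dictionary) form of the word
    return morph.parse(word)[0].normal_form

def text_contains(text, query):
    # Match if every normalized query term occurs among the
    # normalized tokens of the text
    tokens = {normal_form(t.lower()) for t in word_tokenize(text)}
    terms = {normal_form(t.lower()) for t in word_tokenize(query)}
    return terms <= tokens

print(text_contains("Кошки любят молоко", "кошка"))  # True: "кошки" -> "кошка"
```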

Related

How can I remove one delimiter from elasticsearch tokenizer?

I am using Elasticsearch 6.8 for text searching, and I realised that the Elasticsearch tokenizer breaks text into words using the delimiters listed here: http://unicode.org/reports/tr29/#Default_Word_Boundaries. I am using match_phrase to search one of the fields in my documents, and I'd like to remove one delimiter used by the tokenizer.
I did some research and found some solutions, such as using the keyword type rather than text. That would have a big impact on my search function, because it doesn't support partial queries.
Another solution is to keep the keyword type but use wildcards to support partial queries. That may hurt query performance, though, and I'd still like the tokenizer to handle the other delimiters.
A third option is to use tokenize_on_chars to define all the characters used to tokenize text, but that requires me to list every other delimiter. What I'm really looking for is something like tokenize_except_chars.
So is there an easy way to take one character out of the delimiters the tokenizer uses in Elasticsearch 6.8?
I found that Elasticsearch supports protected_words in the word delimiter token filter, which can do the job. More info can be found at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/analysis-word-delimiter-tokenfilter.html
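As a hedged sketch, the setup in Elasticsearch 6.8 might look like the following, using the official elasticsearch-py client; the index, analyzer, and filter names are invented, and "wi-fi" stands in for a token you don't want split on "-":

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index="my-index", body={
    "settings": {
        "analysis": {
            "filter": {
                "my_word_delimiter": {
                    "type": "word_delimiter",
                    "protected_words": ["wi-fi"]  # tokens that are never split
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_word_delimiter"]
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "content": {"type": "text", "analyzer": "my_analyzer"}
            }
        }
    }
})
```

With this analyzer, "wi-fi" survives as a single token while the word delimiter filter still splits on everything else as usual.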

Automatic translation of search queries using Lucene or similar technologies

I am evaluating search technologies, and one of my requirements is the ability to match translated text as well.
For example, there are text documents written in English and French, and Lucene will index them.
If I search for the string "apple", it should search for both "apple" and "pomme" and show documents containing either.
Do any technologies provide automatic translation of token words?
Or is the only way to translate the text using the Google API and then feed it to Lucene for indexing?
There is no automatic translation in Lucene/Solr/Elasticsearch, but they have a similar feature called synonyms. You can build a list of synonyms with the Google API and have the terms translated at search time, not index time.
With this approach, you can search for "apple" and the search engine will treat "apple" and "pomme" as synonyms, so you will get the results you expect.
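A rough sketch of that setup (index and analyzer names are invented, and the synonym pairs are assumed to have been generated offline, e.g. with a translation API):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index="documents", body={
    "settings": {
        "analysis": {
            "filter": {
                "translations": {
                    "type": "synonym",
                    "synonyms": ["apple, pomme", "car, voiture"]
                }
            },
            "analyzer": {
                "search_with_translations": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "translations"]
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "standard",                        # index time: no synonyms
                    "search_analyzer": "search_with_translations"  # search time: expand them
                }
            }
        }
    }
})

# A match query for "apple" now also finds documents containing "pomme"
es.search(index="documents", body={"query": {"match": {"content": "apple"}}})
```

Applying synonyms only in the search analyzer keeps the index smaller and lets you update the translation list without reindexing.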

Does Couchbase 5 make ElasticSearch useless for Full Text Search?

Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?
Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as ElasticSearch or Solr. Couchbase FTS is not intended as a replacement for third party search software if search is at the core of your application. It is a simple and lightweight way to add search to your Couchbase data without deploying additional software and servers. If you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern, then full-text search may be right for you.
It will depend on your specific use case, but there is a reason search is a complicated problem: some products have spent years and years working on it (and continue to).
Full-text search is not the same as a search engine. Couchbase Full Text Search does not support many of the functions that ElasticSearch provides; for example, in ElasticSearch you can weight fields in the result set, do geo search, and so on. Couchbase full-text search is just a full-text search implementation, i.e. a basic string-matching function over a specially indexed field.
So if your task is basic substring search as part of a query, then you don't need ElasticSearch anymore; that makes development quicker and the infrastructure cheaper. However, if you are building a system that needs a proper search engine, then you need ElasticSearch as much as before.

Elasticsearch find search terms matching text

I have a scenario where I need to map each article to an entity. To do so, we maintain a set of keywords / search phrases (e.g. (icici OR hdfc) AND bank) that may occur in an article. We want to use the power of Elasticsearch to check all the stored search phrases against the article being processed.
What I have come across so far is forward search (full-text search and the like). What I need here is the reverse: matching the stored search phrases against an article.
I was digging for a solution and hoped someone would have already solved the same problem and could help.
In Elasticsearch this is called the percolator: instead of running a query against stored documents, you store the queries and run each new document against them.
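A minimal percolator sketch for Elasticsearch 6.x (index, field, and ID names are invented): each entity's search phrase is stored as a query, and each incoming article is percolated against all of them.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index="entity-queries", body={
    "mappings": {
        "_doc": {
            "properties": {
                "query": {"type": "percolator"},  # the stored queries live here
                "body": {"type": "text"}          # the field they run against
            }
        }
    }
})

# Register one search phrase per entity
es.index(index="entity-queries", doc_type="_doc", id="bank-entity", body={
    "query": {"query_string": {"default_field": "body",
                               "query": "(icici OR hdfc) AND bank"}}
})
es.indices.refresh(index="entity-queries")

# Find which stored phrases match an incoming article
result = es.search(index="entity-queries", body={
    "query": {"percolate": {"field": "query",
                            "document": {"body": "HDFC bank reported ..."}}}
})
for hit in result["hits"]["hits"]:
    print(hit["_id"])  # IDs of the matching entity queries
```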

Retaining case in Elasticsearch faceted search

Is there a way to do faceted searches using the Elasticsearch search API while maintaining case (as opposed to having the results converted to lowercase)?
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.
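A sketch of the multi-field idea (names invented): in current Elasticsearch versions multi_field is spelled as a fields sub-object and facets have become aggregations, but the principle is the same; facet/aggregate on a sub-field that bypasses the lowercasing analyzer.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index="articles", body={
    "mappings": {
        "properties": {
            "tag": {
                "type": "text",                 # analyzed (lowercased) for search
                "fields": {
                    "raw": {"type": "keyword"}  # stored verbatim, case intact
                }
            }
        }
    }
})

# Aggregate on the raw sub-field so the bucket keys keep their original case
es.search(index="articles", body={
    "size": 0,
    "aggs": {"tags": {"terms": {"field": "tag.raw"}}}
})
```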
