Does Couchbase 5 makes ElasticSearch useless for Full Text Search?

Does Couchbase 5 makes ElasticSearch useless for Full Text Search? - elasticsearch

Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?

Quoting from the documentation:
Couchbase FTS is similar in purpose to other search software such as
ElasticSearch or Solr. Couchbase FTS is not intended as a replacement
for third party search software if search is at the core of your
application. It is a simple and lightweight way to add search to your
Couchbase data without deploying additional software and servers. If
you have many queries which look like SELECT ... field1 LIKE %pattern% OR field2 LIKE %pattern, then full-text search may be right for you.
It will depend on your specific use case, but there is a reason why search is a complicated problem and some products spent years and years on working on that (and continue).

Full text search NOT EQUAL Search engine. Full Text Search does support a lot of functions that ElasticSearch provides. For example in ElasticSearch you can set weight of fields in result set, do geo search etc. Couchbase full text search is just full text search implementation, i.e. basic string matching function in specially indexed field only.
So, if your task is to do basic search on sub string as a part of a query, then you don't need ElasticSearch anymore. It make development quicker and infrastructure cheaper. However, if you are building system that need proper search engine, then you need ElasticSearch as much as before.

Related

Elastic Enterprise Search - Is it a best practice to index data of two different json schema in a single index

Hi I'm trying out Elastic Enterprise Search with Elasticsearch. I have a couple of questions on data indexing.
When referring to Elasticsearch documentation, I read that there is a limit to the number of fields that an Elasticsearch index could have. Since Elasticsearch is used with Elastic Enterprise Search I believe there is no arguing that the same applies here. In that case lets say I have multiple document types with various fields. For an example Person.json and Dog.json, they both have different properties. So when indexing I use one search engine in Elastic Enterprise Search to index both Person and Dog so that when I query using the Elastic Enterprise Search API I'll get results which are both Person and Dog depending on the search term.
Is this the way to go,or should I specify a seperate search engine for each schema type?

I am assuming that your person.json and dog.json contains different fields as your heading suggest and weather to create a separate index for these entities or have them in a single index, depends on the various use-cases you have in your application and you will not find elasticsearch marking one approach better than other and mainly will explain the pros/cons based on a particular context(like relevance, performance, management etc).
Please refer to my this SO answer, where I talked about various pros/cons of both the approach and discussion in chat to get more context why OP chose an approach based on his use-case, after knowing the pros/cons.

How search feature works in AEM

 From AEM documents I can figure it out how to write queries for Aem content search, but How search feature works in AEM? Which bundle or framework does the magic of searching the content and present back. How internally content is being traversed when I use search queries ?

AEM uses OAK indexes to implement the search engine. AEM repository is a database and like every other database, it needs indexes to perform speedy searches. You can read more on: https://docs.adobe.com/docs/en/aem/6-2/deploy/platform/queries-and-indexing.html
In general, you define indexes (in case OOTB indexes are not enough) under /oak:indexes node. These indexes, in a broad sense, contain list of properties and nature (async, full text, property, lexical rules) of index and the path to be indexed (or excluded from index).
AEM generates a lot of lucene index data in your repository and data store and that is used to quickly lookup the nodes for your queries. Whenever, a query is fired the AEM instance loops through the indexes and finds the index which will provide the results with least traversal cost. If no such index is found it will resort to node traversal which is normally bad for performance but has some limited edge case uses.
You can integrated Solr and ElasticSearch with your AEM instance to use other advanced features but that is simply an extension to the built-in engine.
Search and promote (which is more of an external search) is not related to internal index and is more like a site crawler.
Queries and searches is a very broad topic so I suggest you read this reply as a summary and more details can be found from the link above.

I agree with the previous answer from Imran.
Question is very general and if you are interested in more details like how does Apache Lucene works in AEM, what options exist for integration with external search engines and how to do it, it is available here:
GitHub repository and six write ups - step by step how to use search engines in AEM.

Working with NLP tags in Elasticsearch

Working on a large data-oriented search product powered by elasticsearch. We've built a lot of machine learning functionality on top of this app, but currently we're having some difficulty deciding how to integrate fairly standard NLP-based word tags into our ES index.
Currently we have a tagging service that can annotate a word with a respective type (or types, but one may be useful enough for now). This function could be abstracted to: type = getWordType(word) I imagine there must be a way to integrate this tagging service into the analysis chain that is applied at index time, where, maybe, we tell the index what type a particular word belongs to. However, doing this kind of advanced analysis is a bit beyond my elasticsearch capacity. Does anyone have pointers on this kind of advanced analysis in elasticsearch?
Thanks!

you might want to take a look at the ingest node functionality introduced in Elasticsearch 5.0. This allows you to preprocess your documents and add fields into the JSON before the document is being indexed in Elasticsearch.
I wrote an ingest processor that is using OpenNLP to enrich documents. You could take a look at that one and adapt it to your needs (also, pull requests are very welcome).
Check it out at https://github.com/spinscale/elasticsearch-ingest-opennlp

This is achieved in Elasticsearch 6.5 with the type annotated_text: https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/mapper-annotated-text-usage.html
Essentially, kind of like synonyms, the tags (or named entity IDs, etc) can exist at the same position as the word you’re tagging.
Needs a plugin installed, the Mapper Annotated Text Plugin.

Comparison of Handling Logs and PDFs in Solr & Elasticsearch and Data Visualization in Banana & Kibana

How do Elasticsearch and Solr compare in respect to the following:
Indexing logs.
Indexing events.
Indexing PDF documents.
Ease of creating and distributing visualizations. Kibana vs Banana.
Support and documentation for developers.
Any help is appreciated.
EDIT
More specifically, i am trying to figure out how exactly a PDF document or an event can be indexed at all. I have worked a little bit on Elasticsearch and since i am a fan of JSON, i found it quite useful when i tried to index structured data.
For example logs are mostly structured and thus i guess easier to index and search. Now what if i want to index the whole log file itself?
Follow up
Is Kibana the only visualization tool available for Elasticsearch?
Is Banana the only visualization tool available for Solr?

Here is an answer to try to address just the Elasticsearch aspect of the post.
Take a look at https://github.com/elastic/elasticsearch-mapper-attachments for handling PDFs
For events/logs, you would need to transform those into structured data to index in Elasticsearch. You can have a field in there for the source (the log file the data came from and other information like that) - you will have all the data in the whole log file indexed in that fashion. You can take advantage of ES aggregations to group results based on log file, calculate statistics, etc.
The ELK stack is definitely worth a look.
I don't know if Kibana is the only visualization tool but it is probably the most popular and likely to offer more than something else.

How to search for multiple strings in very large database

I want to search for multiple strings in a very large database. These strings are part of different attributes of database table. I have tried string search using LIKE in sql query. But it is taking a lot of time to get results. I have used Oracle database.
Should I use indexing of database? I found that Lucene can be used for it.
I also got some suggestions of using big data concepts. Which approach should I use?

The easiest way is:
1.) adding an index to the columns you like to search trough
2.) using oracle text as #lalitKumarB wrote
The most powerful way is:
3.) use an separate search engine (solr, elaticsearch).
But, probably you have to change you application in order to explicit use the search index for searching trough the data,...
I had the same situation some years before. Trying to search text in an big database. After a wile I found out, that database based search will never reach the performance of an dedicate search engine. And: you will have much more search features working out of the box, if you use solr (for example), like spelling correction, "More like this", ...
One option is to hold the data on orcale, searching in solr and return the ID of the document in order to only load the one row form oracle, the is referenced by the ID.
2nd option is to keep oracle as base datapool for your search engine and search in solr (or elasticsearch) in order to return the whole document/row from solr, not only the ID. So you don't need to load the data from the database any more.
The best option depends on your needs.

You have the choice between elasticsearch, solr or lucene

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Does Couchbase 5 makes ElasticSearch useless for Full Text Search? - elasticsearch

Couchbase FTS is now an official feature in version 5. Why would one still use ElasticSearch along with Couchbase?

Related

Elastic Enterprise Search - Is it a best practice to index data of two different json schema in a single index

How search feature works in AEM

Working with NLP tags in Elasticsearch

Comparison of Handling Logs and PDFs in Solr & Elasticsearch and Data Visualization in Banana & Kibana

How to search for multiple strings in very large database

Categories

Resources