Keep Elasticsearch search available continuously while indexing data into it - elasticsearch

How can we serve search/query operations uninterrupted from an Elasticsearch server while we index/re-index data into that same server at the same time?

You need to look at using aliases:
https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
Essentially, you create a new index with a unique name and swap the alias from the old index to the new one in a single atomic step.
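The atomic swap described above is done with the `_aliases` endpoint. Below is a minimal sketch of the request body it takes, assuming an alias `products` that currently points at `products_v1` and a freshly built `products_v2` (the index and alias names are illustrative, not from the question):

```python
# Build the body for POST /_aliases. Both actions are applied in one
# atomic step, so searches against the alias never see a gap while
# the underlying index is replaced.
def build_alias_swap(alias, old_index, new_index):
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = build_alias_swap("products", "products_v1", "products_v2")
```

Clients then always search via the alias name, never the concrete index name, so re-indexing stays invisible to them.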

Related

ElasticSearch as primary DB for document library

My task is to build a full-text search system for a really large number of documents. I have the documents as RTF files along with their metadata, and all of this will be indexed in Elasticsearch. These documents are unchangeable (they can only be deleted) and I don't really expect many new documents per day. So is it a good idea to use Elasticsearch as the primary DB in this case?
Maybe I'll store the RTF files separately, but I really don't see the point of storing all this data somewhere else.
This question was solved here, so it's a good case for Elasticsearch as the primary DB.
Elasticsearch is better known as a distributed full-text search engine than as a database.
If you preserve the document _source, it can be used as a database, since almost any time you decide to apply document changes or mapping changes you need to re-index the documents in the index (the analogue of a table in the relational world). There is no way to update parts of the Lucene inverted index; you need to re-index the whole document.
Elasticsearch's index survival mechanism is one of the best: if you lose a node, the index's lost replicas are automatically replicated to some of the other available nodes, so you don't need to do any manual operations.
If you do regular backups and have no requirement for the data to be available 24/7, it is completely acceptable to hold the data and full-text index in Elasticsearch as in a database.
But if you need a highly available combination, I would recommend keeping the documents in MongoDB (known as one of the best distributed document stores), for example, and using Elasticsearch only for its original purpose as a full-text search engine.

Elasticsearch data comparison

I have two different Elasticsearch clusters,
One cluster is Elasticsearch 6.x with the data; the second is a new Elasticsearch 7.7.1 cluster with pre-created indexes.
I reindexed the data from Elasticsearch 6.x to Elasticsearch 7.7.1.
Is there any way to get each doc from the source and compare it with the target doc, in order to check that the data is there and was not affected somehow?
When you perform a reindex, the data is indexed according to the destination index's mapping, so if your mapping is the same you should get the same search results. The _source value will be identical on both indices, but that doesn't mean your search results will be. If you really want to be sure everything is OK, you should check the inverted index generated by each index and compare them for full-text search; this data can be really big and there is no easy way to retrieve it, but you can check this for getting a term-document matrix.
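For the simpler _source-level check, one hedged sketch is to fetch each document's _source from both clusters (e.g. via the search or scroll APIs, stubbed out here) and diff the fields. The fetch step and field names are placeholders, not from the question:

```python
# Compare two _source documents field by field and report which
# fields differ. Missing fields count as differences too.
def diff_sources(src_doc, dst_doc):
    keys = set(src_doc) | set(dst_doc)
    return {k for k in keys if src_doc.get(k) != dst_doc.get(k)}

# Example: a field that changed during reindexing shows up in the diff.
changed = diff_sources({"a": 1, "b": 2}, {"a": 1, "b": 3})  # {'b'}
```

Running this over matching document IDs from both clusters verifies the stored data survived the reindex, though (as the answer notes) it says nothing about how the fields were analyzed.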

Querying Elasticsearch before inserting into it

I am using a Spring Boot application to load messages into Elasticsearch. I have a use case where I need to query the Elasticsearch data, get some id value, and populate it in the Elasticsearch JSON document before inserting that document into Elasticsearch.
This means querying Elasticsearch before every insertion. Will this be expensive? If yes, is there some other way to approach this issue?
You can use update_by_query to do it in one step.
Otherwise, it shouldn't be slow if you do it in two steps (get + update). It depends on many things: how often you do it, how much data is transferred, etc.
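The two-step approach can be sketched as follows. The lookup itself (the Elasticsearch search) is stubbed out, and the field names `key` and `related_id` are assumptions for illustration, not from the question:

```python
# Step 1: look up the id via some search function; step 2: embed it
# in the document before indexing. The `lookup` callable stands in
# for the Elasticsearch query.
def enrich_document(doc, lookup):
    doc = dict(doc)  # copy so the caller's dict is not mutated
    doc["related_id"] = lookup(doc["key"])
    return doc

# The lambda stubs out the search round-trip.
enriched = enrich_document({"key": "abc", "payload": "..."},
                           lambda key: "id-123")
```

If the lookup is done per message, batching the searches (e.g. with mget or a terms query over many keys at once) is usually the first optimization to try.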

Is it possible for an Elasticsearch index to have a primary key comprised of multiple fields?

I have a multi-tenant system, whereby each tenant gets their own Mongo database within a MongoDB deployment.
However, for Elasticsearch indexing this all goes into one Elasticsearch instance via Mongoosastic, tagged with a TenantDB field to keep the data separated when searching.
Currently some of the same _ids are reused across the multiple databases in test data for various config collections (different document content, same _id). This causes a problem when syncing to Elasticsearch: although the documents live in separate databases, when they arrive in Elasticsearch with the same type and ID, one of them gets dropped.
Is it possible to specify both the ID and TenantDB as the primary key?
Solution 1: You can search across multiple indices in Elasticsearch. But if you cannot separate your indices per database, you can use the following method: while syncing your data to Elasticsearch, use a pattern to build the Elasticsearch document _id. For example, from mongoDb1 use mdb1_{mongo_id}, from mongoDb2 use mdb2_{mongo_id}, etc. This keeps your _ids unique, provided you don't reuse an id within the same Mongo database.
Solution 2: Use a separate index per database.
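Solution 1's id pattern can be sketched in a couple of lines; the tenant names and Mongo id are illustrative:

```python
# Compose the Elasticsearch _id from the tenant database name and the
# Mongo _id, so documents from different tenant databases that share a
# Mongo _id can no longer collide in the shared index.
def composite_id(tenant_db, mongo_id):
    return f"{tenant_db}_{mongo_id}"

# Same Mongo _id, different tenants -> distinct Elasticsearch _ids.
a = composite_id("mdb1", "5f1a")  # "mdb1_5f1a"
b = composite_id("mdb2", "5f1a")  # "mdb2_5f1a"
```

This effectively gives you the compound primary key (TenantDB + id) asked about, just flattened into the single _id field Elasticsearch supports.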

Elasticsearch index design for logging

I am new to Elasticsearch and am trying to use Elasticsearch and Kibana to monitor my application's logs.
I need to monitor three different data queues' parameters (que_in time, que_out time, delay). What should my index design strategy be?
Should I go for a single index with multiple types, one for each of the three queues, or a single index with a single type holding the data of all three queues?
Any other suggestions or pointers on Elasticsearch index design strategy, not limited to the above case, would be highly appreciated.
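One hedged sketch of the single-index, single-type option (mapping types were deprecated in Elasticsearch 6 and removed in 7, so multiple types per index are no longer available anyway): a `queue` keyword field distinguishes the three queues. The field names below are assumptions based on the parameters named in the question:

```python
# Mapping body for PUT /queue-logs: one index, with a "queue" keyword
# field to filter and aggregate per queue in Kibana.
mapping = {
    "mappings": {
        "properties": {
            "queue": {"type": "keyword"},  # which of the three queues
            "que_in": {"type": "date"},    # enqueue timestamp
            "que_out": {"type": "date"},   # dequeue timestamp
            "delay": {"type": "long"},     # delay, e.g. in milliseconds
        }
    }
}
```

With this shape, a Kibana visualization can split on `queue` to chart each queue's delay separately from the same index.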