Elasticsearch index design for logging - elasticsearch

I am new to Elasticsearch and am trying to use Elasticsearch and Kibana to monitor my application's logs.
I need to monitor the parameters of three different data queues (que_in time, que_out time, delay). What should my index design strategy be?
Should I use a single index with a separate type for each of the three queues, or a single index with a single type that holds the data from all three queues?
Any other suggestions or pointers on Elasticsearch index design strategy, not limited to the case above, would be highly appreciated.

Related

OpenSearch Indexes and Data streams

I recently started using OpenSearch and have a few newbie questions.
What is the difference between an index, an index pattern and an index template? (Some examples would be really helpful to visualize and differentiate these terms.)
I have seen some indexes with data streams and some without. What exactly are data streams, and why do some indexes have them while others do not?
I tried reading a few docs and watching a few YouTube videos, but it's getting a little confusing as I do not have much hands-on experience with OpenSearch.
(1)
An index is a collection of JSON documents that you want to make searchable. To maximise your ability to search and analyse documents, you can define how documents and their fields are stored and indexed (i.e., mappings and settings).
An index template is a way to initialize new indices that match a given name pattern with predefined mappings and settings - e.g., any new index with a name starting with "java-" (docs).
An index pattern is a concept associated with Dashboards, the OpenSearch UI. It provides Dashboards with a way to identify which indices you want to analyse, based on their name (again, usually based on prefixes).
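To make the three terms concrete, here is a minimal Python sketch. It assumes the composable index template body shape (`index_patterns` plus a `template` section) and uses hypothetical names (`java-logs-template`, `java-app-2024` and so on). The wildcard matching is imitated locally with `fnmatchcase`, which only approximates how OpenSearch and Dashboards match index names.

```python
from fnmatch import fnmatchcase

# Hypothetical template body, as it might be sent to
# PUT _index_template/java-logs-template
java_template = {
    "index_patterns": ["java-*"],
    "template": {
        "settings": {"number_of_shards": 1},
        "mappings": {
            "properties": {
                "message": {"type": "text"},
                "level": {"type": "keyword"},
            }
        },
    },
}


def template_applies(template, index_name):
    """Return True if a NEW index with this name would pick up the template."""
    return any(fnmatchcase(index_name, p) for p in template["index_patterns"])


def indices_for_pattern(pattern, existing_indices):
    """Imitate a Dashboards index pattern: select EXISTING indices by name."""
    return [name for name in existing_indices if fnmatchcase(name, pattern)]


# Indices are the actual document collections; these names are made up.
existing = ["java-app-2024", "java-batch-2024", "nginx-access-2024"]
matched = indices_for_pattern("java-*", existing)
```

The template only affects indices created after it exists, while the index pattern is purely a UI-side selection over indices that are already there.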
(2)
Data streams are managed indices highly optimised for time-series and append-only data, typically observability data. Under the hood, they work like any other index, but OpenSearch simplifies some management operations (e.g., rollovers) and stores the continuous stream of data that characterises this scenario more efficiently.
In general, if you have a continuous stream of data that is append-only and has a timestamp attached (e.g., logs, metrics, traces, ...), then data streams are advertised as the most efficient way to model this data in OpenSearch.
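As a rough illustration of what "append-only with a timestamp" means in practice, the sketch below builds the two pieces a data stream setup usually involves: an index template that carries a `data_stream` section, and an event document with a timestamp field. The names `app-logs` and `app-logs-template` are hypothetical, and `@timestamp` is assumed to be the timestamp field; nothing here talks to a cluster.

```python
import json
from datetime import datetime, timezone

# Hypothetical template body, as it might be sent to
# PUT _index_template/app-logs-template
stream_template = {
    "index_patterns": ["app-logs"],
    "data_stream": {},  # matching names become data streams, not plain indices
    "priority": 100,
}


def make_log_event(message, level="INFO"):
    """Build one append-only event; the timestamp field is assumed to be @timestamp."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }


event = make_log_event("queue worker started")
# JSON body that would be POSTed to the stream, e.g. POST app-logs/_doc
body = json.dumps(event)
```

Events are only ever added, never updated in place, which is what lets the stream roll over to a fresh backing index without the writer having to know about it.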

ElasticSearch as primary DB for document library

My task is to build a full-text search system for a really large number of documents. Currently I have the documents as RTF files plus their metadata, and all of this will be indexed in Elasticsearch. The documents are immutable (they can only be deleted) and I don't expect many new documents per day. So is it a good idea to use Elasticsearch as the primary DB in this case?
Maybe I'll store the RTF files separately, but I really don't see the point of storing all this data somewhere else.
This question was solved here. So it's a good case for Elasticsearch as the primary DB.
Elasticsearch is better known as a distributed full-text search engine than as a database.
If you preserve the document _source, it can be used as a database: almost any time you apply document changes or mapping changes, you need to re-index the documents in the index (the rough equivalent of a table in the relational world). There is no way to update parts of the Lucene inverted index in place; you need to re-index the whole document.
Elasticsearch's index recovery mechanism is one of the best: if you lose a node, the replicas that were lost are automatically re-created on some of the other available nodes, so you don't need to do any manual operations.
If you take regular backups and have no requirement for the data to be available 24/7, it is completely acceptable to hold both the data and the full-text index in Elasticsearch, as you would in a database.
But if you need a highly available combination, I would recommend keeping the documents in MongoDB (well known as a distributed document store), for example, and using Elasticsearch only for its original purpose as a full-text search engine.
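The point about _source and re-indexing can be illustrated with a toy model. This is not how Lucene is implemented; it is a minimal Python sketch with hypothetical function names, showing why the original documents must be kept: when the analysis changes, the inverted index is rebuilt from the stored sources rather than patched in place.

```python
from collections import defaultdict


def build_inverted_index(sources, analyzer):
    """Rebuild a term -> set(doc ids) index from the stored source documents."""
    index = defaultdict(set)
    for doc_id, source in sources.items():
        for term in analyzer(source["text"]):
            index[term].add(doc_id)
    return dict(index)


def whitespace_analyzer(text):
    """Split on whitespace and keep the original case."""
    return text.split()


def lowercase_analyzer(text):
    """Lowercase first, then split on whitespace."""
    return text.lower().split()


# The stored originals play the role of _source.
sources = {
    1: {"text": "Annual Report"},
    2: {"text": "annual budget"},
}

old_index = build_inverted_index(sources, whitespace_analyzer)
# A "mapping change" (different analyzer) means rebuilding from the sources.
new_index = build_inverted_index(sources, lowercase_analyzer)
```

With the first analyzer a search for "annual" finds only document 2; after rebuilding with the lowercase analyzer it finds both. Without the stored sources there would be nothing to rebuild from.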

Is there a way to instruct Elasticsearch to only return matches from one node

We are designing a large framework around Elasticsearch and are investigating a few options.
For some complex analysis jobs, we are looking for a way to retrieve data from only the currently connected Elasticsearch node, i.e. only data from the primary shard on the node that I am connected to via the client, or no result if there is no primary shard located on this node.
Is this possible via some search attribute or via a more specialized setup?
We want to use the normal Elasticsearch functionality as much as possible, naturally, but sometimes there might be queries that need this type of access. Is this doable with Elasticsearch?
You can restrict the search to specific shards using the preference query string parameter (see https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-request-preference.html).
For example, sending your query to http://ES-NODE:9200/INDEXNAME/_search?preference=_shards:1 should restrict the query to shard 1.
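A small Python sketch of building such a URL is shown here. The helper name `search_url` is hypothetical, and `ES-NODE` and `INDEXNAME` are the placeholders from the answer; only the `preference=_shards:1` form mentioned above is used.

```python
from urllib.parse import urlencode


def search_url(base, index, preference):
    """Build a _search URL carrying the preference query string parameter."""
    # Keep ':' and ',' unescaped so values like '_shards:1' stay readable.
    query = urlencode({"preference": preference}, safe=":,")
    return f"{base}/{index}/_search?{query}"


url = search_url("http://ES-NODE:9200", "INDEXNAME", "_shards:1")
```

Which other preference values are available depends on the Elasticsearch version, so the linked reference page is the place to check before relying on anything beyond `_shards`.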

couchbase data replication elasticsearch

I went through the Couchbase XDCR replication documentation, but failed to understand the points below:
1. Couchbase replicates all the data in a bucket to Elasticsearch in batches, and Elasticsearch indexes this data for real-time statistical data. My question: if all the data is replicated to Elasticsearch, then Elasticsearch acts like a database that can hold a huge amount of data. So can we replace Couchbase with Elasticsearch?
2. How is the data, in JSON form, sent to d3.js to display statistical graphs?
All of the data is replicated to Elastic Search, but is not held there by default. The indexes and such are created, but the documents are discarded. Elastic Search is not a database and does not perform like one and certainly not on the level of Couchbase. Take a look at this presentation where it talks about performance and stuff and why Cochbas
If your data are not critical or if you have another source of truth, you can use Elasticsearch only.
Otherwise, I'd keep Couchbase and Elasticsearch.
There is a resiliency page on Elastic.co website which describes potential known problems. https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
My 2 cents.

elastic search index strategies under high traffic

We use Elasticsearch for the real-time metrics and analytics part of our tool. Elasticsearch is very cool and fast when we query our data (statistical facets and terms facets).
But we have a problem when we try to index our hourly data. Every hour we collect metric data from other services and save it to RabbitMQ. But when the queue worker runs, not all of the hourly data gets indexed into ES. Usually 40% of the data is indexed in ES and the rest is lost.
So what do you suggest for indexing into ES under high traffic?
I've posted answers to other similar questions:
Ways to improve first time indexing in ElasticSearch
Performance issues using Elasticsearch as a time window storage (latter part of my answer applies)
Additionally, instead of a custom 'queue worker' have you considered using a 'river'? For more information see:
http://www.elasticsearch.org/blog/the-river/
http://www.elasticsearch.org/guide/reference/river/
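Independently of the linked answers (this is not taken from them), one common way to make a queue worker's indexing loss visible is to send documents through the `_bulk` API and inspect the per-item results instead of assuming every document was accepted. The sketch below only builds the request body and parses a response. The helper names and the `metrics-hourly` index are hypothetical, and the request and response shapes assume a reasonably recent Elasticsearch version, where the response has a top-level `errors` flag and an `items` list.

```python
import json


def to_bulk_body(index_name, docs):
    """Build a newline-delimited _bulk body: one action line, then one document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return positions of documents that the bulk response reports as failed."""
    if not bulk_response.get("errors"):
        return []
    failed = []
    for position, item in enumerate(bulk_response.get("items", [])):
        result = next(iter(item.values()))  # e.g. the value under "index"
        if "error" in result:
            failed.append(position)
    return failed


docs = [{"metric": "cpu", "value": 0.7}, {"metric": "mem", "value": 0.4}]
body = to_bulk_body("metrics-hourly", docs)

# Hand-written example of a response in which the second document was rejected.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"status": 201}},
        {"index": {"status": 429, "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
to_retry = [docs[i] for i in failed_items(sample_response)]
```

A worker that requeues `to_retry` instead of acknowledging the whole batch would at least turn silent loss into retries.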
