How to get current using shards in Elasticsearch or Opensearch - elasticsearch

My opensearch sometimes reaches this error when i adding new index:
Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
So i have to increase cluster.max_shards_per_node larger.
I wonder if is there any way to check current shards we are using to avoid this error happening?

The best way to see indexing and search activity is by using a monitoring system. And the best monitoring system for Elasticsearch is Opster. You can try it for free at the following link.
https://opster.com/
For the manual check and sort, you can try the following APIs.
You can sort your indices according to the creation date string (cds). It will help you to understand which one is the old one. So you can have an idea about your indices (shards).
GET _cat/indices?v&h=index,cds&s=cds
Also, you check the indices stats to see if is there any activity in searching or indexing.
To check all indices you can use GET _all/_stats
To check only one index you can use GET index_name/_stats

Related

Elasticsearch count of searches against an index resets to zero after cluster restart

We use Elasticsearch - one cluster is 7.16 and another is 8.4. Behavior is the same in both.
We need to be able to get a count of search queries run against an index since the index's creation.
We retrieve the amount of searches that have been run against a given index by using the _stats endpoint as such:
GET /_stats?filter_path=indices.my_index.primaries.search.query_total
The problem is that this stat resets to zero after a cluster reboot. Does this data persist anywhere for a given index such that I can get the total since inception of the index? If not, is there an action I can take to somehow record that stat before a reboot so I can always access the full total number?
EDIT - this is the only item I was able to find on this subject, and the answer in this discussion does not look promising: https://discuss.elastic.co/t/why-close-reopen-index-will-reset-index-stats-to-zero/170830
As far as I know, there is no Out of the box solution to achieve your use-case, but its not that hard to build it yourself either, You can simply call the same _stats API periodically and store it in some other index of Elasticsearch or DB so that its not reset. IMHO Its not that big work.

Elasticsearch _cluster/stats api vs _stats API

I am a bit confused on why I would be getting a relatively drastic different amounts for each of this api calls for doc count (Primary and Total), shard amounts, and even the amount of indices.
Does anyone know why these two API calls would return different statistics for a given elasticsearch cluster?
Thanks
GET _stats will show details only about the index which you have created, and it's not included statistics about hidden index or system index which start with dot(.).
GET _cluster/stats will show details for all the indices available in your cluster and it will include hidden / system index as well.

Elasticsearch check if an index is receiving read traffic

We are using elasticsearch 6.0.
We have an index in our cluster and would like to know if there are any clients who are querying it.
Is there a way to know if an index is receiving read( get, search, aggregation etc) on an index?
If you don't have monitoring enabled on your cluster please have a look on the index stats api. You'll find a lot of metrics worth to monitor.
You can see which thread has completed how many write tasks completed with this command:
GET /_cat/thread_pool/write?v&h=id,node_name,active,rejected,completed
or for getting get task:
GET /_cat/thread_pool/get?v&h=id,node_name,active,rejected,completed

How to debug document not available for search in Elasticsearch

I am trying to search and fetch the documents from Elasticsearch but in some cases, I am not getting the updated documents. By updated I mean, we update the documents periodically in Elasticsearch. The documents in ElasticSearch are updated at an interval of 30 seconds, and the number of documents could range from 10-100 Thousand. I am aware that the update is generally a slow process in Elasticsearch.
I am suspecting it is happening because Elasticsearch though accepted the documents but the documents were not available for searching. Hence I have the following questions:
Is there a way to measure the time between indexing and the documents being available for search? There is setting in Elasticsearch which can log more information in Elasticsearch logs?
Is there a setting in Elasticsearch which enables logging whenever the merge operation happens?
Any other suggestion to help in optimizing the performance?
Thanks in advance for your help.
By default the refresh_interval parameter is set to 1 second, so unless you changed this parameter each update will be searchable after maximum 1 second.
If you want to make the results searchable as soon as you have performed the update operation you can use the refresh parameter.
Using refresh=wait_for the endpoint will respond once a refresh has occured. If you use refresh=true a refresh operation will be triggered. Be careful using refresh=true if you have many update since it can impact performances.

How does elastic search brings back a node which is down

I was going through elastic search and wanted to get consistent response from ES clusters.
I read Elasticsearch read and write consistency
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html
and some other posts and can conclude that ES returns success to write operation after completing writes to all shards (Primary + replica), irrespective of consistency param.
Let me know if my understanding is wrong.
I am wondering if anyone knows, how does elastic search add a node/shard back into a cluster which was down transiently. Will it start serving read requests immediately after it is available or does it ensures it has up to date data before serving read requests?
I looked for the answer to above question, but could not find any.
Thanks
Gopal
If node is removed from the cluster and it joins again, Elasticsearch checks if the data is up to date. If it is not, then it will not be made available for search, until it is brought up to date again (which could mean the whole shard gets copied again).
the consistency parameter is just an additional pre-index check if the number of expected shards are available in the cluster (if the index is configured to have 4 replicas, then the primary shard plus two replicas need to be available, if set to quorum). However this parameter does never change the behaviour that a write needs to be written to all available shards, before returning to the client.

Resources