Elasticsearch _cluster/stats api vs _stats API - elasticsearch

I am a bit confused on why I would be getting a relatively drastic different amounts for each of this api calls for doc count (Primary and Total), shard amounts, and even the amount of indices.
Does anyone know why these two API calls would return different statistics for a given elasticsearch cluster?
Thanks

GET _stats will show details only about the index which you have created, and it's not included statistics about hidden index or system index which start with dot(.).
GET _cluster/stats will show details for all the indices available in your cluster and it will include hidden / system index as well.

Related

How to get current using shards in Elasticsearch or Opensearch

My opensearch sometimes reaches this error when i adding new index:
Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
So i have to increase cluster.max_shards_per_node larger.
I wonder if is there any way to check current shards we are using to avoid this error happening?
The best way to see indexing and search activity is by using a monitoring system. And the best monitoring system for Elasticsearch is Opster. You can try it for free at the following link.
https://opster.com/
For the manual check and sort, you can try the following APIs.
You can sort your indices according to the creation date string (cds). It will help you to understand which one is the old one. So you can have an idea about your indices (shards).
GET _cat/indices?v&h=index,cds&s=cds
Also, you check the indices stats to see if is there any activity in searching or indexing.
To check all indices you can use GET _all/_stats
To check only one index you can use GET index_name/_stats

Elasticsearch count of searches against an index resets to zero after cluster restart

We use Elasticsearch - one cluster is 7.16 and another is 8.4. Behavior is the same in both.
We need to be able to get a count of search queries run against an index since the index's creation.
We retrieve the amount of searches that have been run against a given index by using the _stats endpoint as such:
GET /_stats?filter_path=indices.my_index.primaries.search.query_total
The problem is that this stat resets to zero after a cluster reboot. Does this data persist anywhere for a given index such that I can get the total since inception of the index? If not, is there an action I can take to somehow record that stat before a reboot so I can always access the full total number?
EDIT - this is the only item I was able to find on this subject, and the answer in this discussion does not look promising: https://discuss.elastic.co/t/why-close-reopen-index-will-reset-index-stats-to-zero/170830
As far as I know, there is no Out of the box solution to achieve your use-case, but its not that hard to build it yourself either, You can simply call the same _stats API periodically and store it in some other index of Elasticsearch or DB so that its not reset. IMHO Its not that big work.

Setting up a daily partitioned index

I'm looking to setup my index such that it is partitioned into daily sub-indices that I can adjust the individual settings of depending on the age of that index, i.e. >= 30 days old should be moved to slower hardware etc. I am aware I can do this with a lifecycle policy.
What I'm unable to join-the-dots on is how to setup the original index to be partitioned by day. When adding data/querying, do I need to specify the individual daily indicies or is there something in Elasticsearch that will do this for me? If the later, how does it work with adding/querying (assuming they are different?)...how does it determine the partitions that are relevant for the query/partition to add a document to? (I'm assuming there is a timestamp field - but I can't see from the docs how its all linked together)
I'm using the base Elasticsearch OSS v7.7.1 without any plugins installed.
there's no such thing as sub indices or partitions in Elasticsearch. if you want to use ilm, which you should, then you are using aliases and multiple indices
you will need to upgrade from 7.7 - which is EOL - and use the default distribution to get access to ilm as well
getting back to your conceptual questions, https://www.elastic.co/guide/en/elasticsearch/reference/current/overview-index-lifecycle-management.html and the following few chapters dive into it. but to your questions;
the major assumption of using ilm is that data being ingested is current, so on a rough level, data from today will end up in an index from today
if you are indexing historic data then you may want to put that into "traditional" index names, eg logs-2021.08.09 and then attach them to the ilm policy as per https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-with-existing-indices.html
when querying, Elasticsearch will handle accessing all the indices it needs based on the request it receives. it does this via https://www.elastic.co/guide/en/elasticsearch/reference/current/search-field-caps.html

Is is more efficient to query multiple ElasticSearch indices at once or one big index

I have an ElasticSearch cluster and my system handles events coming from an API.
Each event is a document stored in an index and I create a new index per source (the company calling the API). Sources come and go, so I have new sources every week and most sources become inactive after a few weeks. Each source send between 100k and 10M new events every day.
Right now my indices are named api-events-sourcename
The documents contain a datetime field and most of my queries look like "fetch the data for that source between those dates.
I frequently use Kibana and I have configured a filter that matches all my indices (api-events-*) at once, and I then add terms to filter a specific source and specific days.
My requests can be slow at times and they tend to slow down the ingestion of new data.
Given that workflow, should I see any performance benefits to create an index per source and per day, instead of the index per source only that I use today ?
Are there other easy tricks to avoid putting to much strain on the cluster ?
Thanks!

Elasticsearch indexes but does not store documents

I'm having troubles storing documents within a 3-node Elasticsearch cluster that previously was able to store documents. I use the Java API to send bulks of documents to Elasticsearch, which are accepted (no failure in BulkResponse object) AND Elasticsearch has heavy index activities. However, the number of documents are not increased and I assume that none of them are store.
I've looked into Elasticsearch logs (of all three nodes) but I see no errors or warnings.
Note: I've had to restart two nodes previously but search/query is working perfectly. (the count in the image starts at ~17:00 as I've installed the Marvel plugin at this time)
What can I do to solve or debug the problem?
Sorry for this point blank code blindness by me! I forgot to skip the cursor when reading from MongoDB and therefore re-inserted the same 1000 documents into Elasticsearch for thousands of times!
Learning: If this problem occurs check if you select the correct documents in your database and that these documents are not already stored in ES.
Sidenote to Marvel: It would be great is this could be indicated in any way - e.g. by having a chart with "updated documents" (I rechecked and could not find one)

Resources