This is the response:
{
"_index": "vehicles",
"_id": "123",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
for the query:
PUT /vehicles/_doc/123
{
"make": "Honda",
"color": "Blue",
"HP": 250,
"milage": 24000,
"price": 19300.97
}
on Elasticsearch 8.
May I know:
The total shards (which is 2): does it include the primary shard plus the replica shard?
The successful shards: I suppose that's the primary shard the PUT is written into. Can it be more than 1?
The failed shards: I suppose that's a failed primary shard?
As explained in the official documentation for the Index API response body:
_shards.total tells you how many shard copies (primaries + replicas) the index operation should be executed on
_shards.successful returns the number of shard copies the index operation succeeded on. Upon success, successful is at least 1, as in your case. Since, by default, write operations only wait for the primary shard to be active before proceeding, only 1 is returned. If you want to see 2, add wait_for_active_shards=all to your indexing request.
_shards.failed contains replication-related errors in the case an index operation failed on a replica shard. 0 indicates there were no failures.
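To illustrate the wait_for_active_shards point, the same PUT from the question can be issued with that parameter so the response waits for (and counts) both copies. This is a sketch based on the request above; note that with =all the request will time out if any copy, such as an unassigned replica, is not active:
PUT /vehicles/_doc/123?wait_for_active_shards=all
{
"make": "Honda",
"color": "Blue",
"HP": 250,
"milage": 24000,
"price": 19300.97
}
On a healthy one-replica index, _shards.successful should then come back as 2.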
The health column shows yellow for the logstash indices; even after deleting old ones, they're recreated with yellow health. I have a cluster set up for this and have checked the shards using the request below.
GET _cluster/health:
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 12,
"active_shards" : 22,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 3,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 88.46153846153845
}
Any idea how this can be turned green?
Also, the indices are not getting created daily due to this issue.
The yellow health indicates that your primary shards are allocated but the replicas are not. This can happen when the cluster does not have enough nodes for the configured replica count: Elasticsearch never allocates the primary and a replica of the same shard on the same node, as that would serve no purpose. When you have multiple nodes and multiple shards, Elasticsearch by default allocates the primary and the replicas to different nodes.
As seen from the data you provided, you have 22 active shards on only 2 nodes. The 3 unassigned shards are what is causing the yellow cluster health.
To solve this, you can do one of two things.
If you are using Elasticsearch for testing, create your indices with one primary shard and no replicas; a single node is then enough.
If you are in production and want multiple shard copies (primary + replicas), the number of nodes must be at least the number of replicas plus one. For instance, with 1 primary and 2 replicas per shard, you need at least 3 nodes.
Note that the number of primary shards must be set when the index is created, while the replica count can be changed at any time.
The harm in yellow health is that if a primary shard is lost, you lose both the service and the data on it, because there is no replica to promote.
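If you decide you do not need replicas for these logstash indices, one way to turn the cluster green is to drop the replica count to 0, which works on existing indices at any time. This is a sketch; the logstash-* index pattern is an assumption, so adjust it to your index names:
PUT /logstash-*/_settings
{
"index": {
"number_of_replicas": 0
}
}
Alternatively, GET _cluster/allocation/explain will tell you exactly why a given shard is unassigned before you change anything.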
I'm trying to find example Elasticsearch queries for returning sequences of events in a time series. My dataset is rainfall values at 10-minute intervals, and I want to find all storm events. A storm event would be continuous rainfall for more than 12 hours, which equates to 72 consecutive records with a rainfall value greater than zero. I could do this in code, but I'd have to page through thousands of records, so I'm hoping for a query-based solution. A sample document is below.
I'm working in a University research group, so any solutions that involve premium tier licences are probably out due to budget.
Thanks!
{
"_index": "rabt-rainfall-2021.03.11",
"_type": "_doc",
"_id": "fS0EIngBfhLe-LSTQn4-",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2021-03-11T16:00:07.637Z",
"current-rain-total": 8.13,
"rain-duration-in-mins": 10,
"last-recorded-time": "2021-03-11 15:54:59",
"rain-last-10-mins": 0,
"type": "rainfall",
"rain-rate-average": 0,
"@version": "1"
},
"fields": {
"@timestamp": [
"2021-03-11T16:00:07.637Z"
]
},
"sort": [
1615478407637
]
}
Update 1
Thanks to @Val, my current query is:
GET /rabt-rainfall-*/_eql/search
{
"timestamp_field": "@timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence
[ rainfall where "rain-last-10-mins" > 0 ]
[ rainfall where "rain-last-10-mins" > 0 ]
until [ rainfall where "rain-last-10-mins" == 0 ]
"""
}
Having a sequence query with only one rule causes a syntax error, hence the duplicated rule. The query as written runs but doesn't return any documents.
Update 2
Results weren't being returned because I wasn't escaping the field names correctly. However, because of the two sequence rules I'm getting matches of length 2, not of arbitrary length up until the until clause is met.
GET /rabt-rainfall-*/_eql/search
{
"timestamp_field": "@timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence
[ rainfall where `rain-last-10-mins` > 0 ]
[ rainfall where `rain-last-10-mins` > 0 ]
until [ rainfall where `rain-last-10-mins` == 0 ]
"""
}
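One possible way around the fixed-length match is EQL's repetition syntax, which asks for a given number of consecutive matching events explicitly. This is an untested sketch; the `with runs` keyword is an assumption that requires a reasonably recent Elasticsearch (7.16 or later), so check your version first:
GET /rabt-rainfall-*/_eql/search
{
"timestamp_field": "@timestamp",
"event_category_field": "type",
"query": """
sequence with maxspan=12h
[ rainfall where `rain-last-10-mins` > 0 ] with runs=72
"""
}
The runs=72 expands to 72 consecutive sequence items, matching the 12 hours of 10-minute records in one query instead of pairs.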
This would definitely be a job for EQL, which allows you to return sequences of related data (ordered in time and matching some constraints):
GET /rabt-rainfall-2021.03.11/_eql/search?filter_path=-hits.events
{
"timestamp_field": "@timestamp",
"event_category_field": "type",
"size": 100,
"query": """
sequence with maxspan=12h
[ rainfall where `rain-last-10-mins` > 0 ]
until [ rainfall where `rain-last-10-mins` == 0 ]
"""
}
What the above query seeks to do is basically this:
get me the sequence of events of type rainfall
with rain-last-10-mins > 0
happening within a 12h window
up until rain-last-10-mins drops to 0
The until statement makes sure that the sequence "expires" as soon as an event with rain-last-10-mins: 0 occurs within the given time window.
In the response, you're going to get the number of matching events in hits.total.value and if that number is 72 (because the time window is limited to 12h), then you know you have a matching sequence.
So your "storm" signal here is to detect whether the above query returns hits.total.value: 72 or lower.
Disclaimer: I haven't tested this, but in theory it should work the way I described.
I'm working with Elasticsearch 5.2.2, and I would like to fully merge the segments of my index after an intensive indexing operation.
I'm using the following REST API to merge all the segments:
POST http://localhost:9200/my_index/_forcemerge
(I've also tried adding max_num_segments=1 to the POST request.)
And ES replies with:
{
"_shards": {
"total": 16,
"successful": 16,
"failed": 0
}
}
Note that my_index is composed of 16 shards.
But when I ask for node stats (http://localhost:9200/_nodes/stats) it replies with:
segments: {
count: 64,
[...]
}
So it seems that each shard is split into 4 segments (64/16 = 4). In fact, an ls of the data directory confirms that there are 4 segments per shard:
~# ls /var/lib/elasticsearch/nodes/0/indices/ym_5_99nQrmvTlR_2vicDA/0/index/
_0.cfe _0.cfs _0.si _1.cfe _1.cfs _1.si _2.cfe _2.cfs _2.si _5.cfe _5.cfs _5.si segments_6 write.lock
And no concurrent merges are running (http://localhost:9200/_nodes/stats):
merges: {
current: 0,
[...]
}
And all the force_merge requests have been completed (http://localhost:9200/_nodes/stats):
force_merge: {
threads: 1,
queue: 0,
active: 0,
rejected: 0,
largest: 1,
completed: 3
}
I didn't have this problem with ES 2.2.
Does anyone know how to fully merge these segments?
Thank you all!
I am not sure whether your problem was ever solved; I'm just posting here to let other people know.
This appears to be a client bug; see the following issue. Sending an empty JSON body makes it work:
https://github.com/TravisTX/elasticsearch-head-chrome/issues/16
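In other words, a sketch of the workaround (substitute your own index name): send the force merge as a POST with an explicit empty JSON body rather than no body at all:
POST /my_index/_forcemerge?max_num_segments=1
{}
With max_num_segments=1, each shard should end up with a single segment once the merge completes.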
I uploaded the data to my single-node cluster and named the index 'gequest'.
When I GET http://localhost:9200/_cluster/stats?human&pretty, I get:
"cluster_name" : "elasticsearch",
"status" : "yellow",
"indices" : {
"count" : 1,
"shards" : {
"total" : 5,
"primaries" : 5,
"replication" : 0.0,
"index" : {
"shards" : {
"min" : 5,
"max" : 5,
"avg" : 5.0
},
"primaries" : {
"min" : 5,
"max" : 5,
"avg" : 5.0
},
"replication" : {
"min" : 0.0,
"max" : 0.0,
"avg" : 0.0
}
}
}
When I do a GET on http://localhost:9200/_stats?pretty=true:
"_shards" : {
"total" : 10,
"successful" : 5,
"failed" : 0
}
Why is the total number of shards not consistent between the two reports? Why does the _stats API report 10 total shards? How can I track down the other 5?
From the results, it is likely that you have a single Elasticsearch node running and created an index with the default settings (which are 5 primary shards with one replica each). Since only one node is running, Elasticsearch is unable to assign the replica shards anywhere (it will never assign the primary and the replica of the same shard to a single node).
The _cluster/stats API gives information about the cluster, including its current state. Your result shows the cluster state as "yellow", indicating that all the primary shards are allocated but not all replicas have been allocated/initialized, so it counts only the 5 allocated shards.
The _stats API gives information about the indices in the cluster, including how many shards and replicas each index should have. Since your index needs a total of 10 shard copies (5 primaries and 5 replicas, as specified at index creation), the stats report total 10 and successful 5. The unassigned replicas are not counted as failures, which is why failed is 0.
Use http://localhost:9200/_cat/shards to see the overall shard status
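For a more focused view, _cat/shards accepts a column list via the h parameter. A sketch that shows each shard, whether it is a primary or a replica, its state, and why it is unassigned:
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
In your case the 5 replica rows should show state UNASSIGNED, accounting for the other 5 shard copies.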
I created an index with 3 primary shards, but Elasticsearch indicates that there are 4 primary shards.
# Deleting the data in the cluster
DELETE /_all
# Create an index with 3 primary shards and 1 replica each
PUT /blogs
{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
# Retrieve the cluster health
GET /_cluster/health
And here is the response :
{
"cluster_name": "clus",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 4,
"active_shards": 4,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 4
}
I thought that only the number of replica shards could change and that the number of primary shards was fixed when the index was created. Where does the fourth primary shard come from?
Since your example is in Marvel syntax, I assume you are using Marvel. It creates an index for its own data (that's the 4th primary shard you are seeing). Try GET /_cat/shards to see this.