Elasticsearch: what does "shard allocation" mean? - elasticsearch

We had a production incident in which the Elasticsearch cluster health check returned a red status. The health report shows that .marvel-2019.06.20 has 2 unassigned_shards, which seems to be the root cause.
curl -XGET 'localhost:9200/_cluster/health?level=indices&pretty'
{
"cluster_name" : "sap-jam-jam8",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 122,
"active_shards" : 239,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 7,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"indices" : {
...
...
".marvel-2019.06.20" : {
"status" : "red",
"number_of_shards" : 1,
"number_of_replicas" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2
}
}
We checked the Elasticsearch settings and found that cluster.routing.allocation.enable had been set to none, which disables shard allocation.
curl -XGET 'localhost:9200/_cluster/settings?pretty'
{
"persistent" : { },
"transient" : {
"cluster" : {
"routing" : {
"allocation" : {
"enable" : "none"
}
}
}
}
}
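The transient setting shown above is what keeps new shards from being assigned. As a minimal sketch, assuming an unsecured node on localhost:9200 as in the question, the following Python builds the body that would switch allocation back on through the same _cluster/settings endpoint. The function names are hypothetical, and nothing is sent unless apply_settings is called explicitly.

```python
import json
import urllib.request


def allocation_settings(mode="all"):
    """Build a transient cluster-settings body for shard allocation.

    Documented values for cluster.routing.allocation.enable are
    "all", "primaries", "new_primaries" and "none".
    """
    allowed = {"all", "primaries", "new_primaries", "none"}
    if mode not in allowed:
        raise ValueError(f"unsupported allocation mode: {mode}")
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def apply_settings(body, base_url="http://localhost:9200"):
    """Send the body with PUT /_cluster/settings (assumes an unsecured local node)."""
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the payload; call apply_settings(...) to change the cluster.
    print(json.dumps(allocation_settings("all"), indent=2))
```

Keeping the payload builder separate from the HTTP call makes it easy to review the exact change before applying it to a production cluster.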
As this Stack Overflow post suggested, we forced the shard to be assigned, and the issue went away.
curl -XPOST -d '{ "commands" : [ {
"allocate" : {
"index" : ".marvel-2014.05.21",
"shard" : 0,
"node" : "SOME_NODE_HERE",
"allow_primary":true
}
} ] }' http://localhost:9200/_cluster/reroute?pretty
After resolving this incident, I think it's necessary to understand the basic concept of shard allocation. I did some research, but the following questions still confuse me.
1. Why does Elasticsearch need to assign shards to other nodes?
In my case, we have two Elasticsearch nodes, A and B. Two shards have already been created on A and consume disk space.
When B is not available, why not just activate those two shards on server A?
At least it would return a yellow health status.
2. What is the procedure for assigning a shard?
In the first question, we supposed that both the primary shard and the replica have been created on server A. What does it mean to assign a shard to B?
Does that mean copying the shard from server A to server B?
3. How to explain zero active shards?
Both the primary shard and the replica have been created, but they are not active. How is that possible? Besides disk storage, is there other overhead to activating a shard, e.g. memory?
".marvel-2019.06.20" : {
"status" : "red",
"number_of_shards" : 1,
"number_of_replicas" : 1,
"active_primary_shards" : 0,
"active_shards" : 0, // both shards are inactive.
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2
}
4. Is the following assumption true?
To make a shard active, Elasticsearch needs to do the following steps:
Create a shard.
Find a server that has enough disk space and RAM to run it.
Copy this shard from the source server to the destination server.
Activate this shard.
Reference
Elasticsearch blog: Red Elasticsearch Cluster? Panic no longer
Stack overflow: elasticsearch - what to do with unassigned shards

I'm no expert, but here are some thoughts:
1. You have a 2-node cluster. ES will try to allocate shards on both nodes to provide HA in case one of the nodes fails (and also for better read/write performance).
2. Related to 1: ES tries to allocate shards across nodes to provide HA. If both shards are allocated on node A and you add node B to the cluster, ES will try to move replica shards to B.
3. Only the index is created, not the shards. So it tells you "OK, I've managed to create the index, but I can't find where to place its data". You can find out why by running cat shards (https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html). Shards do cost memory, as their metadata has to be cached.
4. That's probably the procedure for adding replica shards, possibly with a few additional conditions (like trying to create the replica on a node which doesn't contain the primary shard, which has the least load, and which can hold the data...).
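To make the red/yellow/green distinction concrete, here is a small Python sketch that derives an index status from the per-index counters returned by _cluster/health. It follows the documented meaning of the colours and is not Elasticsearch's own code; the function name is hypothetical.

```python
def index_health(entry):
    """Derive an index status from the per-index counters of _cluster/health.

    red:    at least one primary shard is not active
    yellow: all primaries are active, but some replica is not
    green:  every primary and replica is active
    """
    primaries = entry["number_of_shards"]
    copies_per_shard = 1 + entry["number_of_replicas"]
    if entry["active_primary_shards"] < primaries:
        return "red"
    if entry["active_shards"] < primaries * copies_per_shard:
        return "yellow"
    return "green"


# Counters copied from the .marvel-2019.06.20 entry in the question.
marvel = {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "active_primary_shards": 0,
    "active_shards": 0,
    "unassigned_shards": 2,
}
print(index_health(marvel))  # red
```

With the primary active and only the replica unassigned, the same function returns yellow, which is the state the question expected when node B was unavailable.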

Related

AWS Elasticsearch showing cluster health yellow, how should I fix it?

I am using AWS Elasticsearch. My cluster status has been yellow for the past 48 hours. Following the recommendation provided here:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-handling-errors.html
I've scaled the cluster to 15 data nodes and 3 master nodes.
Even though each node has around 60 GB of free space, the cluster is still in a yellow state.
When I executed GET /_cluster/allocation/explain, I got this:
"index" : "***********************************",
"shard" : 4,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2020-10-09T16:19:41.803Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [f6hB7EYOSR-GiJLFXBn01w]: failed recovery, failure RecoveryFailedException[[******************************][4]: Recovery failed from {70c36ff18063566c3a6089f3d696440a}{*******************}{*************}{di}{di_number=39, zone=us-east-1d, distributed_snapshot_deletion_enabled=true} into {**********************}{****************}{*************}{*****}{*******}{di}{distributed_snapshot_deletion_enabled=true, zone=us-east-1d, di_number=39}]; nested: RemoteTransportException[[****************][*********][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [1554462628/1.4gb], which is larger than the limit of [1513521152/1.4gb], real usage: [1554460888/1.4gb], new bytes reserved: [1740/1.6kb], usages [request=0/0b, fielddata=621718551/592.9mb, in_flight_requests=73378/71.6kb, accounting=35794764/34.1mb]]; ",
"last_allocation_status" : "no_attempt"
}
This is what it says. How can I resolve this?
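The details field above points at heap pressure rather than disk space: the parent circuit breaker rejected the recovery request. As a rough sketch, the following Python pulls the two byte counts out of such a message to show how far over the limit the request was. The function name and the regular expression are illustrative assumptions about the message layout shown above, not part of any Elasticsearch client.

```python
import re


def breaker_overshoot(details):
    """Extract requested and limit byte counts from a CircuitBreakingException message."""
    match = re.search(
        r"would be \[(\d+)/[^\]]*\], which is larger than the limit of \[(\d+)/",
        details,
    )
    if match is None:
        return None
    wanted, limit = int(match.group(1)), int(match.group(2))
    return {"wanted": wanted, "limit": limit, "over_by": wanted - limit}


# Excerpt of the details string from the allocation explain output.
details = (
    "CircuitBreakingException[[parent] Data too large, data for "
    "[<transport_request>] would be [1554462628/1.4gb], which is larger "
    "than the limit of [1513521152/1.4gb]"
)
print(breaker_overshoot(details))
```

The failed_allocation_attempts value of 5 matches the default of index.allocation.max_retries, after which Elasticsearch stops retrying on its own. Once heap usage has been reduced, POST /_cluster/reroute?retry_failed=true asks the cluster to try those shards again, assuming the managed service exposes that API.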

Elasticsearch cluster health intermittently flaps between 'GREEN' and 'YELLOW'

We are running a 7 node cluster with "ZERO" replicas, like this:
{
"cluster_name": "my_cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 7,
"number_of_data_nodes": 7,
"active_primary_shards": 3325,
"active_shards": 3325,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
The Elasticsearch cluster state changes from "Green" to "Yellow" intermittently. The other interesting thing I noticed is that during these intermittent state changes there is shard initialization taking place, which correlates with the state changes. Is this due to the cluster running with "ZERO" replicas?
What could cause the above behavior?
1. Find the affected indices with:
http://IP_MASTER:9200/_cat/indices?v
2. Find the node holding the shard of that index that keeps switching between assigned and unassigned:
http://IP_MASTER:9200/_cat/shards?v
3. Restart the elasticsearch service on that node.
If the problem persists, you have two options.
A. Lucene CheckIndex (checks just that shard):
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /mnt/nas/elasticsearch/graylog-production/nodes/0/indices/graylog_92/0/index/ -verbose -exorcise
If it says it can't find the segments, find that path, cd into it, and run the command there. Note that -exorcise drops unreadable segments, so the documents in them are lost.
B. Elasticsearch index fix (it checks every index and is very slow):
index.shard.check_on_startup: fix
Set this in the elasticsearch.yml of that node.
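Step 2 above can be scripted. The following Python sketch filters the plain-text output of _cat/shards?v for shards that are not in the STARTED state. The sample text is hypothetical and only mimics the column layout; the function name is an assumption as well.

```python
def non_started_shards(cat_shards_text):
    """Return (index, shard, prirep, state) for every shard that is not STARTED.

    Expects the plain-text output of GET /_cat/shards?v, whose first four
    columns are index, shard, prirep and state.
    """
    rows = []
    lines = [line for line in cat_shards_text.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row printed by ?v
        columns = line.split()
        index, shard, prirep, state = columns[:4]
        if state != "STARTED":
            rows.append((index, int(shard), prirep, state))
    return rows


# Hypothetical sample output, not taken from the original cluster.
sample = """\
index      shard prirep state        docs  store ip        node
graylog_92 0     p      INITIALIZING             10.0.0.13 node-3
graylog_92 1     p      STARTED      1200  3.1mb 10.0.0.12 node-2
graylog_91 0     p      STARTED      900   2.4mb 10.0.0.11 node-1
"""
print(non_started_shards(sample))
```

Running this periodically and comparing the results is one way to see which shard keeps re-initializing while the cluster flaps between green and yellow.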

HDFS visualization of block distribution

I'm trying to create a visualization of the HDFS block distribution of a cluster.
I plan to create this using Tableau, but I was wondering what type of visualization would give an idea of which nodes need rebalancing, and whether there is an efficient way to get the server log data into Tableau.
Before investing too much time in this, you might want to take a look at Twitter's open source HDFS-DU project. This provides a view of utilization based on paths within the file system rather than DataNodes within the cluster, but perhaps that's still helpful for your requirements.
If the goal is just to identify nodes in need of rebalancing, then this information is already accessible on the NameNode web UI "Datanodes" tab. You could also run hdfs dfsadmin -report to get utilization stats for each node in a script.
If none of the above meets your requirements, and you need to proceed with integrating the information into an external reporting tool like Tableau, then a helpful integration point might be the JMX metrics exposed via HTTP on the NameNode. See below for an example curl command that queries some of this information from the NameNode. Note in particular the LiveNodes section, which contains capacity information about each DataNode.
Some additional information about these metrics is available in the Apache Hadoop Metrics documentation.
> curl 'http://127.0.0.1:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=NameNodeInfo",
"modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
"Threads" : 46,
"Version" : "3.0.0-alpha2-SNAPSHOT, rdf497b3a739714c567c9c2322608f0659da20cc4",
"Used" : 5263360,
"Free" : 884636377088,
"Safemode" : "",
"NonDfsUsedSpace" : 114431086592,
"PercentUsed" : 5.266863E-4,
"BlockPoolUsedSpace" : 5263360,
"PercentBlockPoolUsed" : 5.266863E-4,
"PercentRemaining" : 88.52252,
"CacheCapacity" : 0,
"CacheUsed" : 0,
"TotalBlocks" : 50,
"NumberOfMissingBlocks" : 0,
"NumberOfMissingBlocksWithReplicationFactorOne" : 0,
"LiveNodes" : "{\"192.168.0.117:9866\":{\"infoAddr\":\"127.0.0.1:9864\",\"infoSecureAddr\":\"127.0.0.1:0\",\"xferaddr\":\"127.0.0.1:9866\",\"lastContact\":2,\"usedSpace\":5263360,\"adminState\":\"In Service\",\"nonDfsUsedSpace\":114431086592,\"capacity\":999334871040,\"numBlocks\":50,\"version\":\"3.0.0-alpha2-SNAPSHOT\",\"used\":5263360,\"remaining\":884636377088,\"blockScheduled\":0,\"blockPoolUsed\":5263360,\"blockPoolUsedPercent\":5.266863E-4,\"volfails\":0}}",
"DeadNodes" : "{}",
"DecomNodes" : "{}",
"BlockPoolId" : "BP-1429209999-10.195.15.240-1484933797029",
"NameDirStatuses" : "{\"active\":{\"/Users/naurc001/hadoop-deploy-trunk/data/dfs/name\":\"IMAGE_AND_EDITS\"},\"failed\":{}}",
"NodeUsage" : "{\"nodeUsage\":{\"min\":\"0.00%\",\"median\":\"0.00%\",\"max\":\"0.00%\",\"stdDev\":\"0.00%\"}}",
"NameJournalStatus" : "[{\"manager\":\"FileJournalManager(root=/Users/naurc001/hadoop-deploy-trunk/data/dfs/name)\",\"stream\":\"EditLogFileOutputStream(/Users/naurc001/hadoop-deploy-trunk/data/dfs/name/current/edits_inprogress_0000000000000000862)\",\"disabled\":\"false\",\"required\":\"false\"}]",
"JournalTransactionInfo" : "{\"MostRecentCheckpointTxId\":\"861\",\"LastAppliedOrWrittenTxId\":\"862\"}",
"NNStartedTimeInMillis" : 1485715900031,
"CompileInfo" : "2017-01-03T21:06Z by naurc001 from trunk",
"CorruptFiles" : "[]",
"NumberOfSnapshottableDirs" : 0,
"DistinctVersionCount" : 1,
"DistinctVersions" : [ {
"key" : "3.0.0-alpha2-SNAPSHOT",
"value" : 1
} ],
"SoftwareVersion" : "3.0.0-alpha2-SNAPSHOT",
"NameDirSize" : "{\"/Users/naurc001/hadoop-deploy-trunk/data/dfs/name\":2112351}",
"RollingUpgradeStatus" : null,
"ClusterId" : "CID-4526ea43-52e6-4b3f-9ddf-5fd4412e322e",
"UpgradeFinalized" : true,
"Total" : 999334871040
} ]
}
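One detail in the output above is easy to miss: LiveNodes is a JSON document embedded as a string, so it has to be decoded a second time. A minimal Python sketch of turning it into per-DataNode utilization, suitable for export to a reporting tool, might look like this. The function name is hypothetical and the bean is a trimmed copy of the response above.

```python
import json


def datanode_usage(name_node_info):
    """Return {datanode: percent of capacity used} from a NameNodeInfo bean.

    LiveNodes is itself a JSON document stored as a string, so it needs a
    second json.loads call.
    """
    live_nodes = json.loads(name_node_info["LiveNodes"])
    usage = {}
    for node, stats in live_nodes.items():
        usage[node] = 100.0 * stats["usedSpace"] / stats["capacity"]
    return usage


# Trimmed copy of the bean shown above.
bean = {
    "name": "Hadoop:service=NameNode,name=NameNodeInfo",
    "LiveNodes": json.dumps(
        {"192.168.0.117:9866": {"usedSpace": 5263360, "capacity": 999334871040}}
    ),
}
for node, percent in datanode_usage(bean).items():
    print(f"{node}: {percent:.6f}% used")
```

Nodes whose percentage sits far from the cluster median are the candidates for rebalancing.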

Courier Fetch: shards failed

Why do I get these warnings after adding more data to my elasticsearch?
And the warnings are different every time I browse the dashboard.
"Courier Fetch: 30 of 60 shards failed."
More details:
It's a single node on CentOS 7.1
/etc/elasticsearch/elasticsearch.yml
index.number_of_shards: 3
index.number_of_replicas: 1
bootstrap.mlockall: true
threadpool.bulk.queue_size: 1000
indices.fielddata.cache.size: 50%
threadpool.index.queue_size: 400
index.refresh_interval: 30s
index.number_of_shards: 5
index.number_of_replicas: 1
/usr/share/elasticsearch/bin/elasticsearch.in.sh
ES_HEAP_SIZE=3G
#I use this Garbage Collector instead of the default one.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
cluster status
{
"cluster_name" : "my_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 61,
"active_shards" : 61,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 61
}
cluster details
{
"cluster_name" : "my_cluster",
"nodes" : {
"some weird number" : {
"name" : "ES 1",
"transport_address" : "inet[localhost/127.0.0.1:9300]",
"host" : "some host",
"ip" : "150.244.58.112",
"version" : "1.4.4",
"build" : "c88f77f",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 7854,
"max_file_descriptors" : 65535,
"mlockall" : false
}
}
}
}
I'm curious about the "mlockall" : false, because in the yml I did write bootstrap.mlockall: true
logs
lots of lines like:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23#a9a34f5
For me tuning the threadpool search queue_size solved the issue. I tried a number of other things and this is the one that solved it.
I added this to my elasticsearch.yml
threadpool.search.queue_size: 10000
and then restarted elasticsearch.
Reasoning... (from the docs)
A node holds several thread pools in order to improve how threads
memory consumption are managed within a node. Many of these pools also
have queues associated with them, which allow pending requests to be
held instead of discarded.
and for search in particular...
For count/search operations. Defaults to fixed with a size of int((#
of available_processors * 3) / 2) + 1, queue_size of 1000.
For more information you can refer to the elasticsearch docs here...
I had trouble finding this information so I hope this helps others!
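The sizing formula quoted from the docs can be worked through for a few core counts. This Python sketch only evaluates that formula; the function name is hypothetical, and os.cpu_count() is merely an approximation of the processor count Elasticsearch detects.

```python
import os


def default_search_pool_size(available_processors):
    """Default size of the search thread pool per the quoted documentation:
    int((available_processors * 3) / 2) + 1."""
    return int((available_processors * 3) / 2) + 1


for cores in (2, 4, 8):
    print(cores, "cores ->", default_search_pool_size(cores), "search threads")

# os.cpu_count() is only an approximation of what Elasticsearch detects.
print("this machine:", default_search_pool_size(os.cpu_count() or 1))
```

On a 4-core node this gives 7 search threads. A dashboard that queries 60 shards issues one shard-level search per shard for each panel, so a few panels refreshing together can plausibly fill a queue of 1000 and produce the rejected executions seen in the logs.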
I got this error when my query was missing a closing quote:
field:"value
In my ElasticSearch logs I see these exceptions:
Caused by: org.elasticsearch.index.query.QueryShardException:
Failed to parse query [field:"value]
...
Caused by: org.apache.lucene.queryparser.classic.ParseException:
Cannot parse 'field:"value': Lexical error at line 1, column 13.
Encountered: <EOF> after : "\"value"
In Elasticsearch 5.4, thread_pool has an underscore in it.
thread_pool.search.queue_size: 10000
See documentation at Elasticsearch Thread Pool module documentation
This is likely an indication that there's a problem with your cluster's health. Without knowing more about your cluster, there's not much more that can be said.
I agree with @Philip's opinion, but it's not necessary to restart Elasticsearch, at least on Elasticsearch >= 1.5.2, because you can set threadpool.search.queue_size dynamically.
curl -XPUT http://your_es:9200/_cluster/settings -d '{
"transient":{
"threadpool.search.queue_size":10000
}
}'
From Elasticsearch version 5 onward, it's not possible to update thread_pool.search.queue_size using the _cluster/settings API. In my case, updating the Elasticsearch node's yml file is not an option either, since if a node fails the auto scaling code would bring up another ES node with the default yml settings.
I have a cluster with 3 nodes and 400 active primary shards, with 7 active threads and a queue size of 1000. Increasing the number of nodes to 5 with a similar config resolved the issue, as queries are now distributed horizontally across more nodes.
This will not work on Elasticsearch 5.6.
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[colmbmiscxx.xx][172.29.xx.xx:9300][cluster:admin/settings/update]"
}
],
"type": "illegal_argument_exception",
"reason": "transient setting [threadpool.search.queue_size], not dynamically updateable"
},
"status": 400
}

How do I find out what's wrong with Elasticsearch replication when the status is red

I'm seeing very high CPU on my two Elasticsearch nodes, and profiling shows that it's associated with Elasticsearch replication.
I've executed the health status command:
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
which returns this:
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 2003,
"active_shards" : 4006,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 34
}
I can see the status is red, so there is some sort of problem, which presumably is causing the high CPU.
But how do I find out what is actually wrong so I can rectify it?
I'd start with the debugging approach outlined here:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_cluster_health.html
To recap, try running:
GET _cluster/health?level=indices
This will give you info about which indices are having issues. In general, a red status means that at least one primary shard (and all of its replicas) is missing, which means you have missing data (not good).
I'd take a look at individual node health:
GET _nodes/stats
From there I'd focus on heap (memory) usage and disk usage; in particular, look for a full disk. Then I'd log into each node separately to check disk usage.
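The heap and disk check can be sketched in Python against a saved _nodes/stats response. The field names used here (jvm.mem.heap_used_percent, fs.total.total_in_bytes, fs.total.available_in_bytes) are the ones I would expect in that response; the 85% thresholds, the function name, and the sample data are assumptions for illustration only.

```python
def nodes_under_pressure(nodes_stats, heap_limit=85, disk_limit=85):
    """List (node name, heap %, disk %) for nodes above either threshold.

    Reads jvm.mem.heap_used_percent and fs.total from a GET _nodes/stats
    response. The 85% thresholds are illustrative assumptions.
    """
    flagged = []
    for node in nodes_stats["nodes"].values():
        heap = node["jvm"]["mem"]["heap_used_percent"]
        fs = node["fs"]["total"]
        used = fs["total_in_bytes"] - fs["available_in_bytes"]
        disk = 100.0 * used / fs["total_in_bytes"]
        if heap >= heap_limit or disk >= disk_limit:
            flagged.append((node["name"], heap, round(disk, 1)))
    return flagged


# Hypothetical, heavily trimmed response used only to exercise the function.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "es-1",
            "jvm": {"mem": {"heap_used_percent": 92}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 400}},
        },
        "node-id-2": {
            "name": "es-2",
            "jvm": {"mem": {"heap_used_percent": 40}},
            "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 900}},
        },
    }
}
print(nodes_under_pressure(sample))
```

Any node this flags is the first place to look for the cause of unassigned shards, since both a full disk and an exhausted heap can block allocation.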

Resources