Elasticsearch Debugging - elasticsearch

Our Elasticsearch cluster is a mess. Its health is permanently red, and I've decided to look into it and salvage it if possible, but I have no idea where to begin. Here is some info regarding our cluster:
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 6,
"active_primary_shards" : 91,
"active_shards" : 91,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 201,
"number_of_pending_tasks" : 0
}
The 6 nodes:
host ip heap.percent ram.percent load node.role master name
es04e.p.comp.net 10.0.22.63 30 22 0.00 d m es04e-es
es06e.p.comp.net 10.0.21.98 20 15 0.37 d m es06e-es
es08e.p.comp.net 10.0.23.198 9 44 0.07 d * es08e-es
es09e.p.comp.net 10.0.32.233 62 45 0.00 d m es09e-es
es05e.p.comp.net 10.0.65.140 18 14 0.00 d m es05e-es
es07e.p.comp.net 10.0.11.69 52 45 0.13 d m es07e-es
Straight away you can see I have a very large number of unassigned shards (201). I came across this answer, tried it, and got 'acknowledged: true', but neither of the sets of info posted above changed.
Next I logged into one of the nodes (es04) and went through the log files. The first log file has a few lines that caught my attention:
[2015-05-21 19:44:51,561][WARN ][transport.netty ] [es04e-es] exception caught on transport layer [[id: 0xbceea4eb]], closing connection
and
[2015-05-26 15:14:43,157][INFO ][cluster.service ] [es04e-es] removed {[es03e-es][R8sz5RWNSoiJ2zm7oZV_xg][es03e.p.sojern.net][inet[/10.0.2.16:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:22:28,721][INFO ][cluster.service ] [es04e-es] removed {[es02e-es][XZ5TErowQfqP40PbR-qTDg][es02e.p.sojern.net][inet[/10.0.2.229:9300]],}, reason: zen-disco-receive(from master [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]])
[2015-05-26 15:32:00,448][INFO ][discovery.ec2 ] [es04e-es] master_left [[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]]], reason [shut_down]
[2015-05-26 15:32:00,449][WARN ][discovery.ec2 ] [es04e-es] master left (reason = shut_down), current nodes: {[es07e-es][etJN3eOySAydsIi15sqkSQ][es07e.p.sojern.net][inet[/10.0.2.69:9300]],[es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]],[es05e-es][ZoLnYvAdTcGIhbcFRI3H_A][es05e.p.sojern.net][inet[/10.0.1.140:9300]],[es08e-es][FPa4q07qRg-YA7hAztUj2w][es08e.p.sojern.net][inet[/10.0.2.198:9300]],[es09e-es][4q6eACbOQv-TgEG0-Bye6w][es09e.p.sojern.net][inet[/10.0.2.233:9300]],[es06e-es][zJ17K040Rmiyjf2F8kjIiQ][es06e.p.sojern.net][inet[/10.0.1.98:9300]],}
[2015-05-26 15:32:00,450][INFO ][cluster.service ] [es04e-es] removed {[es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]],}, reason: zen-disco-master_failed ([es01e-es][JzkWq9qwQSGdrWpkOYvbqQ][es01e.p.sojern.net][inet[/10.0.2.237:9300]])
[2015-05-26 15:32:36,741][INFO ][cluster.service ] [es04e-es] new_master [es04e-es][3KFMUFvzR_CzWRddIMdpBg][es04e.p.sojern.net][inet[/10.0.1.63:9300]], reason: zen-disco-join (elected_as_master)
In this section I realized that a few nodes (es01, es02, es03) had been removed from the cluster.
After this, all of the log files (around 30 of them) contain only this one line:
[2015-05-26 15:43:49,971][DEBUG][action.bulk ] [es04e-es] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
I have checked all the nodes and they run the same versions of Elasticsearch and Logstash. I realize this is a big, complicated issue, but if anyone can identify the problem and nudge me in the right direction it would be a HUGE help.

I believe this might be because at some point you had a split-brain issue, and there were 2 versions of the same shard in 2 clusters. One or both might have received different sets of data, so 2 divergent copies of the shard came into existence. At some point you might have restarted the whole system, and some shards might have gone into the red state.
First, see if there is data loss; if there is, the aforementioned case could be the reason. Next, make sure you set minimum master nodes to N/2+1 (where N is the number of master-eligible nodes), so that this issue won't surface again.
You can use the shard reroute API on the red shards and see if they move out of the red state. You might lose the shard data here, but that is the only way I have seen to bring the cluster state back to green.
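A minimal sketch of both steps, assuming an Elasticsearch 1.x-era API (which matches the 2015 logs above); the index name and shard number are placeholders, and the target node is simply one of the data nodes listed above:
# Require 4 of the 6 master-eligible nodes for a master election (N/2+1)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 4 }
}'
# Force-allocate one unassigned shard as a primary, accepting the loss of that shard's data
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": { "index": "my-index", "shard": 0, "node": "es04e-es", "allow_primary": true } }
  ]
}'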

Please try installing the elasticsearch-head plugin to check shard status; you will be able to see which shards are corrupted.
Try the flush or optimize APIs.
Restarting Elasticsearch also sometimes works.
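If the plugin is not an option, the cat shards API gives a quick overview of which shards are unassigned (a small sketch, assuming the default port):
curl -s 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED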

Related

Elasticsearch 7.4 incorrectly complaining a snapshot is already running

After solving "Something inside Elasticsearch 7.4 cluster is getting slower and slower with read timeouts now and then", there is still something off in my cluster. Whenever I run the snapshot command it gives me a 503; when I run it once or twice more it suddenly starts and creates a snapshot just fine. The opster.com online tool suggests something about snapshots not being configured; however, when I run the verify command it suggests, everything looks fine.
$ curl -s -X POST 'http://127.0.0.1:9201/_snapshot/elastic_backup/_verify?pretty'
{
"nodes" : {
"JZHgYyCKRyiMESiaGlkITA" : {
"name" : "elastic7-1"
},
"jllZ8mmTRQmsh8Sxm8eDYg" : {
"name" : "elastic7-4"
},
"TJJ_eHLIRk6qKq_qRWmd3w" : {
"name" : "elastic7-3"
},
"cI-cn4V3RP65qvE3ZR8MXQ" : {
"name" : "elastic7-2"
}
}
}
But then:
curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
{
"error" : {
"root_cause" : [
{
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.11.27] a snapshot is already running"
}
],
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.11.27] a snapshot is already running"
},
"status" : 503
}
Could it be that one of the 4 nodes believes that a snapshot is already running, and that this task gets randomly assigned to one of the nodes, so that running it a few times eventually produces a snapshot? If so, how could I figure out which of the nodes is claiming the snapshot is already running?
Furthermore, I noticed heap usage is much higher on one of the nodes; what is a normal heap usage?
$ curl -s http://127.0.0.1:9201/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.0.1.215 59 99 7 0.38 0.38 0.36 dilm - elastic7-1
10.0.1.218 32 99 1 0.02 0.17 0.22 dilm * elastic7-4
10.0.1.212 11 99 1 0.04 0.17 0.21 dilm - elastic7-3
10.0.1.209 36 99 3 0.42 0.40 0.36 dilm - elastic7-2
Last night it happened again while I'm sure nothing was already snapshotting, so I ran the following commands to confirm the strange response; at this point I would not expect to get this error at all.
$ curl http://127.0.0.1:9201/_snapshot/elastic_backup/_current?pretty
{
"snapshots" : [ ]
}
$ curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
{
"error" : {
"root_cause" : [
{
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.12.03] a snapshot is already running"
}
],
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.12.03] a snapshot is already running"
},
"status" : 503
}
When I run it a 2nd (or sometimes 3rd) time it will all of a sudden start creating a snapshot.
Note that when I don't run it that 2nd or 3rd time, the snapshot never appears, so I'm 100% sure no snapshot is running at the moment of this error.
There is no SLM configured as far as I know:
{ }
The repo is configured properly AFAICT:
$ curl http://127.0.0.1:9201/_snapshot/elastic_backup?pretty
{
"elastic_backup" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : "elastic_backup"
}
}
}
Also, in the config it is mapped to the same folder, which is an NFS mount of an Amazon EFS. It is available and accessible, and after successful snapshots it shows new data.
As part of the cronjob I have added a query of _cat/tasks?v, so hopefully tonight we will see more, because just now when I ran the command manually it ran without problems:
$ curl localhost:9201/_cat/tasks?v ; curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty' ; curl localhost:9201/_cat/tasks?v
action task_id parent_task_id type start_time timestamp running_time ip node
cluster:monitor/tasks/lists JZHgYyCKRyiMESiaGlkITA:15885091 - transport 1607068277045 07:51:17 209.6micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] TJJ_eHLIRk6qKq_qRWmd3w:24278976 JZHgYyCKRyiMESiaGlkITA:15885091 transport 1607068277044 07:51:17 62.7micros 10.0.1.212 elastic7-3
cluster:monitor/tasks/lists[n] JZHgYyCKRyiMESiaGlkITA:15885092 JZHgYyCKRyiMESiaGlkITA:15885091 direct 1607068277045 07:51:17 57.4micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] jllZ8mmTRQmsh8Sxm8eDYg:23773565 JZHgYyCKRyiMESiaGlkITA:15885091 transport 1607068277045 07:51:17 84.7micros 10.0.1.218 elastic7-4
cluster:monitor/tasks/lists[n] cI-cn4V3RP65qvE3ZR8MXQ:3418325 JZHgYyCKRyiMESiaGlkITA:15885091 transport 1607068277046 07:51:17 56.9micros 10.0.1.209 elastic7-2
{
"snapshot" : {
"snapshot" : "snapshot-2020.12.04",
"uuid" : "u2yQB40sTCa8t9BqXfj_Hg",
"version_id" : 7040099,
"version" : "7.4.0",
"indices" : [
"log-db-1-2020.06.18-000003",
"log-db-2-2020.02.19-000002",
"log-db-1-2019.10.25-000001",
"log-db-3-2020.11.23-000002",
"log-db-3-2019.10.25-000001",
"log-db-2-2019.10.25-000001",
"log-db-1-2019.10.27-000002"
],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2020-12-04T07:51:17.085Z",
"start_time_in_millis" : 1607068277085,
"end_time" : "2020-12-04T07:51:48.537Z",
"end_time_in_millis" : 1607068308537,
"duration_in_millis" : 31452,
"failures" : [ ],
"shards" : {
"total" : 28,
"failed" : 0,
"successful" : 28
}
}
}
action task_id parent_task_id type start_time timestamp running_time ip node
indices:data/read/search JZHgYyCKRyiMESiaGlkITA:15888939 - transport 1607068308987 07:51:48 2.7ms 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists JZHgYyCKRyiMESiaGlkITA:15888942 - transport 1607068308990 07:51:48 223.2micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] TJJ_eHLIRk6qKq_qRWmd3w:24282763 JZHgYyCKRyiMESiaGlkITA:15888942 transport 1607068308989 07:51:48 61.5micros 10.0.1.212 elastic7-3
cluster:monitor/tasks/lists[n] JZHgYyCKRyiMESiaGlkITA:15888944 JZHgYyCKRyiMESiaGlkITA:15888942 direct 1607068308990 07:51:48 78.2micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] jllZ8mmTRQmsh8Sxm8eDYg:23777841 JZHgYyCKRyiMESiaGlkITA:15888942 transport 1607068308990 07:51:48 63.3micros 10.0.1.218 elastic7-4
cluster:monitor/tasks/lists[n] cI-cn4V3RP65qvE3ZR8MXQ:3422139 JZHgYyCKRyiMESiaGlkITA:15888942 transport 1607068308991 07:51:48 60micros 10.0.1.209 elastic7-2
Last night (2020-12-12) the cronjob ran the following commands:
curl localhost:9201/_cat/tasks?v
curl localhost:9201/_cat/thread_pool/snapshot?v
curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
curl localhost:9201/_cat/tasks?v
sleep 1
curl localhost:9201/_cat/thread_pool/snapshot?v
curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
sleep 1
curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
sleep 1
curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
And the output was the following:
action task_id parent_task_id type start_time timestamp running_time ip node
cluster:monitor/tasks/lists JZHgYyCKRyiMESiaGlkITA:78016838 - transport 1607736001255 01:20:01 314.4micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] TJJ_eHLIRk6qKq_qRWmd3w:82228580 JZHgYyCKRyiMESiaGlkITA:78016838 transport 1607736001254 01:20:01 66micros 10.0.1.212 elastic7-3
cluster:monitor/tasks/lists[n] jllZ8mmTRQmsh8Sxm8eDYg:55806094 JZHgYyCKRyiMESiaGlkITA:78016838 transport 1607736001255 01:20:01 74micros 10.0.1.218 elastic7-4
cluster:monitor/tasks/lists[n] JZHgYyCKRyiMESiaGlkITA:78016839 JZHgYyCKRyiMESiaGlkITA:78016838 direct 1607736001255 01:20:01 94.3micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] cI-cn4V3RP65qvE3ZR8MXQ:63582174 JZHgYyCKRyiMESiaGlkITA:78016838 transport 1607736001255 01:20:01 73.6micros 10.0.1.209 elastic7-2
node_name name active queue rejected
elastic7-2 snapshot 0 0 0
elastic7-4 snapshot 0 0 0
elastic7-1 snapshot 0 0 0
elastic7-3 snapshot 0 0 0
{
"error" : {
"root_cause" : [
{
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] a snapshot is already running"
}
],
"type" : "concurrent_snapshot_execution_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] a snapshot is already running"
},
"status" : 503
}
action task_id parent_task_id type start_time timestamp running_time ip node
cluster:monitor/nodes/stats JZHgYyCKRyiMESiaGlkITA:78016874 - transport 1607736001632 01:20:01 39.6ms 10.0.1.215 elastic7-1
cluster:monitor/nodes/stats[n] TJJ_eHLIRk6qKq_qRWmd3w:82228603 JZHgYyCKRyiMESiaGlkITA:78016874 transport 1607736001631 01:20:01 39.2ms 10.0.1.212 elastic7-3
cluster:monitor/nodes/stats[n] jllZ8mmTRQmsh8Sxm8eDYg:55806114 JZHgYyCKRyiMESiaGlkITA:78016874 transport 1607736001632 01:20:01 39.5ms 10.0.1.218 elastic7-4
cluster:monitor/nodes/stats[n] cI-cn4V3RP65qvE3ZR8MXQ:63582204 JZHgYyCKRyiMESiaGlkITA:78016874 transport 1607736001632 01:20:01 39.4ms 10.0.1.209 elastic7-2
cluster:monitor/nodes/stats[n] JZHgYyCKRyiMESiaGlkITA:78016875 JZHgYyCKRyiMESiaGlkITA:78016874 direct 1607736001632 01:20:01 39.5ms 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists JZHgYyCKRyiMESiaGlkITA:78016880 - transport 1607736001671 01:20:01 348.9micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] JZHgYyCKRyiMESiaGlkITA:78016881 JZHgYyCKRyiMESiaGlkITA:78016880 direct 1607736001671 01:20:01 188.6micros 10.0.1.215 elastic7-1
cluster:monitor/tasks/lists[n] TJJ_eHLIRk6qKq_qRWmd3w:82228608 JZHgYyCKRyiMESiaGlkITA:78016880 transport 1607736001671 01:20:01 106.2micros 10.0.1.212 elastic7-3
cluster:monitor/tasks/lists[n] cI-cn4V3RP65qvE3ZR8MXQ:63582209 JZHgYyCKRyiMESiaGlkITA:78016880 transport 1607736001672 01:20:01 96.3micros 10.0.1.209 elastic7-2
cluster:monitor/tasks/lists[n] jllZ8mmTRQmsh8Sxm8eDYg:55806120 JZHgYyCKRyiMESiaGlkITA:78016880 transport 1607736001672 01:20:01 97.8micros 10.0.1.218 elastic7-4
node_name name active queue rejected
elastic7-2 snapshot 0 0 0
elastic7-4 snapshot 0 0 0
elastic7-1 snapshot 0 0 0
elastic7-3 snapshot 0 0 0
{
"snapshot" : {
"snapshot" : "snapshot-2020.12.12",
"uuid" : "DgwuBxC7SWirjyVlFxBnng",
"version_id" : 7040099,
"version" : "7.4.0",
"indices" : [
"log-db-sbr-2020.06.18-000003",
"log-db-other-2020.02.19-000002",
"log-db-sbr-2019.10.25-000001",
"log-db-trace-2020.11.23-000002",
"log-db-trace-2019.10.25-000001",
"log-db-sbr-2019.10.27-000002",
"log-db-other-2019.10.25-000001"
],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2020-12-12T01:20:02.544Z",
"start_time_in_millis" : 1607736002544,
"end_time" : "2020-12-12T01:20:27.776Z",
"end_time_in_millis" : 1607736027776,
"duration_in_millis" : 25232,
"failures" : [ ],
"shards" : {
"total" : 28,
"failed" : 0,
"successful" : 28
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "invalid_snapshot_name_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] Invalid snapshot name [snapshot-2020.12.12], snapshot with the same name already exists"
}
],
"type" : "invalid_snapshot_name_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] Invalid snapshot name [snapshot-2020.12.12], snapshot with the same name already exists"
},
"status" : 400
}
{
"error" : {
"root_cause" : [
{
"type" : "invalid_snapshot_name_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] Invalid snapshot name [snapshot-2020.12.12], snapshot with the same name already exists"
}
],
"type" : "invalid_snapshot_name_exception",
"reason" : "[elastic_backup:snapshot-2020.12.12] Invalid snapshot name [snapshot-2020.12.12], snapshot with the same name already exists"
},
"status" : 400
}
Also, the cluster is green at the moment, the management queues are not full, and everything seems fine.
There is also only one repository:
curl http://127.0.0.1:9201/_cat/repositories?v
id type
elastic_backup fs
So it turned out that the trouble started with a recent upgrade to Docker 19.03.6 and a move from 1x Docker Swarm manager + 4x Docker Swarm workers to 5x Docker Swarm managers + 4x Docker Swarm workers; in both setups Elastic ran on the workers. This upgrade/change altered the number of network interfaces inside the containers, and because of that we had to set 'publish_host' in Elastic to make things work again.
To fix the problem we had to stop publishing the Elastic ports over the ingress network, so that the additional network interfaces went away; we could then drop the 'publish_host' setting. This made things work a bit better, but to really solve our issues we had to change the Docker Swarm deploy endpoint_mode to dnsrr so that traffic would not go through the Docker Swarm routing mesh.
We had always had occasional 'Connection reset by peer' issues, but since the change they became worse and made Elasticsearch behave strangely. I guess running Elasticsearch inside Docker Swarm (or Kubernetes, or any other orchestrator) can be a tricky thing to debug.
Using tcpdump in the containers and conntrack -S on the hosts, we were able to see perfectly fine connections being reset for no reason. Another solution was to have the kernel drop mismatching packets (instead of sending resets), but preventing the use of DNAT/SNAT as much as possible in this setup seemed to solve things too.
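For illustration, bypassing the routing mesh might look roughly like this (a sketch only; the service name, network name, and image tag are assumptions, and a stack file's deploy.endpoint_mode: dnsrr setting achieves the same thing):
# Create the service with DNS round-robin endpoints instead of a virtual IP on the routing mesh
docker service create \
  --name elasticsearch \
  --network es-net \
  --endpoint-mode dnsrr \
  docker.elastic.co/elasticsearch/elasticsearch:7.4.0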
Elasticsearch 7.4 only supports one snapshot operation at a time.
From the error it seems a previously triggered snapshot was still running when you triggered a new one, so Elasticsearch throws concurrent_snapshot_execution_exception.
You can check the list of currently running snapshots using
GET /_snapshot/elastic_backup/_current
I suggest you first check whether any snapshot operation is running for your Elasticsearch cluster using the above API, and only trigger a new snapshot if none is currently running.
P.S.: From Elasticsearch 7.7 onwards, concurrent snapshots are supported. So if you plan to perform concurrent snapshot operations in your cluster, you should upgrade to ES 7.7 or above.
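A minimal sketch of that guard wired into the cron script, assuming the same host, port, and repository name used above:
# Trigger a new snapshot only if the repository reports nothing in progress
running=$(curl -s 'http://127.0.0.1:9201/_snapshot/elastic_backup/_current' | grep -c 'IN_PROGRESS')
if [ "$running" -eq 0 ]; then
  curl -s -X PUT 'http://127.0.0.1:9201/_snapshot/elastic_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E?wait_for_completion=true&pretty'
fi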

Clickhouse table stuck in read-only mode

I am new to ClickHouse.
On one of my systems I am seeing this issue repeatedly:
"{} \u003cError\u003e void DB::AsynchronousMetrics::update(): Cannot
get replica delay for table: people: Code: 242, e.displayText() =
DB::Exception: Table is in readonly mode, Stack trace:"
And I can see my ZooKeeper was not in a good state. From the ClickHouse docs, it seems this is related to either:
metadata in ZooKeeper got deleted somehow, or
ZooKeeper was not up when ClickHouse was trying to come up.
Either way, I want to recover from the error, and the docs suggest the steps below:
a) To start recovery, create the node /path_to_table/replica_name/flags/force_restore_data in ZooKeeper
with any content, or run the command to restore all replicated tables:
sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
Then restart the server. On start, the server deletes these flags and starts recovery.
But I am not able to understand where I should run this command. I looked inside the ClickHouse container under /var/lib/clickhouse and there is no flags directory; should I create it first?
Also, is there a way to recover from this error without restarting the server? I would rather avoid a container restart.
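For reference, the ZooKeeper variant of step (a) might look roughly like this (a sketch only; the znode path and the ZooKeeper address are taken from the logs below, and zkCli.sh is assumed to be available):
# Create the recovery flag for one replicated table; create the flags parent first if it does not exist yet
zkCli.sh -server 172.16.0.28:2181 create /clickhouse/tables/shard_0/people/replicas/replica_0732646014/flags ''
zkCli.sh -server 172.16.0.28:2181 create /clickhouse/tables/shard_0/people/replicas/replica_0732646014/flags/force_restore_data ''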
Attaching few relevant logs before the read only exception:
2020.06.19 16:49:02.789216 [ 13 ] {} <Error> DB_0.people (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:13.576855 [ 17 ] {} <Error> DB_0.school (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:23.497824 [ 19 ] {} <Error> DB_0.people (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/people/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/people/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:23.665089 [ 20 ] {} <Error> DB_0.school (ReplicatedMergeTreeRestartingThread): Couldn't start replication: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, DB::Exception: Replica /clickhouse/tables/shard_0/school/replicas/replica_0732646014 appears to be already active. If you're sure it's not, try again in a minute or remove znode /clickhouse/tables/shard_0/school/replicas/replica_0732646014/is_active manually, stack trace:
2020.06.19 16:49:59.703591 [ 41 ] {} <Error> void Coordination::ZooKeeper::receiveThread(): Code: 999, e.displayText() = Coordination::Exception: Operation timeout (no response) for path: /clickhouse/tables/shard_0/school/blocks (Operation timeout), Stack trace:
2020.06.19 16:49:59.847751 [ 18 ] {} <Error> DB_0.people: void DB::StorageReplicatedMergeTree::queueUpdatingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/shard_0/people/mutations, Stack trace:
2020.06.19 16:50:00.205911 [ 19 ] {} <Warning> DB_0.school (ReplicatedMergeTreeRestartingThread): ZooKeeper session has expired. Switching to a new session.
2020.06.19 16:50:00.315063 [ 19 ] {} <Error> zkutil::EphemeralNodeHolder::~EphemeralNodeHolder(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:00.338176 [ 15 ] {} <Error> DB_0.people: void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:00.387589 [ 16 ] {} <Error> DB_0.school: void DB::StorageReplicatedMergeTree::mergeSelectingTask(): Code: 999, e.displayText() = Coordination::Exception: Connection loss, path: /clickhouse/tables/shard_0/school/log, Stack trace:
2020.06.19 16:50:00.512689 [ 17 ] {} <Error> zkutil::EphemeralNodeHolder::~EphemeralNodeHolder(): Code: 999, e.displayText() = Coordination::Exception: Session expired (Session expired), Stack trace:
2020.06.19 16:50:20.753596 [ 47 ] {} <Error> void DB::DDLWorker::runMainThread(): Code: 999, e.displayText() = Coordination::Exception: All connection tries failed while connecting to ZooKeeper. Addresses: 172.16.0.28:2181
Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Timeout: connect timed out: 172.16.0.28:2181 (version 19.13.1.11 (official build)), 172.16.0.28:2181
Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket (172.16.0.28:2181): while receiving handshake from ZooKeeper (version 19.13.1.11 (official build)), 172.16.0.28:2181
Code: 209, e.displayText() = DB::NetException: Timeout exceeded while reading from socket (172.16.0.28:2181): while receiving handshake from ZooKeeper (version 19.13.1.11 (official build)), 172.16.0.28:2181
(Connection loss), Stack trace:
2020.06.19 16:50:31.499775 [ 51 ] {} <Error> void DB::AsynchronousMetrics::update(): Cannot get replica delay for table: DB_0.people: Code: 242, e.displayText() = DB::Exception: Table is in readonly mode, Stack trace:
Edit: I did manage to find the folder where flags is present (it is in my volume /repo/data), but when I try to run the command
sudo -u clickhouse touch /repo/data/flags/force_restore_data
I got this:
Use one of the following commands:
clickhouse local [args]
clickhouse client [args]
clickhouse benchmark [args]
clickhouse server [args]
clickhouse extract-from-config [args]
clickhouse compressor [args]
clickhouse format [args]
clickhouse copier [args]
clickhouse obfuscator [args]

Kafka stream app failing to fetch offsets for partition

I created a Kafka cluster with 3 brokers and the following details:
Created 3 topics, each one with replication factor=3 and partitions=2.
Created 2 producers, each one writing to one of the topics.
Created a Streams application to process messages from 2 topics and write to the 3rd topic.
It was all running fine until now, but I suddenly started getting the following warning when starting the Streams application:
[WARN ] 2018-06-08 21:16:49.188 [Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1] org.apache.kafka.clients.consumer.internals.Fetcher [Consumer clientId=Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1-restore-consumer, groupId=] Attempt to fetch offsets for partition Stream3-KSTREAM-OUTEROTHER-0000000005-store-changelog-0 failed due to: Disk error when trying to access log file on the disk.
Due to this warning, the Streams application is not processing anything from the 2 topics.
I tried the following things:
Stopped all brokers, deleted the kafka-logs directory for each broker, and restarted the brokers. It didn't solve the issue.
Stopped ZooKeeper and all brokers, deleted the ZooKeeper logs as well as kafka-logs for each broker, restarted ZooKeeper and the brokers, and created the topics again. This too didn't solve the issue.
I am not able to find anything related to this error in the official docs or on the web. Does anyone have an idea why I am suddenly getting this error?
EDIT:
Out of the 3 brokers, 2 (broker-0 and broker-2) continuously emit these logs:
Broker-0 logs:
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-2 logs:
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-1 shows following logs:
[2018-06-09 01:21:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:31:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:39:44,667] ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-06-09 01:41:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
I again stopped ZooKeeper and the brokers, deleted their logs, and restarted. As soon as I create the topics again, I start getting the above logs.
Topic details:
[zk: localhost:2181(CONNECTED) 3] get /brokers/topics/initial11_topic
{"version":1,"partitions":{"1":[1,0,2],"0":[0,2,1]}}
cZxid = 0x53
ctime = Sat Jun 09 01:25:42 EDT 2018
mZxid = 0x53
mtime = Sat Jun 09 01:25:42 EDT 2018
pZxid = 0x54
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 4] get /brokers/topics/initial12_topic
{"version":1,"partitions":{"1":[2,1,0],"0":[1,0,2]}}
cZxid = 0x61
ctime = Sat Jun 09 01:25:47 EDT 2018
mZxid = 0x61
mtime = Sat Jun 09 01:25:47 EDT 2018
pZxid = 0x62
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 5] get /brokers/topics/final11_topic
{"version":1,"partitions":{"1":[0,1,2],"0":[2,0,1]}}
cZxid = 0x48
ctime = Sat Jun 09 01:25:32 EDT 2018
mZxid = 0x48
mtime = Sat Jun 09 01:25:32 EDT 2018
pZxid = 0x4a
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
Any clue?
I found out the issue. It was due to the following incorrect config in server.properties of broker-1:
advertised.listeners=PLAINTEXT://10.23.152.109:9094
The port in broker-1's advertised.listeners had mistakenly been changed to the same port as broker-2's advertised.listeners.
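A quick way to spot such a clash is to compare the endpoints each broker has actually registered in ZooKeeper (a sketch; the broker ids and ZooKeeper address are assumed from the setup above):
# Each broker's registered endpoint must be unique; a duplicate points at a bad advertised.listeners
for id in 0 1 2; do
  zookeeper-shell.sh localhost:2181 get /brokers/ids/$id
done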

Why does the ES cluster stop working until I delete the old index?

The Elasticsearch documentation says that if we restart Node 1 and it still has copies of the old shards, it will try to reuse them, copying over from the primary shard only the files that have changed in the meantime.
So I did an experiment.
There are 5 nodes in my cluster. Primary shard 1 is stored on node 1 and replica shard 1 is stored on node 2. When I restart node 1 and node 2, primary shard 1's state becomes UNASSIGNED and replica shard 1's state becomes UNASSIGNED too; the health of the cluster becomes red and never turns green again. The cluster stops working until I delete the old index.
Here is part of the master log.
[ERROR][marvel.agent ] [es10] background thread had an uncaught exception
ElasticsearchException[failed to flush exporter bulks]
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
at java.lang.Thread.run(Thread.java:745)
Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution, only the first 100 failures are printed:
[8]: index [.marvel-es-data], type [cluster_info], id [nm4dj3ucSRGsdautV_GDDw], message [UnavailableShardsException[[.marvel-es-data][1] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-data][1]}]]]];
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
... 3 more
Caused by: ElasticsearchException[failure in bulk execution, only the first 100 failures are printed:
[8]: index [.marvel-es-data], type [cluster_info], id [nm4dj3ucSRGsdautV_GDDw], message [UnavailableShardsException[[.marvel-es-data][1] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-data][1]}]]]]
at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
... 3 more
[2016-02-19 12:53:18,769][ERROR][marvel.agent ] [es10] background thread had an uncaught exception
ElasticsearchException[failed to flush exporter bulks]
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
at java.lang.Thread.run(Thread.java:745)
Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution, only the first 100 failures are printed:
[8]: index [.marvel-es-data], type [cluster_info], id [nm4dj3ucSRGsdautV_GDDw], message [UnavailableShardsException[[.marvel-es-data][1] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-data][1]}]]]];
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
... 3 more
Caused by: ElasticsearchException[failure in bulk execution, only the first 100 failures are printed:
[8]: index [.marvel-es-data], type [cluster_info], id [nm4dj3ucSRGsdautV_GDDw], message [UnavailableShardsException[[.marvel-es-data][1] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-data][1]}]]]]
at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
... 3 more

Elasticsearch reboot, number_of_pending_tasks keeps increasing indefinitely

I had to restart the Elasticsearch master; the status was red, then after some time the status went yellow (primary shards got assigned).
Now when I run the query curl http://x.x.x.x/_cluster/health?pretty I can see that "number_of_pending_tasks" keeps increasing (it is now at 200k).
I had a look at the pending tasks and I can see that it is mainly these tasks that get buffered:
, {
"insert_order" : 58176,
"priority" : "NORMAL",
"source" : "indices_store",
"executing" : false,
"time_in_queue_millis" : 619596,
"time_in_queue" : "10.3m"
},
In the meantime I get the error about a rejected execution due to the queue capacity:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 200) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler#34c87ed9
How can I solve this?
