On one of my nodes in my Elasticsearch cluster I get the following strange response:
Command:
curl -X GET 'http://localhost:9200'
Response:
{"OK":{}}
Not sure what to do about this. Has anyone run into this before?
UPDATE:
This is what I get when I call the following (I replaced the IPs with xxx):
curl -XGET localhost:9200/_nodes/jvm?human\&pretty
{
"cluster_name" : "elasticsearch",
"nodes" : {
"dtUV63D4RBq9JXw_o03-eg" : {
"name" : "elasticsearch1",
"transport_address" : "inet[xxx/xxx:9300]",
"host" : "elasticsearch1",
"ip" : "xxx",
"version" : "1.3.2",
"build" : "dee175d",
"http_address" : "inet[/xxx:9200]",
"jvm" : {
"pid" : 1471,
"version" : "1.7.0_65",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.65-b04",
"vm_vendor" : "Oracle Corporation",
"start_time" : "2014-11-19T14:50:10.408Z",
"start_time_in_millis" : 1416408610408,
"mem" : {
"heap_init" : "4gb",
"heap_init_in_bytes" : 4294967296,
"heap_max" : "3.9gb",
"heap_max_in_bytes" : 4277534720,
"non_heap_init" : "23.1mb",
"non_heap_init_in_bytes" : 24313856,
"non_heap_max" : "214mb",
"non_heap_max_in_bytes" : 224395264,
"direct_max" : "3.9gb",
"direct_max_in_bytes" : 4277534720
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"8eGVx6IGQ8qiFTc4rnaG3A" : {
"name" : "elasticsearch2",
"transport_address" : "inet[/xxx:9300]",
"host" : "elasticsearch2",
"ip" : "xxx",
"version" : "1.3.2",
"build" : "dee175d",
"http_address" : "inet[/xxx:9200]",
"jvm" : {
"pid" : 1476,
"version" : "1.7.0_65",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.65-b04",
"vm_vendor" : "Oracle Corporation",
"start_time" : "2014-11-19T14:54:33.909Z",
"start_time_in_millis" : 1416408873909,
"mem" : {
"heap_init" : "4gb",
"heap_init_in_bytes" : 4294967296,
"heap_max" : "3.9gb",
"heap_max_in_bytes" : 4277534720,
"non_heap_init" : "23.1mb",
"non_heap_init_in_bytes" : 24313856,
"non_heap_max" : "214mb",
"non_heap_max_in_bytes" : 224395264,
"direct_max" : "3.9gb",
"direct_max_in_bytes" : 4277534720
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
}
}
}
Elasticsearch 1.3.2 alone is not capable of producing such a response; there is simply no "OK" string in the production source code. So I would guess somebody installed a plugin on this node that intercepts some calls and replaces their responses with this message.
One of the plugins that does this is the elasticsearch-http-basic plugin, which indeed displays {"OK":{}} to unauthorized users instead of a full response. You can verify the presence of this and other plugins by executing the following command on the node that gives you these responses:
curl "localhost:9200/_nodes/plugins?pretty"
I have configured my ILM policy to roll over when the index size reaches 20GB or after 30 days in the hot node,
but my index has passed 20GB and still has not rolled over or moved on to the cold node.
When I run GET _cat/indices?v I get:
green open packetbeat-7.9.2-2020.10.22-000001 RRAnRZrrRZiihscJ3bymig 10 1 63833049 0 44.1gb 22gb
Could you tell me how to solve that, please?
Note that in my Packetbeat configuration file I have only changed the number of shards:
setup.template.settings:
index.number_of_shards: 10
index.number_of_replicas: 1
When I run the command GET packetbeat-7.9.2-2020.10.22-000001/_settings I get this output:
{
"packetbeat-7.9.2-2020.10.22-000001" : {
"settings" : {
"index" : {
"lifecycle" : {
"name" : "packetbeat",
"rollover_alias" : "packetbeat-7.9.2"
},
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"mapping" : {
"total_fields" : {
"limit" : "10000"
}
},
"refresh_interval" : "5s",
"number_of_shards" : "10",
"provided_name" : "<packetbeat-7.9.2-{now/d}-000001>",
"max_docvalue_fields_search" : "200",
"query" : {
"default_field" : [
"message",
"tags",
"agent.ephemeral_id",
"agent.id",
"agent.name",
"agent.type",
"agent.version",
"as.organization.name",
"client.address",
"client.as.organization.name",
and the output of the command GET /packetbeat-7.9.2-2020.10.22-000001/_ilm/explain is:
{
"indices" : {
"packetbeat-7.9.2-2020.10.22-000001" : {
"index" : "packetbeat-7.9.2-2020.10.22-000001",
"managed" : true,
"policy" : "packetbeat",
"lifecycle_date_millis" : 1603359683835,
"age" : "15.04d",
"phase" : "hot",
"phase_time_millis" : 1603359684332,
"action" : "rollover",
"action_time_millis" : 1603360173138,
"step" : "check-rollover-ready",
"step_time_millis" : 1603360173138,
"phase_execution" : {
"policy" : "packetbeat",
"phase_definition" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_size" : "50gb",
"max_age" : "30d"
}
}
},
"version" : 1,
"modified_date_in_millis" : 1603359683339
}
}
}
}
It's weird that it shows 50GB!
Thanks for your help
So I found the solution to this problem.
After updating the policy, I removed the policy from the indices using it, and then added it back to those indices.
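For anyone hitting the same issue, here is a minimal sketch of that sequence, using the policy name and rollover alias shown in the outputs above (packetbeat and packetbeat-7.9.2) and the 20gb threshold the question intends; note that PUT _ilm/policy replaces the whole policy, so include any other phases your existing policy defines:
PUT _ilm/policy/packetbeat
{
  "policy" : {
    "phases" : {
      "hot" : {
        "actions" : {
          "rollover" : {
            "max_size" : "20gb",
            "max_age" : "30d"
          }
        }
      }
    }
  }
}
Then detach the policy from the current write index:
POST packetbeat-7.9.2-2020.10.22-000001/_ilm/remove
And re-attach it by applying the lifecycle settings again:
PUT packetbeat-7.9.2-2020.10.22-000001/_settings
{
  "index.lifecycle.name" : "packetbeat",
  "index.lifecycle.rollover_alias" : "packetbeat-7.9.2"
}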
I'm trying to create a snapshot in an S3 bucket. After running the request to create the new snapshot, I check the status of the new snapshot and I see that its state is PARTIAL, due to a RepositoryMissingException.
Why is that happening?
More information:
snapshot configuration:
$ curl localhost:9200/_cat/repositories
s3_repository s3
creation of new snapshot:
$ curl -XPUT localhost:9200/_snapshot/s3_repository/snap10
{"accepted":true}
get details about created snapshot (here we can see the failure):
$ curl localhost:9200/_snapshot/s3_repository/snap10?pretty
{
"snapshots" : [ {
"snapshot" : "snap10",
"version_id" : 2040699,
"version" : "2.4.6",
"indices" : [ "twitter" ],
"state" : "PARTIAL",
"start_time" : "2018-09-27T08:24:13.431Z",
"start_time_in_millis" : 1538036653431,
"end_time" : "2018-09-27T08:24:13.823Z",
"end_time_in_millis" : 1538036653823,
"duration_in_millis" : 392,
"failures" : [ {
"index" : "twitter",
"shard_id" : 1,
"reason" : "RepositoryMissingException[[s3_repository] missing]",
"node_id" : "0yJw77XwSX62rUnhDAAclw",
"status" : "INTERNAL_SERVER_ERROR"
}, {
"index" : "twitter",
"shard_id" : 0,
"reason" : "RepositoryMissingException[[s3_repository] missing]",
"node_id" : "WEzVGyjXSLWuzfD_w-sBlA",
"status" : "INTERNAL_SERVER_ERROR"
} ],
"shards" : {
"total" : 2,
"failed" : 2,
"successful" : 0
}
} ]
}
Can you please assist with the issue? Why does the error say RepositoryMissingException?
Please let me know if more information is needed.
In the end, the issue was that the cloud-aws plugin was installed only on the master node. Once I installed the plugin on the data nodes, it worked.
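For reference, here is a quick way to see which nodes have the plugin, and a sketch of the install step that needs to run on every data node. This assumes Elasticsearch 2.x (the snapshot output above shows 2.4.6), where the plugin tool is bin/plugin; the exact path depends on how Elasticsearch was installed:
$ curl 'localhost:9200/_cat/plugins?v'
$ bin/plugin install cloud-aws    # repeat on each data node, then restart that node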
on master _cluster/stats:
"fs" : {
"total_in_bytes" : 15874832596992,
"free_in_bytes" : 12578652061696,
"available_in_bytes" : 12578652061696,
"spins" : "true"
},
node analyzer1(hot) _cluster/stats:
"fs" : {
"total_in_bytes" : 16274067427328,
"free_in_bytes" : 12992671711232,
"available_in_bytes" : 12992671711232,
"spins" : "true"
},
on analyzer2(hot) _cluster/stats:
"fs" : {
"total_in_bytes" : 16274067427328,
"free_in_bytes" : 12989881016320,
"available_in_bytes" : 12989881016320,
"spins" : "true"
},
on analyzer3(warm) _cluster/stats:
"fs" : {
"total_in_bytes" : 14753986846720,
"free_in_bytes" : 11355335659520,
"available_in_bytes" : 11355335659520,
"spins" : "true"
},
on analyzer4(warm) _cluster/stats:
"fs" : {
"total_in_bytes" : 17874236866560,
"free_in_bytes" : 14598999666688,
"available_in_bytes" : 14598999666688,
"spins" : "true"
},
The five nodes are in the same cluster, and the cluster status is green.
Why am I getting different values of fs_total?
Note:
Elasticsearch version: 5.2.2
JDK version: OpenJDK 8
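If it helps to compare the numbers side by side rather than node by node, the per-node filesystem totals can also be pulled in a single call; these are standard APIs in 5.x and nothing here is specific to this cluster:
$ curl 'localhost:9200/_cat/allocation?v'
$ curl 'localhost:9200/_nodes/stats/fs?pretty'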
I have a few indices in red after a system failure caused by a full disk.
But I cannot reallocate the lost shard. It says "there is no copy of the shard available".
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
{
"shard" : {
"index" : "my_index",
"index_uuid" : "iNY9t81wQf6wJc-KqufUrg",
"id" : 0,
"primary" : true
},
"assigned" : false,
"shard_state_fetch_pending" : false,
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2017-05-30T07:33:04.192Z",
"failed_attempts" : 5,
"delayed" : false,
"details" : "failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]",
"allocation_status" : "deciders_no"
},
"allocation_delay_in_millis" : 60000,
"remaining_delay_in_millis" : 0,
"nodes" : {
"KvOd2vSQTOSgjgqyEnOKpA" : {
"node_name" : "node1",
"node_attributes" : { },
"store" : {
"shard_copy" : "NONE"
},
"final_decision" : "NO",
"final_explanation" : "there is no copy of the shard available",
"weight" : -3.683333,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
},
"pC9fL41xRgeZDAEYvNR1eQ" : {
"node_name" : "node2",
"node_attributes" : { },
"store" : {
"shard_copy" : "AVAILABLE"
},
"final_decision" : "NO",
"final_explanation" : "the shard cannot be assigned because one or more allocation decider returns a 'NO' decision",
"weight" : -2.333333,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
},
"1g7eCfEQS9u868lFSoo7FQ" : {
"node_name" : "node3",
"node_attributes" : { },
"store" : {
"shard_copy" : "AVAILABLE"
},
"final_decision" : "NO",
"final_explanation" : "the shard cannot be assigned because one or more allocation decider returns a 'NO' decision",
"weight" : 40.866665,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
}
}
}
I tried basically every option of the reroute command (documentation here), but it gives me a 400 error, like this:
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d'
{
"commands" : [
{
"allocate_replica" : {
"index" : "myindex", "shard" : 0,
"node" : "node2"
}
}
]
}'
response:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "[allocate_replica] trying to allocate a replica shard [myindex][0], while corresponding primary shard is still unassigned"
}
],
"type" : "illegal_argument_exception",
"reason" : "[allocate_replica] trying to allocate a replica shard [myindex][0], while corresponding primary shard is still unassigned"
},
"status" : 400
}
try this:
curl -XPOST 'xx.xxx.xx:9200/_cluster/reroute' -d '{
  "commands" : [
    {
      "allocate_stale_primary" : {
        "index" : "myindex",
        "shard" : 0,
        "node" : "node2",
        "accept_data_loss" : true
      }
    }
  ]
}'
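Two caveats on that command: allocate_stale_primary with accept_data_loss promotes whatever stale copy exists on that node, so any documents that were only in the lost copy are gone for good. Since the allocation explain output above says the shard simply hit its 5 failed attempts (the underlying cause was the read-only file system), it may be worth trying the retry it suggests first, which is non-destructive:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'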
I am attempting to upgrade our Elasticsearch cluster from 1.6 to 2.3.4. The upgrade seems to work, and I can see shard allocation starting to happen within Kopf, but at some point the allocation appears to stop, with many shards left unallocated and no errors reported in the logs. Typically I'm left with 1200 of 3800 shards unallocated.
We have a typical 3-node cluster, and I am trialling this standalone with all 3 nodes running on my local machine.
I have seen similar symptoms reported (see https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html). The solution there seemed to be to manually allocate the shards, which I've tried (and it works), but I'm at a loss to explain the behaviour of Elasticsearch here. I'd prefer not to go down this route, as I want my cluster to spin up automatically without intervention.
There is also https://github.com/elastic/elasticsearch/pull/14494, which seems to be resolved in the latest ES version, so it shouldn't be a problem.
There are no errors in the log files. I have upped the root-level logging to 'DEBUG' to see what I can, and what I see is lines like the ones below for each unallocated shard (these are from the master node logs):
[2016-07-26 09:18:04,859][DEBUG][gateway ] [germany] [index][4] found 0 allocations of [index][4], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-26T08:05:04.447Z]], highest version: [-1]
[2016-07-26 09:18:04,859][DEBUG][gateway ] [germany] [index][4]: not allocating, number_of_allocated_shards_found [0]
Config file (with comments removed):
cluster.name: elasticsearch-jm-2.3.4
node.name: germany
script.inline: true
script.indexed: true
If I query the cluster health after reallocation has stopped, I get the response below:
http://localhost:9200/_cluster/health?pretty
cluster_name : elasticsearch-jm-2.3.4
status : red
timed_out : False
number_of_nodes : 3
number_of_data_nodes : 3
active_primary_shards : 1289
active_shards : 2578
relocating_shards : 0
initializing_shards : 0
unassigned_shards : 1264
delayed_unassigned_shards : 0
number_of_pending_tasks : 0
number_of_in_flight_fetch : 0
task_max_waiting_in_queue_millis : 0
active_shards_percent_as_number : 67.10046850598647
Further querying for shards, filtered to one index with unallocated shards. As can be seen, shards 0 and 4 are unallocated, whereas shards 1, 2 and 3 have been allocated:
http://localhost:9200/_cat/shards
cs-payment-warn-2016.07.20 3 p STARTED 106 92.4kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 3 r STARTED 106 92.4kb 127.0.0.1 switzerland
cs-payment-warn-2016.07.20 4 p UNASSIGNED
cs-payment-warn-2016.07.20 4 r UNASSIGNED
cs-payment-warn-2016.07.20 2 r STARTED 120 74.5kb 127.0.0.1 cyprus
cs-payment-warn-2016.07.20 2 p STARTED 120 74.5kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 1 r STARTED 120 73.8kb 127.0.0.1 cyprus
cs-payment-warn-2016.07.20 1 p STARTED 120 73.8kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 0 p UNASSIGNED
cs-payment-warn-2016.07.20 0 r UNASSIGNED
Manually rerouting an unassigned shard appears to work (stripped-back result set):
http://localhost:9200/_cluster/reroute
POST:
{
"dry_run": true,
"commands": [
{
"allocate": {
"index": "cs-payment-warn-2016.07.20",
"shard": 4,
"node": "switzerland" ,
"allow_primary": true
}
}
]
}
Response:
{
"acknowledged" : true,
"state" : {
"version" : 722,
"state_uuid" : "Vw2vPoCMQk2ZosjzviD4TQ",
"master_node" : "yhL7XXy-SKu_WAM-C33dzA",
"blocks" : {},
"nodes" : {},
"routing_table" : {
"indices" : {
"cs-payment-warn-2016.07.20" : {
"shards" : {
"3" : [{
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 3,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "x_Iq88hmTqiasrjW09hVuw"
}
}, {
"state" : "STARTED",
"primary" : false,
"node" : "1a8dgBscTUS3c7Pv4mN9CQ",
"relocating_node" : null,
"shard" : 3,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "DF-EUEy_SpeUElnZI6cgsQ"
}
}
],
"4" : [{
"state" : "INITIALIZING",
"primary" : true,
"node" : "1a8dgBscTUS3c7Pv4mN9CQ",
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"allocation_id" : {
"id" : "1tw7C7YPQsWwm_O-8mYHRg"
},
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-07-26T14:20:15.395Z",
"details" : "force allocation from previous reason CLUSTER_RECOVERED, null"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
],
"2" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "eQ-_vWNbRp27So0iGSitmA"
}
}, {
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "O1PU1_NVS8-uB2yBrG76MA"
}
}
],
"1" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZmxtOvorRVmndR15OJMkMA"
}
}, {
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZNgzePThQxS-iqhRSXzZCw"
}
}
],
"0" : [{
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
]
}
}
},
"routing_nodes" : {
"unassigned" : [{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
]
},
"nodes" : {
"rlRQ2u0XQRqxWld-wSrOug" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "eQ-_vWNbRp27So0iGSitmA"
}
}, {
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZmxtOvorRVmndR15OJMkMA"
}
}
]
}
}
}
}