Elasticsearch 2.3.4 stops allocating shards for no obvious reason - elasticsearch

I am attempting to upgrade our Elasticsearch cluster from 1.6 to 2.3.4. The upgrade seems to work, and I can see shard allocation starting to happen within Kopf - but at some point the shard allocation appears to stop, with many shards left unallocated and no errors being reported in the logs. Typically I'm left with 1200 / 3800 shards unallocated.
We have a typical 3-node cluster, and I am trialling this standalone with all 3 nodes running on my local machine.
I have seen similar symptoms reported before - see https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html. The solution there seemed to be to manually allocate the shards, which I've tried (and it works), but I'm at a loss to explain the behaviour of Elasticsearch here. I'd prefer not to go down this route, as I want my cluster to spin up automatically without intervention.
There is also https://github.com/elastic/elasticsearch/pull/14494, which appears to be resolved in the latest ES version, so it shouldn't be a problem.
There are no errors in the log files - I have raised the root logging level to 'DEBUG' to see what I can. What I see are lines like the ones below for each unallocated shard (these are from the master node's logs):
[2016-07-26 09:18:04,859][DEBUG][gateway ] [germany] [index][4] found 0 allocations of [index][4], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-26T08:05:04.447Z]], highest version: [-1]
[2016-07-26 09:18:04,859][DEBUG][gateway ] [germany] [index][4]: not allocating, number_of_allocated_shards_found [0]
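For anyone reproducing this, the logger level can also be raised at runtime rather than via the config files; a minimal sketch using the cluster settings API, assuming the dynamic logger.* settings behave on 2.3.4 as documented (the gateway logger is the one producing the lines above):
# raise the gateway module logger to DEBUG without restarting the nodes
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
"transient": { "logger.gateway": "DEBUG" }
}'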
Config file (with comments removed):
cluster.name: elasticsearch-jm-2.3.4
node.name: germany
script.inline: true
script.indexed: true
If I query the cluster health after allocation has stopped, I get the response below:
http://localhost:9200/_cluster/health?pretty
{
"cluster_name" : "elasticsearch-jm-2.3.4",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 1289,
"active_shards" : 2578,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1264,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 67.10046850598647
}
Further querying for shards, filtered to one index with unallocated shards. As can be seen, shards 0 and 4 are unallocated whereas shards 1, 2 and 3 have been allocated:
http://localhost:9200/_cat/shards
cs-payment-warn-2016.07.20 3 p STARTED 106 92.4kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 3 r STARTED 106 92.4kb 127.0.0.1 switzerland
cs-payment-warn-2016.07.20 4 p UNASSIGNED
cs-payment-warn-2016.07.20 4 r UNASSIGNED
cs-payment-warn-2016.07.20 2 r STARTED 120 74.5kb 127.0.0.1 cyprus
cs-payment-warn-2016.07.20 2 p STARTED 120 74.5kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 1 r STARTED 120 73.8kb 127.0.0.1 cyprus
cs-payment-warn-2016.07.20 1 p STARTED 120 73.8kb 127.0.0.1 germany
cs-payment-warn-2016.07.20 0 p UNASSIGNED
cs-payment-warn-2016.07.20 0 r UNASSIGNED
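The unassignment reason can also be read straight from the cat API; a sketch, assuming the unassigned.reason and unassigned.details columns work on 2.3.4 as documented:
# list shards together with the reason they are unassigned
curl -XGET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,unassigned.details'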
Manually rerouting an unassigned shard appears to work (stripped-back result set):
http://localhost:9200/_cluster/reroute
POST:
{
"dry_run": true,
"commands": [
{
"allocate": {
"index": "cs-payment-warn-2016.07.20",
"shard": 4,
"node": "switzerland" ,
"allow_primary": true
}
}
]
}
Response:
{
"acknowledged" : true,
"state" : {
"version" : 722,
"state_uuid" : "Vw2vPoCMQk2ZosjzviD4TQ",
"master_node" : "yhL7XXy-SKu_WAM-C33dzA",
"blocks" : {},
"nodes" : {},
"routing_table" : {
"indices" : {
"cs-payment-warn-2016.07.20" : {
"shards" : {
"3" : [{
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 3,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "x_Iq88hmTqiasrjW09hVuw"
}
}, {
"state" : "STARTED",
"primary" : false,
"node" : "1a8dgBscTUS3c7Pv4mN9CQ",
"relocating_node" : null,
"shard" : 3,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "DF-EUEy_SpeUElnZI6cgsQ"
}
}
],
"4" : [{
"state" : "INITIALIZING",
"primary" : true,
"node" : "1a8dgBscTUS3c7Pv4mN9CQ",
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"allocation_id" : {
"id" : "1tw7C7YPQsWwm_O-8mYHRg"
},
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-07-26T14:20:15.395Z",
"details" : "force allocation from previous reason CLUSTER_RECOVERED, null"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
],
"2" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "eQ-_vWNbRp27So0iGSitmA"
}
}, {
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "O1PU1_NVS8-uB2yBrG76MA"
}
}
],
"1" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZmxtOvorRVmndR15OJMkMA"
}
}, {
"state" : "STARTED",
"primary" : true,
"node" : "yhL7XXy-SKu_WAM-C33dzA",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZNgzePThQxS-iqhRSXzZCw"
}
}
],
"0" : [{
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
]
}
}
},
"routing_nodes" : {
"unassigned" : [{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "cs-payment-warn-2016.07.20",
"version" : 1,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : true,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "cs-payment-warn-2016.07.20",
"version" : 0,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2016-07-26T11:24:11.868Z"
}
}
]
},
"nodes" : {
"rlRQ2u0XQRqxWld-wSrOug" : [{
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 2,
"index" : "cs-payment-warn-2016.07.20",
"version" : 22,
"allocation_id" : {
"id" : "eQ-_vWNbRp27So0iGSitmA"
}
}, {
"state" : "STARTED",
"primary" : false,
"node" : "rlRQ2u0XQRqxWld-wSrOug",
"relocating_node" : null,
"shard" : 1,
"index" : "cs-payment-warn-2016.07.20",
"version" : 24,
"allocation_id" : {
"id" : "ZmxtOvorRVmndR15OJMkMA"
}
}
]
}
}
}
}
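As a next step I plan to re-run the reroute with the explain parameter, which should return each allocation decider's decision for the command; a sketch I have not yet verified on 2.3.4:
# same allocation command, but ask the deciders to explain their decisions
curl -XPOST 'http://localhost:9200/_cluster/reroute?explain=true&pretty' -d '{
"dry_run": true,
"commands": [
{ "allocate": { "index": "cs-payment-warn-2016.07.20", "shard": 4, "node": "switzerland", "allow_primary": true } }
]
}'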

Related

Elasticsearch ILM not rolling over

I have configured my ILM policy to roll over when the index size reaches 20GB, or after 30 days in the hot phase, but my index has passed 20GB and still hasn't rolled over or moved on towards the cold node.
When I run GET _cat/indices?v I get:
green open packetbeat-7.9.2-2020.10.22-000001 RRAnRZrrRZiihscJ3bymig 10 1 63833049 0 44.1gb 22gb
Could you tell me how to solve this, please?
Note that in my Packetbeat configuration file, I have only changed the number of shards:
setup.template.settings:
index.number_of_shards: 10
index.number_of_replicas: 1
When I run GET packetbeat-7.9.2-2020.10.22-000001/_settings, I get this output:
{
"packetbeat-7.9.2-2020.10.22-000001" : {
"settings" : {
"index" : {
"lifecycle" : {
"name" : "packetbeat",
"rollover_alias" : "packetbeat-7.9.2"
},
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"mapping" : {
"total_fields" : {
"limit" : "10000"
}
},
"refresh_interval" : "5s",
"number_of_shards" : "10",
"provided_name" : "<packetbeat-7.9.2-{now/d}-000001>",
"max_docvalue_fields_search" : "200",
"query" : {
"default_field" : [
"message",
"tags",
"agent.ephemeral_id",
"agent.id",
"agent.name",
"agent.type",
"agent.version",
"as.organization.name",
"client.address",
"client.as.organization.name",
and the output of GET /packetbeat-7.9.2-2020.10.22-000001/_ilm/explain is:
{
"indices" : {
"packetbeat-7.9.2-2020.10.22-000001" : {
"index" : "packetbeat-7.9.2-2020.10.22-000001",
"managed" : true,
"policy" : "packetbeat",
"lifecycle_date_millis" : 1603359683835,
"age" : "15.04d",
"phase" : "hot",
"phase_time_millis" : 1603359684332,
"action" : "rollover",
"action_time_millis" : 1603360173138,
"step" : "check-rollover-ready",
"step_time_millis" : 1603360173138,
"phase_execution" : {
"policy" : "packetbeat",
"phase_definition" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_size" : "50gb",
"max_age" : "30d"
}
}
},
"version" : 1,
"modified_date_in_millis" : 1603359683339
}
}
}
}
It's weird that it shows 50GB!
Thanks for your help.
So I found the solution to this problem. The _ilm/explain output still showed the old 50gb phase definition, which I understand is because ILM keeps using the phase definition cached when the index entered the phase, so simply updating the policy wasn't enough. After updating the policy, I removed the policy from the index that was using it and then added it back to that index.
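Roughly, that was the following (a sketch rather than an exact transcript; the index name, policy name and alias are the ones from my setup above):
# detach the lifecycle policy from the index
curl -XPOST 'localhost:9200/packetbeat-7.9.2-2020.10.22-000001/_ilm/remove'
# re-attach the (now updated) policy and its rollover alias
curl -XPUT 'localhost:9200/packetbeat-7.9.2-2020.10.22-000001/_settings' -H 'Content-Type: application/json' -d '{
"index.lifecycle.name": "packetbeat",
"index.lifecycle.rollover_alias": "packetbeat-7.9.2"
}'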

MongoDB hangs randomly

Overview
I have a Ruby application that uses MongoDB as a database. While running tests for this application, I create collections and indexes for every test case using Minitest.
The test environment is created using Docker Compose, where one container runs the tests and another container runs MongoDB.
Problem
When running the tests for the first time, after a while MongoDB gets stuck: any request to query the collections does not respond.
I was able to connect to it with the command-line client before the tests started running. When I check the state of the server using db.serverStatus(), I see that some operations have acquired locks. Looking at the globalLock field, I understand that 1 operation holds the write lock and 2 operations are waiting to acquire a read lock.
I am unable to understand why these operations would hang and not yield the locks, and I have no idea how to debug this problem further.
MongoDB Version: 3.6.13
Ruby Driver version: 2.8.0
I've also tried other 3.6.x versions and 4.0.
Any help or direction is highly appreciated.
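For reference, the shell still answers admin commands while the queries hang, so alongside db.serverStatus() I also dump the operations that are waiting on locks; a sketch (the Compose service name mongodb is just a placeholder for whatever yours is called):
# list operations currently blocked waiting for a lock
docker-compose exec mongodb mongo --quiet --eval 'printjson(db.currentOp({ waitingForLock: true }))'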
db.serverStatus output
{
"host" : "c658c885eb90",
"version" : "3.6.14",
"process" : "mongod",
"pid" : NumberLong(1),
"uptime" : 98,
"uptimeMillis" : NumberLong(97909),
"uptimeEstimate" : NumberLong(97),
"localTime" : ISODate("2019-11-03T16:09:14.289Z"),
"asserts" : {
"regular" : 0,
"warning" : 0,
"msg" : 0,
"user" : 0,
"rollovers" : 0
},
"connections" : {
"current" : 6,
"available" : 838854,
"totalCreated" : 11
},
"extra_info" : {
"note" : "fields vary by platform",
"page_faults" : 0
},
"globalLock" : {
"totalTime" : NumberLong(97908000),
"currentQueue" : {
"total" : 2,
"readers" : 2,
"writers" : 0
},
"activeClients" : {
"total" : 13,
"readers" : 0,
"writers" : 1
}
},
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(14528),
"w" : NumberLong(12477),
"W" : NumberLong(5)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(1020),
"w" : NumberLong(14459),
"R" : NumberLong(3),
"W" : NumberLong(6599)
},
"acquireWaitCount" : {
"r" : NumberLong(2)
},
"timeAcquiringMicros" : {
"r" : NumberLong(76077321)
}
},
"Collection" : {
"acquireCount" : {
"R" : NumberLong(1018),
"W" : NumberLong(8805)
}
},
"Metadata" : {
"acquireCount" : {
"W" : NumberLong(37)
}
}
},
"logicalSessionRecordCache" : {
"activeSessionsCount" : 3,
"sessionsCollectionJobCount" : 1,
"lastSessionsCollectionJobDurationMillis" : 0,
"lastSessionsCollectionJobTimestamp" : ISODate("2019-11-03T16:07:36.407Z"),
"lastSessionsCollectionJobEntriesRefreshed" : 0,
"lastSessionsCollectionJobEntriesEnded" : 0,
"lastSessionsCollectionJobCursorsClosed" : 0,
"transactionReaperJobCount" : 0,
"lastTransactionReaperJobDurationMillis" : 0,
"lastTransactionReaperJobTimestamp" : ISODate("2019-11-03T16:07:36.407Z"),
"lastTransactionReaperJobEntriesCleanedUp" : 0
},
"network" : {
"bytesIn" : NumberLong(1682811),
"bytesOut" : NumberLong(1019834),
"physicalBytesIn" : NumberLong(1682811),
"physicalBytesOut" : NumberLong(1019834),
"numRequests" : NumberLong(7822),
"compression" : {
"snappy" : {
"compressor" : {
"bytesIn" : NumberLong(0),
"bytesOut" : NumberLong(0)
},
"decompressor" : {
"bytesIn" : NumberLong(0),
"bytesOut" : NumberLong(0)
}
}
},
"serviceExecutorTaskStats" : {
"executor" : "passthrough",
"threadsRunning" : 6
}
},
"opLatencies" : {
"reads" : {
"latency" : NumberLong(61374),
"ops" : NumberLong(963)
},
"writes" : {
"latency" : NumberLong(13074),
"ops" : NumberLong(286)
},
"commands" : {
"latency" : NumberLong(988232),
"ops" : NumberLong(6570)
}
},
"opReadConcernCounters" : {
"available" : NumberLong(0),
"linearizable" : NumberLong(0),
"local" : NumberLong(0),
"majority" : NumberLong(0),
"none" : NumberLong(944)
},
"opcounters" : {
"insert" : 246,
"query" : 944,
"update" : 40,
"delete" : 0,
"getmore" : 0,
"command" : 6595
},
"opcountersRepl" : {
"insert" : 0,
"query" : 0,
"update" : 0,
"delete" : 0,
"getmore" : 0,
"command" : 0
},
"storageEngine" : {
"name" : "ephemeralForTest",
"supportsCommittedReads" : false,
"readOnly" : false,
"persistent" : false
},
"tcmalloc" : {
"generic" : {
"current_allocated_bytes" : 8203504,
"heap_size" : 12496896
},
"tcmalloc" : {
"pageheap_free_bytes" : 2760704,
"pageheap_unmapped_bytes" : 0,
"max_total_thread_cache_bytes" : 516947968,
"current_total_thread_cache_bytes" : 1007120,
"total_free_bytes" : 1532688,
"central_cache_free_bytes" : 231040,
"transfer_cache_free_bytes" : 294528,
"thread_cache_free_bytes" : 1007120,
"aggressive_memory_decommit" : 0,
"pageheap_committed_bytes" : 12496896,
"pageheap_scavenge_count" : 0,
"pageheap_commit_count" : 9,
"pageheap_total_commit_bytes" : 12496896,
"pageheap_decommit_count" : 0,
"pageheap_total_decommit_bytes" : 0,
"pageheap_reserve_count" : 9,
"pageheap_total_reserve_bytes" : 12496896,
"spinlock_total_delay_ns" : 0,
"formattedString" : "------------------------------------------------\nMALLOC: 8204080 ( 7.8 MiB) Bytes in use by application\nMALLOC: + 2760704 ( 2.6 MiB) Bytes in page heap freelist\nMALLOC: + 231040 ( 0.2 MiB) Bytes in central cache freelist\nMALLOC: + 294528 ( 0.3 MiB) Bytes in transfer cache freelist\nMALLOC: + 1006544 ( 1.0 MiB) Bytes in thread cache freelists\nMALLOC: + 1204480 ( 1.1 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 13701376 ( 13.1 MiB) Actual memory used (physical + swap)\nMALLOC: + 0 ( 0.0 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 13701376 ( 13.1 MiB) Virtual address space used\nMALLOC:\nMALLOC: 415 Spans in use\nMALLOC: 18 Thread heaps in use\nMALLOC: 4096 Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
}
},
"transactions" : {
"retriedCommandsCount" : NumberLong(0),
"retriedStatementsCount" : NumberLong(0),
"transactionsCollectionWriteCount" : NumberLong(0)
},
"transportSecurity" : {
"1.0" : NumberLong(0),
"1.1" : NumberLong(0),
"1.2" : NumberLong(0),
"1.3" : NumberLong(0),
"unknown" : NumberLong(0)
},
"mem" : {
"bits" : 64,
"resident" : 41,
"virtual" : 836,
"supported" : true,
"mapped" : 0
},
"metrics" : {
"commands" : {
"buildInfo" : {
"failed" : NumberLong(0),
"total" : NumberLong(2)
},
"count" : {
"failed" : NumberLong(0),
"total" : NumberLong(21)
},
"createIndexes" : {
"failed" : NumberLong(0),
"total" : NumberLong(5656)
},
"drop" : {
"failed" : NumberLong(0),
"total" : NumberLong(784)
},
"dropIndexes" : {
"failed" : NumberLong(87),
"total" : NumberLong(87)
},
"find" : {
"failed" : NumberLong(0),
"total" : NumberLong(944)
},
"getLog" : {
"failed" : NumberLong(0),
"total" : NumberLong(1)
},
"insert" : {
"failed" : NumberLong(0),
"total" : NumberLong(246)
},
"isMaster" : {
"failed" : NumberLong(0),
"total" : NumberLong(38)
},
"listCollections" : {
"failed" : NumberLong(0),
"total" : NumberLong(1)
},
"listIndexes" : {
"failed" : NumberLong(1),
"total" : NumberLong(1)
},
"replSetGetStatus" : {
"failed" : NumberLong(1),
"total" : NumberLong(1)
},
"serverStatus" : {
"failed" : NumberLong(0),
"total" : NumberLong(2)
},
"update" : {
"failed" : NumberLong(0),
"total" : NumberLong(40)
},
"whatsmyuri" : {
"failed" : NumberLong(0),
"total" : NumberLong(1)
}
},
"cursor" : {
"timedOut" : NumberLong(0),
"open" : {
"noTimeout" : NumberLong(0),
"pinned" : NumberLong(0),
"total" : NumberLong(0)
}
},
"document" : {
"deleted" : NumberLong(0),
"inserted" : NumberLong(246),
"returned" : NumberLong(398),
"updated" : NumberLong(40)
},
"getLastError" : {
"wtime" : {
"num" : 0,
"totalMillis" : 0
},
"wtimeouts" : NumberLong(0)
},
"operation" : {
"scanAndOrder" : NumberLong(0),
"writeConflicts" : NumberLong(0)
},
"query" : {
"updateOneOpStyleBroadcastWithExactIDCount" : NumberLong(0),
"upsertReplacementCannotTargetByQueryCount" : NumberLong(0)
},
"queryExecutor" : {
"scanned" : NumberLong(435),
"scannedObjects" : NumberLong(438)
},
"record" : {
"moves" : NumberLong(0)
},
"repl" : {
"executor" : {
"pool" : {
"inProgressCount" : 0
},
"queues" : {
"networkInProgress" : 0,
"sleepers" : 0
},
"unsignaledEvents" : 0,
"shuttingDown" : false,
"networkInterface" : "\nNetworkInterfaceASIO Operations' Diagnostic:\nOperation: Count: \nConnecting 0 \nIn Progress 0 \nSucceeded 0 \nCanceled 0 \nFailed 0 \nTimed Out 0 \n\n"
},
"apply" : {
"attemptsToBecomeSecondary" : NumberLong(0),
"batchSize" : NumberLong(0),
"batches" : {
"num" : 0,
"totalMillis" : 0
},
"ops" : NumberLong(0)
},
"buffer" : {
"count" : NumberLong(0),
"maxSizeBytes" : NumberLong(0),
"sizeBytes" : NumberLong(0)
},
"initialSync" : {
"completed" : NumberLong(0),
"failedAttempts" : NumberLong(0),
"failures" : NumberLong(0)
},
"network" : {
"bytes" : NumberLong(0),
"getmores" : {
"num" : 0,
"totalMillis" : 0
},
"ops" : NumberLong(0),
"readersCreated" : NumberLong(0)
},
"preload" : {
"docs" : {
"num" : 0,
"totalMillis" : 0
},
"indexes" : {
"num" : 0,
"totalMillis" : 0
}
}
},
"storage" : {
"freelist" : {
"search" : {
"bucketExhausted" : NumberLong(0),
"requests" : NumberLong(0),
"scanned" : NumberLong(0)
}
}
},
"ttl" : {
"deletedDocuments" : NumberLong(0),
"passes" : NumberLong(1)
}
},
"ok" : 1
}

Number of records processed in logstash

We're using Logstash to sync Elasticsearch and we have around 3 million documents. It takes 3 to 4 hours to sync. Currently all we can tell is that it has started and stopped. Is there any way to see how many records have been processed in Logstash?
If you're using Logstash 5 or higher, the Logstash Monitoring API can help you. You can see and monitor what's happening inside Logstash as it processes events. If you hit the Pipeline stats API you'll get the total number of processed events per stage and plugin (input/filter/output):
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
You'll get this type of response in which you can clearly see at any time how many events have been processed:
{
"pipelines" : {
"test" : {
"events" : {
"duration_in_millis" : 365495,
"in" : 216485,
"filtered" : 216485,
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"plugins" : {
"inputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-1",
"events" : {
"out" : 216485,
"queue_push_duration_in_millis" : 342466
},
"name" : "beats"
} ],
"filters" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-2",
"events" : {
"duration_in_millis" : 55969,
"in" : 216485,
"out" : 216485
},
"failures" : 216485,
"patterns_per_field" : {
"message" : 1
},
"name" : "grok"
}, {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-3",
"events" : {
"duration_in_millis" : 3326,
"in" : 216485,
"out" : 216485
},
"name" : "geoip"
} ],
"outputs" : [ {
"id" : "35131f351e2dc5ed13ee04265a8a5a1f95292165-4",
"events" : {
"duration_in_millis" : 278557,
"in" : 216485,
"out" : 216485
},
"name" : "elasticsearch"
} ]
},
"reloads" : {
"last_error" : null,
"successes" : 0,
"last_success_timestamp" : null,
"last_failure_timestamp" : null,
"failures" : 0
},
"queue" : {
"type" : "memory"
}
}
}
}
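If you just want a running counter rather than the full document, you can poll that endpoint and extract the event counts, for example with jq (a sketch; the pipeline id test comes from the sample response above, so replace it with your own, typically main):
# print the number of events that have left the pipeline, every 5 seconds
while true; do curl -s 'localhost:9600/_node/stats/pipelines' | jq '.pipelines.test.events.out'; sleep 5; done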

elasticsearch - there is no copy of the shard available?

I have a few indices in red after a system failure caused by a full disk.
But I cannot reallocate the lost shard; it says "there is no copy of the shard available".
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
{
"shard" : {
"index" : "my_index",
"index_uuid" : "iNY9t81wQf6wJc-KqufUrg",
"id" : 0,
"primary" : true
},
"assigned" : false,
"shard_state_fetch_pending" : false,
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2017-05-30T07:33:04.192Z",
"failed_attempts" : 5,
"delayed" : false,
"details" : "failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]",
"allocation_status" : "deciders_no"
},
"allocation_delay_in_millis" : 60000,
"remaining_delay_in_millis" : 0,
"nodes" : {
"KvOd2vSQTOSgjgqyEnOKpA" : {
"node_name" : "node1",
"node_attributes" : { },
"store" : {
"shard_copy" : "NONE"
},
"final_decision" : "NO",
"final_explanation" : "there is no copy of the shard available",
"weight" : -3.683333,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
},
"pC9fL41xRgeZDAEYvNR1eQ" : {
"node_name" : "node2",
"node_attributes" : { },
"store" : {
"shard_copy" : "AVAILABLE"
},
"final_decision" : "NO",
"final_explanation" : "the shard cannot be assigned because one or more allocation decider returns a 'NO' decision",
"weight" : -2.333333,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
},
"1g7eCfEQS9u868lFSoo7FQ" : {
"node_name" : "node3",
"node_attributes" : { },
"store" : {
"shard_copy" : "AVAILABLE"
},
"final_decision" : "NO",
"final_explanation" : "the shard cannot be assigned because one or more allocation decider returns a 'NO' decision",
"weight" : 40.866665,
"decisions" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has already failed allocating [5] times vs. [5] retries allowed unassigned_info[[reason=ALLOCATION_FAILED], at[2017-05-30T07:33:04.192Z], failed_attempts[5], delayed=false, details[failed to create shard, failure FileSystemException[/data/es/storage/nodes/0/indices/iNY9t81wQf6wJc-KqufUrg/0/_state/state-13.st.tmp: Read-only file system]], allocation_status[deciders_no]] - manually call [/_cluster/reroute?retry_failed=true] to retry"
}
]
}
}
}
I tried basically every option of the reroute command (documentation here), but it gives me a 400 error, like this:
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d'
{
"commands" : [
{
"allocate_replica" : {
"index" : "myindex", "shard" : 0,
"node" : "node2"
}
}
]
}'
response:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "[allocate_replica] trying to allocate a replica shard [myindex][0], while corresponding primary shard is still unassigned"
}
],
"type" : "illegal_argument_exception",
"reason" : "[allocate_replica] trying to allocate a replica shard [myindex][0], while corresponding primary shard is still unassigned"
},
"status" : 400
}
try this:
curl -XPOST 'xx.xxx.xx:9200/_cluster/reroute' -d '{"commands" : [{"allocate_stale_primary":{"index" : "myindex", "shard" : 0, "node" : "node2","accept_data_loss" : true}}]}'
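Also worth noting: the allocation explain output above shows it is the max_retry decider returning NO ("shard has already failed allocating [5] times"), and it suggests the retry itself. So once the read-only file system problem is fixed, you can first ask the cluster to retry the failed allocations, and only fall back to allocate_stale_primary (which accepts data loss) if that doesn't help. A sketch:
# retry allocations that have exhausted their allocation retry budget
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'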

Elasticsearch doesn't update documents

I'm facing an issue related to document updates.
I'm able to index (create) documents and they are correctly added to the index.
Nevertheless, when I try to update one of them, the operation has no effect: the document is not updated.
When I first add the document, it looks like this:
{
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
"ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
This is the code I'm using in order to build UpdateRequest:
this.elasticsearchResources.getElasticsearchClient()
.prepareUpdate()
.setIndex(this.user.getMe().getUser())
.setType(type)
.setId(id.toString())
.setDoc(source)
.setUpsert(source)
.setDetectNoop(true);
I've also been able to inspect the content of this request before sending it to Elasticsearch. The document is:
{
"user":"user4",
"timestamp":"2016-12-16T15:00:22.645Z",
"startTimestamp":"2016-12-16T15:00:22.645Z",
"dueTimestamp":null,
"closingTimestamp":null,
"matter":"F1",
"comment":null,
"status":0,
"backlogStatus":20,
"metainfos":{
},
"resources":[
],
"notes":null
}
As you can see, the only difference is that metainfos is empty when I try to update the document.
After performing this update request, the document is not updated; the content of metainfos remains as before:
#curl -XGET 'http://localhost:9200/user4/fuas/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "living_v1",
"_type" : "fuas",
"_id" : "327c9435-c394-11e6-aa90-02420a011808",
"_score" : 1.0,
"_routing" : "user4",
"_source" : {
"user" : "user4",
"timestamp" : "2016-12-16T15:00:22.645Z",
"startTimestamp" : "2016-12-16T15:00:22.645Z",
"dueTimestamp" : null,
"closingTimestamp" : null,
"matter" : "F1",
"comment" : null,
"status" : 0,
"backlogStatus" : 20,
"metainfos" : {
>>>>>>>> "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa" : [ "FZ11" ]
},
"resources" : [ ],
"notes" : null
}
} ]
}
}
I can't quite figure out what's wrong. Any ideas?
Elasticsearch will not update a field to an empty object: partial updates merge the new document into the existing one, and merging an empty object leaves the existing content untouched. To clear the field you can try either:
"metainfos": null
or
"metainfos": { "ceeaceaaaceeaceaaaceeaceaaaceeaaceaaaceeabceaaa": [] }
to clean the field.
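A quick way to verify this outside the Java client is to send the same partial document through the REST update API; a sketch using the index, type, id and routing values from the search output above:
# partial update: explicitly null the field instead of sending an empty object
curl -XPOST 'http://localhost:9200/living_v1/fuas/327c9435-c394-11e6-aa90-02420a011808/_update?routing=user4' -d '{
"doc": { "metainfos": null }
}'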
