I have 11 Elasticsearch nodes: 3 master nodes, 6 data nodes, and 2 coordinating nodes. We are running the latest version of Elasticsearch, 7.13.2.
We have installed and configured Metricbeat on all Elasticsearch nodes to monitor our ELK stack, and we have observed that the .monitoring-es-* indices are large (roughly 100-200 GB per day), while the .monitoring-logstash-* indices hold far less data, and the same goes for the Kibana monitoring indices.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-7-mb-2021.07.15 NPdkPbofRde5YWd50oCzAA 1 1 95287036 0 141.2gb 70.6gb
green open .monitoring-es-7-mb-2021.07.16 F2oy_3WVRY6tSdhaMp7ZEg 1 1 16711910 0 25.1gb 12.4gb
green open .monitoring-es-7-mb-2021.07.11 d1JChmtgTGmnFoORnIcA1Q 1 1 93133543 0 135.9gb 67.9gb
green open .monitoring-es-7-mb-2021.07.12 MYu5ozjiQGGjGFBI5fjcvQ 1 1 94136537 0 137.9gb 68.9gb
green open .monitoring-es-7-mb-2021.07.13 7eLRyUWgTS-dSFE3ad669A 1 1 95323641 0 139.9gb 69.9gb
green open .monitoring-es-7-mb-2021.07.14 w2RB_A1TS1SeUBebLUURkA 1 1 95287470 0 140.7gb 70.3gb
green open .monitoring-es-7-mb-2021.07.10 llAWKQJwQ_-2FZg4Dbc3iA 1 1 92770558 0 135gb 67.5gb
We have enabled the elasticsearch-xpack module in Metricbeat.
elasticsearch-xpack.yml
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  metricsets:
    - cluster_stats
    - index
    - index_recovery
    - index_summary
    - node
    - node_stats
    - pending_tasks
    - shard
  hosts:
    - "https://xx.xx.xx.xx:9200" #em1
    - "https://xx.xx.xx.xx:9200" #em2
    - "https://xx.xx.xx.xx:9200" #em3
    - "https://xx.xx.xx.xx:9200" #ec1
    - "https://xx.xx.xx.xx:9200" #ec2
    - "https://xx.xx.xx.xx:9200" #ed1
    - "https://xx.xx.xx.xx:9200" #ed2
    - "https://xx.xx.xx.xx:9200" #ed3
    - "https://xx.xx.xx.xx:9200" #ed4
    - "https://xx.xx.xx.xx:9200" #ed5
    - "https://xx.xx.xx.xx:9200" #ed6
  scope: cluster
  ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca/ca.crt"]
  username: "xxxx"
  password: "********"
How can I reduce their size, or which metricsets should I monitor? Is this normal behaviour?
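One knob that directly affects the volume is the collection period: tripling it cuts the number of monitoring documents to roughly a third, at the cost of coarser graphs in the Stack Monitoring UI. A minimal sketch of the same module file with only that change (note that with xpack.enabled: true the module generally selects the required metricsets itself, so trimming the metricsets list may have no effect; the bulk of the documents typically comes from the per-index and per-shard data, which scales with the number of indices and shards rather than nodes):

- module: elasticsearch
  xpack.enabled: true
  # Collect every 30s instead of every 10s; this alone reduces the
  # document count in .monitoring-es-* to roughly one third.
  period: 30s
  hosts:
    - "https://xx.xx.xx.xx:9200" #em1
  scope: cluster
  ssl.certificate_authorities: ["/etc/elasticsearch/certs/ca/ca.crt"]
  username: "xxxx"
  password: "********"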
Related
A few days ago I was testing Kibana and exploring its features. When I requested http://localhost/_cat/indices?v I saw some strange indices, and when I deleted them, they were recreated automatically.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-6-2018.11.17 23zMdQcuSeeZRmQ8yzD-Qg 1 0 86420 192 33.9mb 33.9mb
green open .monitoring-es-6-2018.11.16 Dn7WCVBUTZSSaBlKKy8hxA 1 0 12283 69 5.8mb 5.8mb
green open .monitoring-es-6-2018.11.18 YaFbgQIiTVGZ1kjOB_wWpA 1 0 95069 250 36.6mb 36.6mb
green open .monitoring-es-6-2018.11.19 3bvTjlk0SNy2UR21C1muVA 1 0 62104 208 32.4mb 32.4mb
green open .monitoring-kibana-6-2018.11.16 MXwi2p83S46tEglvViIUUQ 1 0 12 0 32.6kb 32.6kb
green open .kibana MZXJrrajQvqAL9h1rKuxWg 1 0 1 0 4kb 4kb
... my other indexes
How can I prevent these indices from being created?
You can disable monitoring in your elasticsearch.yml configuration file:
xpack.monitoring.enabled: false
The same thing in kibana.yml:
xpack.monitoring.enabled: false
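Disabling monitoring only prevents new documents from being written; the indices that already exist remain until you delete them. A minimal sketch, assuming a 6.x cluster where destructive wildcard operations are still permitted:

DELETE .monitoring-es-6-*
DELETE .monitoring-kibana-6-*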
I am running a 2-node cluster on version 5.6.12.
I followed this rolling upgrade guide: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/rolling-upgrades.html
After reconnecting the last upgraded node to the cluster, the health status remained yellow due to unassigned shards.
Re-enabling shard allocation seemed to have no effect:
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "all"
}
}
My query results when checking cluster health:
GET _cat/health:
1541522454 16:40:54 elastic-upgrade-test yellow 2 2 84 84 0 0 84 0 - 50.0%
GET _cat/shards:
v2_session-prod-2018.11.05 3 p STARTED 6000 1016kb xx.xxx.xx.xxx node-25
v2_session-prod-2018.11.05 3 r UNASSIGNED
v2_session-prod-2018.11.05 1 p STARTED 6000 963.3kb xx.xxx.xx.xxx node-25
v2_session-prod-2018.11.05 1 r UNASSIGNED
v2_session-prod-2018.11.05 4 p STARTED 6000 1020.4kb xx.xxx.xx.xxx node-25
v2_session-prod-2018.11.05 4 r UNASSIGNED
v2_session-prod-2018.11.05 2 p STARTED 6000 951.4kb xx.xxx.xx.xxx node-25
v2_session-prod-2018.11.05 2 r UNASSIGNED
v2_session-prod-2018.11.05 0 p STARTED 6000 972.2kb xx.xxx.xx.xxx node-25
v2_session-prod-2018.11.05 0 r UNASSIGNED
v2_status-prod-2018.11.05 3 p STARTED 6000 910.2kb xx.xxx.xx.xxx node-25
v2_status-prod-2018.11.05 3 r UNASSIGNED
Is there another way to get shard allocation working again so I can get my cluster health back to green?
The other node within my cluster had a "high disk watermark [90%] exceeded" warning message so shards were "relocated away from this node".
I updated the config to:
cluster.routing.allocation.disk.watermark.high: 95%
After restarting the node, shards began to allocate again.
This is a quick fix - I will also attempt to increase the disk space on this node to ensure I don't lose reliability.
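For reference, the disk watermark settings are dynamic, so the same change can also be applied at runtime through the cluster settings API without restarting the node (a sketch; use "persistent" instead of "transient" if it should survive a full cluster restart):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}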
I have a 3-node cluster with replication factor 2 and a replicated table, stats.
Recently I noticed a delay on the replica database via /replicas_status:
db.stats: Absolute delay: 0. Relative delay: 0.
db2.stats: Absolute delay: 912916. Relative delay: 912916.
Here is the data from system.replication_queue:
Row 1:
──────
database: db2
table: stats
replica_name: replica_2
position: 3
node_name: queue-0001743101
type: GET_PART
create_time: 2018-06-19 20:57:42
required_quorum: 0
source_replica: replica_1
new_part_name: 20180619_20180619_823572_823572_0
parts_to_merge: []
is_detach: 0
is_currently_executing: 0
num_tries: 917943
last_exception:
last_attempt_time: 2018-06-29 15:32:50
num_postponed: 118617
postpone_reason:
last_postpone_time: 2018-06-29 15:32:23
Row 2:
──────
database: db2
table: stats
replica_name: replica_2
position: 4
node_name: queue-0001743103
type: MERGE_PARTS
create_time: 2018-06-19 20:57:48
required_quorum: 0
source_replica: replica_1
new_part_name: 20180619_20180619_823568_823573_1
parts_to_merge: ['20180619_20180619_823568_823568_0','20180619_20180619_823569_823569_0','20180619_20180619_823570_823570_0','20180619_20180619_823571_823571_0','20180619_20180619_823572_823572_0','20180619_20180619_823573_823573_0']
is_detach: 0
is_currently_executing: 0
num_tries: 917943
last_exception: Code: 234, e.displayText() = DB::Exception: No active replica has part 20180619_20180619_823568_823573_1 or covering part, e.what() = DB::Exception
last_attempt_time: 2018-06-29 15:32:50
num_postponed: 199384
postpone_reason: Not merging into part 20180619_20180619_823568_823573_1 because part 20180619_20180619_823572_823572_0 is not ready yet (log entry for that part is being processed).
last_postpone_time: 2018-06-29 15:32:35
Any clue how to deal with it?
Should I detach the broken replica's partition and attach it again?
Stop all inserts to this cluster; the replication queue should then clear automatically.
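To confirm the queue is actually draining once inserts are stopped, a simple check against the same system table works (a sketch; the column names are the ones shown in the output above):

SELECT
    database,
    table,
    count()          AS queue_size,
    min(create_time) AS oldest_entry,
    max(num_tries)   AS max_tries
FROM system.replication_queue
GROUP BY database, table
ORDER BY queue_size DESC;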
I'm very new to YAML and want to parse a file I have made and verified to be of correct structure. Here is the YAML:
---
DocType: DAQ Configuration File
# Creation Date: Wed Sep 21 10:34:06 2016
File Settings:
- Configuration Path: /mnt/sigma/daqengine/config/
- Configuration Name: daqengine.yaml
- Build File Path: ./bld
- Log File Path: ./log
- Record H5 Files: true
Engine Data:
- Make: EOS
- Model: M280
- Number of DAQs: 1
DAQ01 Settings:
- IP Address: 192.168.116.223
- Log DAQ Internal Temps: true
- T0 Temp. Limit: 55
- T0 Clear OTC Level Temp.: 50
- T0 Sample Averaging: 5
- T1 Temp. Limit: 60
- T1 Clear OTC Level Temp.: 55
- T1 Sample Averaging: 2
- No. Layers Used: 1
DAQ01 Layer 0 Settings:
- Model: AI-218
- Sample Rate: 50000
- Channels Used: 8
- Channel01: HF Temp
- Channel02: On-Axis PD
- Channel03: Off-Axis PD
- Channel04: X-Position
- Channel05: Y-Position
- Channel06: LDS
- Channel07: Laser Power
- Channel08: Unused
...
I need to extract, in C++, the sequenced values for each key they are tagged to.
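In case it helps, here is a minimal sketch using yaml-cpp (an assumption; any YAML library with map and sequence iteration would do, and the file path is illustrative). Each top-level key here holds a sequence of single-pair maps, so you iterate the sequence first and then the lone key/value pair inside each element:

#include <yaml-cpp/yaml.h>
#include <iostream>
#include <string>

int main() {
    // Path is illustrative; point it at your configuration file.
    YAML::Node config = YAML::LoadFile("daqengine.yaml");

    // "File Settings" is a sequence of single-pair maps: iterate the
    // sequence, then the one key/value pair inside each element.
    for (const auto& item : config["File Settings"]) {
        for (const auto& kv : item) {
            std::cout << kv.first.as<std::string>() << " = "
                      << kv.second.as<std::string>() << "\n";
        }
    }

    // Scalars convert on demand, e.g. the layer 0 sample rate.
    for (const auto& item : config["DAQ01 Layer 0 Settings"]) {
        if (item["Sample Rate"]) {
            std::cout << "Sample Rate: "
                      << item["Sample Rate"].as<int>() << "\n";
        }
    }
    return 0;
}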
I am having problems when I look up a record inside an index; the error message is the following:
TransportError: TransportError(503, u'NoShardAvailableActionException[[new_gompute_history][2] null]; nested: IllegalIndexShardStateException[[new_gompute_history][2] CurrentState[POST_RECOVERY] operations only allowed when started/relocated]; ')
This happens when I search by ID inside an index.
The health of my cluster is green:
GET _cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign
1438678496 10:54:56 juan green 5 4 212 106 0 0 0
GET _cat/allocation?v
shards disk.used disk.avail disk.total disk.percent host ip node
53 3.1gb 16.8gb 20gb 15 bc10-05 10.8.5.15 Anomaloco
53 6.4gb 80.8gb 87.3gb 7 bc10-03 10.8.5.13 Algrim the Strong
0 0b l8a 10.8.0.231 logstash-l8a-5920-4018
53 6.4gb 80.8gb 87.3gb 7 bc10-03 10.8.5.13 Harry Leland
53 3.1gb 16.8gb 20gb 15 bc10-05 10.8.5.15 Hypnotia
I solved it by putting a sleep between consecutive PUTs, but I do not like this solution.
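Instead of a fixed sleep, one option (a sketch, assuming the index name from the error message) is to block until the index's primaries have left recovery and are started, i.e. until the index reports at least yellow health, before issuing the search:

GET _cluster/health/new_gompute_history?wait_for_status=yellow&timeout=30s

The call returns as soon as the index reaches the requested status or the timeout expires, which makes the wait only as long as it needs to be.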