Max timeout reached when indexing a document - Elasticsearch

Recently, when I try to index a document, the request responds with "max timeout reached"; after a certain point in time, indexing starts working again.
Now I'm trying to find the root cause of the issue. The only thing I've been able to find is that one of my master nodes was down at that time. Could that cause the timeout issue?
The infra details of my Elasticsearch cluster are:
Running in Kubernetes
3 data nodes - each with 64 GB RAM (32 GB memory limit), 28 GB heap, 1 TB disk
3 master nodes - each with 16 GB RAM (4 GB memory limit), 4 GB heap, 10 GB disk

Found the cause of it: all master nodes were down at that time, because of multiple issues:
heap dump saving failing because the storage had run out of space
the storage being shared, so the nodes tried to write heap dumps with the same file name (which throws a "file exists" error)
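
For anyone hitting the same thing, a minimal monitoring sketch (my own addition, assuming an unauthenticated cluster reachable at http://localhost:9200; adjust the endpoint and auth for your setup) that polls cluster health and the currently elected master, which is how a "no master" window like the one above shows up:

    # poll cluster health and the elected master (sketch)
    import time
    import requests

    ES = "http://localhost:9200"  # assumption: replace with your cluster endpoint/auth

    while True:
        try:
            health = requests.get(f"{ES}/_cluster/health", timeout=5).json()
            # _cat/master lists the node currently elected as master
            master = requests.get(f"{ES}/_cat/master?format=json", timeout=5).json()
            print(health["status"], "master:", master[0]["node"] if master else "NONE")
        except requests.RequestException as exc:
            # a timeout here is what indexing clients see while no master is reachable
            print("cluster unreachable:", exc)
        time.sleep(10)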

Related

Heap memory overflow on master nodes (continuous GC) - Elasticsearch

Recently I encountered an increase in heap memory usage on the master nodes (heap memory overflow on the master nodes, with continuous garbage collection). I tried to debug the root cause using the heap dumps saved to storage (sample file name for reference: java_pid1.hprof), but those files are encrypted and I'm unable to find anything in them.
Is this the correct way to debug the heap memory issue?
If yes, how do I get a decrypted heap dump with proper information?
Otherwise, how should I debug the heap memory issue on the master nodes?
Elasticsearch info:
Running in Kubernetes
3 dedicated master nodes
3 data nodes (which are also the ingest nodes)
Data nodes - each with 64 GB RAM (32 GB memory limit), 28 GB heap, 1 TB disk
Master nodes - each with 16 GB RAM (4 GB memory limit), 4 GB heap, 10 GB disk
Hprof files can be opened in Eclipse. Eclipse has a special plugin for opening hprof files; it's called the Memory Analyzer Tool (MAT).
I have done these exercises in the past, but usually you don't find much there.
Thanks.
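
As a lighter-weight complement to heap dumps, here is a small sketch (an illustration of my own, assuming an unauthenticated cluster at http://localhost:9200) that reads heap usage and old-generation GC counters for the master-eligible nodes from the node stats API, which is usually enough to confirm a continuous-GC pattern:

    # print heap usage and old-gen GC stats for master-eligible nodes (sketch)
    import requests

    ES = "http://localhost:9200"  # assumption: replace with your cluster endpoint/auth

    # "master:true" is a node filter selecting master-eligible nodes
    stats = requests.get(f"{ES}/_nodes/master:true/stats/jvm", timeout=10).json()

    for node_id, node in stats["nodes"].items():
        heap_pct = node["jvm"]["mem"]["heap_used_percent"]
        old_gc = node["jvm"]["gc"]["collectors"]["old"]
        print(f"{node['name']}: heap {heap_pct}% used, "
              f"old-gen GC count={old_gc['collection_count']}, "
              f"time={old_gc['collection_time_in_millis']} ms")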

Elasticsearch: high search response times on a new machine

I have 4 EC2 machines in an Elasticsearch cluster.
Configuration: c5d.large, memory: 3.5 GB, data disk: 50 GB NVMe instance storage.
Elasticsearch version: 6.8.21
I added a 5th machine with the same configuration (c5d.large, memory: 3.5 GB, data disk: 50 GB NVMe instance storage). After that, search requests started taking more time than before. I enabled slow logs, which show that only the shards on the 5th node are slow to search. I can also see high disk read IO on the new node when I trigger search requests: iowait% increases with the number of search requests and goes up to 90-95%. None of the old nodes show any read spikes.
I checked elasticsearch.yml, jvm.options and even the sysctl -A configuration; there is no difference between the config on the new node and the old nodes.
What could be the issue here?
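
One way to narrow it down (a sketch of my own, not from the original thread; it assumes an unauthenticated cluster at http://localhost:9200 and should work on 6.8 as well) is to compute a rough average search latency per node from the node stats API and compare the new node against the old ones:

    # rough per-node average search latency from node stats (sketch)
    import requests

    ES = "http://localhost:9200"  # assumption: point this at any node of the cluster

    stats = requests.get(f"{ES}/_nodes/stats/indices/search", timeout=10).json()

    for node_id, node in stats["nodes"].items():
        search = node["indices"]["search"]
        total = search["query_total"]
        avg_ms = search["query_time_in_millis"] / total if total else 0.0
        print(f"{node['name']}: {total} queries, avg {avg_ms:.1f} ms/query")

If the average on the new node is clearly higher while its shard count (GET _cat/shards) is comparable, the disk itself and a cold filesystem cache on the freshly added node are the next things worth checking.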

Elasticsearch on Kubernetes: sudden rise in data disk usage

Deployed Elasticsearch on Kubernetes in GKE, with 2 GB of memory and a 1 GB persistent disk.
We got an out-of-storage exception. After that, we increased the disk to 2 GB, and the very next day it reached 2 GB even though we hadn't run any big queries. We then increased the persistent disk size to 10 GB; since then there has been no further increase in data disk usage.
On further analysis, we found that the indices take only about 20 MB in total, and we are unable to work out what data is on the disk.
We used the Elasticsearch nodes stats API to get disk and node statistics.
I am unable to find the exact reason why the disk usage grew or what data is on the disk. Please also suggest ways to prevent this in the future.
Elasticsearch continuously receives data and, based on your config, it creates multiple copies of the indices and may create a new index daily. Check the config file.
If the Elasticsearch cluster fails, it creates a backup of the data each time, so you may need to delete old backups before restarting the cluster.
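
To see what is actually occupying the disk, here is a sketch (my own addition, assuming an unauthenticated cluster at http://localhost:9200) that lists indices by on-disk size and shows per-node disk allocation via the cat APIs:

    # list indices by on-disk size and per-node disk allocation (sketch)
    import requests

    ES = "http://localhost:9200"  # assumption: replace with your cluster endpoint/auth

    # indices sorted by store size, largest first (sizes in bytes)
    indices = requests.get(
        f"{ES}/_cat/indices?format=json&bytes=b&h=index,pri,rep,store.size&s=store.size:desc",
        timeout=10,
    ).json()
    for idx in indices:
        print(idx["index"], idx["store.size"], "bytes", f"(pri={idx['pri']}, rep={idx['rep']})")

    # per-node view: disk used by index data vs. total disk used on the node
    for row in requests.get(f"{ES}/_cat/allocation?format=json&bytes=b", timeout=10).json():
        print(row["node"], "indices:", row["disk.indices"], "b, total used:", row["disk.used"], "b")

If disk.used is far above disk.indices, the space is going to something other than index data (for example logs or heap dumps written to the same volume), which is worth checking directly on the pod's persistent volume.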

HDFS data write process for nodes with different disk sizes

We have a 10-node HDFS cluster (Hadoop 2.6, Cloudera 5.8); 4 nodes have 10 TB disks and 6 nodes have 3 TB disks. The disks on the small-disk nodes keep filling up, while free space is still available on the large-disk nodes.
I am trying to understand how the NameNode writes data/blocks to nodes with different disk sizes - whether the data is divided equally or whether some percentage is written to each.
You should look at dfs.datanode.fsdataset.volume.choosing.policy. By default this is set to round-robin, but since you have an asymmetric disk setup you should change it to available space.
You can also fine-tune disk usage with the other two choosing properties.
For more information see:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/admin_dn_storage_balancing.html
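
For reference, a sketch of the hdfs-site.xml change described above; the property names are the documented ones, and the two tuning values shown are only the usual defaults, so treat them as placeholders:

    <!-- hdfs-site.xml (sketch): choose volumes by available space instead of round-robin -->
    <property>
      <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
      <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
    </property>
    <!-- the two fine-tuning properties mentioned above -->
    <property>
      <!-- volumes whose free space differs by less than this many bytes are treated as balanced -->
      <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
      <value>10737418240</value>
    </property>
    <property>
      <!-- fraction of new block allocations sent to the volumes with more free space -->
      <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
      <value>0.75</value>
    </property>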

Cassandra compaction taking too much time to complete

Initially we had 12 nodes in the Cassandra cluster, and with a 500 GB data load on each node a major compaction used to complete in 20 hours.
Now we have grown the cluster to 24 nodes, and with the same data size (500 GB on each node) a major compaction takes 5 days. (The hardware configuration of each node is exactly the same, and we are using cassandra-0.8.2.)
So what could be the possible reason for this slowdown?
Is increased cluster size causing this issue?
Compaction is a completely local operation, so cluster size would not affect it. Request volume would, and so would data volume.
