How can I save disk space in Elasticsearch?

I have a three-node cluster, each node with 114 GB of disk capacity. I am pushing syslog events from around 190 devices to this cluster. Events arrive at a rate of about 100k every 15 minutes. The index has 5 shards and uses the best_compression codec. In spite of this the disk fills up quickly, so I was forced to remove the replica for this index. The index size is 170 GB and each shard is 34.1 GB. Now, if I get additional disk space and reindex this data into a new index with 3 shards and a replica, will it save disk space?
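For reference, a minimal sketch of what that reindex could look like with the official Python client; the host and the index names syslog-v1/syslog-v2 are placeholders, not from the question:

```python
# A sketch, not a drop-in script: reindex into a new index with 3 shards
# and the best_compression codec. Host and index names are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Target index: 3 primaries, no replicas while the copy runs, and the
# higher-ratio best_compression codec instead of the default.
es.indices.create(
    index="syslog-v2",
    body={
        "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 0,
            "index.codec": "best_compression",
        }
    },
)

# Copy the documents over asynchronously; poll the returned task to completion.
task = es.reindex(
    body={"source": {"index": "syslog-v1"}, "dest": {"index": "syslog-v2"}},
    wait_for_completion=False,
)
print("reindex task:", task["task"])

# Once finished: force-merge so segments are rewritten under the new codec,
# then add the replica back (this roughly doubles the on-disk footprint).
es.indices.forcemerge(index="syslog-v2", max_num_segments=1)
es.indices.put_settings(index="syslog-v2", body={"index": {"number_of_replicas": 1}})
```

Fewer shards reduce per-shard overhead, but since the source index already uses best_compression, most of the 170 GB will carry over, and the replica doubles it again.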

Related

How much heap memory is required for Elasticsearch?

I have an 8-node cluster (3 master + 3 data + 2 coordinating). Each data node has a 10 GB heap and 441 GB of disk, about 1.2 TB in total.
Each day I ingest 32.73 GB of data, and each day 26 shards are created across 11 indices. So suppose the retention period is 30 days: on day 30 the cluster would hold about 982 GB of data and 780 shards, i.e. 260 shards per data node. The average shard size would therefore be about 1.3 GB (982 GB / 780 shards). I read in the documentation that a node with a 30 GB heap can handle 600 shards. So the question is: can a 10 GB heap handle 260 shards?
The article you read can be taken as a good general recommendation, but various factors affect it: the size of your indices, the size of your shards, the size of your documents, the type of disk, the current load on the system, and so on. In the same document you will notice the shard size recommendation is between 10 and 50 GB, while your shards are very small (around 1.3 GB on average). Based on this, I can say a 10 GB heap can easily handle 260 shards in your case, although you should still benchmark your cluster and read up on how ES stores and searches data internally, so that it is easier for you to fine-tune it.
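If it helps, a small sketch for checking those numbers against the live cluster with the official Python client (host is a placeholder):

```python
# A sketch for verifying shard sizes and heap pressure on a running cluster.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One row per shard copy: index, shard number, primary/replica, size, node.
for row in es.cat.shards(format="json", bytes="gb", h="index,shard,prirep,store,node"):
    print(row)

# Per-node JVM heap pressure, to see how hard each 10 GB heap is working.
stats = es.nodes.stats(metric="jvm")
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"], "% heap used")
```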

Elasticsearch Memory Usage increases over time and is at 100%

Indexing performance in Elasticsearch has degraded over time. Memory usage slowly increased until it reached 100%, and in this state I cannot index any more data. I have the default shard settings: 5 primaries and 1 replica. My indices are time-based, with a new index created every hour to store Coral service logs from various teams. An index is about 3 GB across 5 shards, and about 6 GB with the replica; with a single shard and 0 replicas it comes to about 1.7 GB.
I am using EC2 i2.2xlarge hosts, which offer 1.6 TB of disk, 61 GB of RAM, and 8 cores.
I have set the heap size to 30 GB.
Node statistics are here:
https://jpst.it/1eznd
My whole cluster came down, to the point that I had to delete all the indices. Could you please help me fix this?
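For illustration, hourly indices at the default 5 primaries and 1 replica add 240 shard copies per day, which is a common cause of creeping heap pressure. A sketch of capping that with a legacy index template, assuming a 6.x/7.x cluster and a hypothetical coral-logs-* pattern:

```python
# A sketch: give every new hourly index 1 primary shard instead of 5.
# Assumes legacy templates (pre-7.8 syntax); the pattern is hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 24 hourly indices x (5 primaries + 5 replicas) = 240 shard copies per day;
# with 1 primary + 1 replica the same day costs 48 shard copies.
es.indices.put_template(
    name="coral-logs",
    body={
        "index_patterns": ["coral-logs-*"],
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    },
)
```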

Elasticsearch sizing of shards

I am new to Elasticsearch. Suppose we have a two-node cluster configured with 2 primary shards and one replica for our single index, so node 1 holds P0 and R1, and node 2 holds P1 and R0. Now suppose I later reduce the number of replicas to 0. Will the shards P0 and P1 automatically resize themselves to occupy the disk space vacated by the replicas, allowing me more disk space for indexing than before, when I had replicas?
A replica shard takes more or less the same space as its primary, since both contain the same documents. So, say you have indexed 1 million documents into your index; then each primary shard contains roughly half that amount, i.e. 500K documents, and each replica contains the same number of documents as well.
If each document weighs 1 KB, then:
The primary shard P0 has 500K documents weighing 500 MB
The replica shard R0 has 500K documents weighing 500 MB
The primary shard P1 has 500K documents weighing 500 MB
The replica shard R1 has 500K documents weighing 500 MB
This means your index occupies 2 GB of disk space across your nodes. If you later reduce the number of replicas to 0, that will indeed free up 1 GB of space for your primary shards to grow into.
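A minimal sketch of that change with the official Python client; "myindex" and the host are placeholders:

```python
# A sketch of dropping replicas to reclaim their disk space.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Dynamic and reversible: dropping to 0 deletes R0 and R1 and frees their
# disk space; setting it back to 1 rebuilds them from the primaries.
es.indices.put_settings(index="myindex", body={"index": {"number_of_replicas": 0}})
```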
However, note that by doing so you certainly gain disk space, but you no longer have any redundancy: if one node fails, part of your index becomes unavailable, which defeats the main idea behind replicas to begin with.
The other thing is that the practical size of a shard is bounded by a limit it cannot usefully cross. That limit depends on many factors, chief among them the amount of heap and the total physical memory you have. If you have 2 GB of heap and 50 GB of disk space, you cannot expect to index 50 GB of data into your index; that won't work, or it will be very slow and unstable.
=> So disk space alone should not be the main driver for sizing your shards. Having enough disk space is a necessary condition but not a sufficient one; you also need to look at the RAM and the heap allocated to your ES nodes.

ElasticSearch - Dynamic capacity

I have a three-node Elasticsearch cluster with 250 GB of disk space on each node. I have three shards, one on each node. If I run out of disk capacity and add another (fourth) node with a 500 GB disk, will the Elasticsearch cluster move one of the shards to take advantage of the larger disk on the fourth node?
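For context, that movement is decided by Elasticsearch's disk-based shard allocator. A sketch for inspecting per-node disk use and the watermark settings that trigger relocation (official Python client assumed; the values shown are the defaults):

```python
# A sketch for watching the allocator react to the new node.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Disk use per node; after the fourth node joins, any relocations show up here.
for row in es.cat.allocation(format="json", bytes="gb"):
    print(row)

# The disk-based deciders: no new shards on a node past the low watermark,
# and shards are actively moved off a node past the high watermark.
es.cluster.put_settings(body={
    "persistent": {
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "90%",
    }
})
```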

ElasticSearch Search query performance issue when index size is growing

I'm experiencing search performance degradation as the index grows. For example, for a given query I get 3500 RPS (requests per second) with 1M documents in the index, whereas with 6M documents in the index I get 1200 RPS. Each document in the index averages 500 bytes. I tried the following cluster configurations, and the behavior is the same in both cases.
Elasticsearch version: 1.2.1
Cluster 1 configuration: 4-node cluster
2 router nodes: VMs with 4-core CPU, 4 GB RAM, 20 GB disk
2 data nodes: physical hosts with 32-core CPU, 198 GB RAM (24 GB allocated to the ES process), 250 GB disk
Number of shards for the index: 3, with 1 replica
Cluster 2 configuration: 7-node cluster
2 router nodes: VMs with 4-core CPU, 4 GB RAM, 20 GB disk
2 data nodes: VMs with 8-core CPU, 48 GB RAM (24 GB allocated to the ES process), 250 GB local storage
Number of shards for the index: 3, with 1 replica
In addition to the search query performance issue, I observed that the data nodes' CPU always spikes to 100% on all cores.
Any inputs would be greatly appreciated.
Thanks
Gopi
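A first diagnostic step for the pegged cores could be the hot threads API, sketched here with the official Python client (host is a placeholder; the API exists in the 1.x line as well):

```python
# A sketch: capture what the busy data-node CPUs are doing.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Plain-text stack samples of the busiest threads on every node; look for
# search, segment-merge, or GC activity dominating the output.
print(es.nodes.hot_threads(threads=5, interval="500ms"))
```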
