How to decide the memory requirement for my Elasticsearch server

I have an Elasticsearch database holding about 1.4 TB of data, with the following shard status:
"_shards": {
"total": 202,
"successful": 101,
"failed": 0
}
Each index is between roughly 3 GB and 30 GB, and in the near future I expect to add about 30 GB of data per day.
OS information:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.2 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)"
The system has 32 GB of RAM and a 2 TB filesystem (1.4 TB used). I have configured a maximum of 15 GB for the Elasticsearch server.
But this is not enough to query the database: the server hangs on a single query.
I will be adding 1 TB to the filesystem on this server, so the total available size will be 3 TB.
I am also planning to increase the memory to 128 GB, which is a rough estimate.
Could someone help me determine the minimum RAM required for a server to respond to at least 50 simultaneous requests?
I would greatly appreciate any tool or formula for analyzing this requirement. It would also help if you could share another scenario with numbers that I can use to determine my resource needs.

You will need to scale using several nodes to stay efficient.
Elasticsearch has its per-node memory sweet spot at 64GB with 32GB reserved for ES.
See https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_memory for more details. The book is a very good read if you are using Elasticsearch for serious work.

If you're here for a rule of thumb, I'd say that on modern ES and Java, 10-20GB of heap per TB of data (I'm thinking of the typical ELK use-case) should be enough. Multiplying by 2, that's 20-40GB of total RAM per TB.
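To make the rule of thumb concrete, here is a minimal Python sketch applied to the 1.4 TB from the question. The function name and its parameters are illustrative choices, not anything Elasticsearch provides, and the result is only a starting point for monitor-and-adjust.

```python
def estimate_memory_gb(data_tb, heap_gb_per_tb=(10, 20), total_ram_factor=2):
    """Apply the rule of thumb: 10-20 GB of heap per TB, doubled for total RAM.

    Returns ((heap_low, heap_high), (ram_low, ram_high)) in GB.
    """
    heap = tuple(data_tb * per_tb for per_tb in heap_gb_per_tb)
    ram = tuple(h * total_ram_factor for h in heap)
    return heap, ram


heap, ram = estimate_memory_gb(1.4)  # 1.4 TB is the data size from the question
print(f"heap: {heap[0]:.0f}-{heap[1]:.0f} GB, total RAM: {ram[0]:.0f}-{ram[1]:.0f} GB")
```

For 1.4 TB this works out to roughly 14-28 GB of heap and 28-56 GB of total RAM, which may be spread over several nodes.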
Now for the detailed answer :) There are two types of memory that are relevant here:
JVM heap
OS cache (the OS will use free memory to cache index files)
OS cache is down to your IO requirements (queries do lots of small random IO). If you have a query-intensive use-case (e.g. E-commerce), you'll want to fit your whole index in the OS cache (or at least most of it). For logs and other time-series data, you typically have more expensive, rarer queries. There, if you have a local SSD you can make do with only a fraction of your data in the cache. I've seen servers with 4TB of disk space on 32GB of OS cache.
JVM heap can also be divided in two:
static memory, required even when the server is idle
transient memory, required by ongoing indexing/search operations
You'd see most of the static memory if you hit the _nodes/stats endpoint. It's best if you have these plotted in your Elasticsearch monitoring tool. You'll see it as segments_memory and various caches. For recent versions of Elasticsearch (e.g. 7.7 or higher), there's not a lot of memory like this - at least for most use-cases. I've seen ELK deployments with multiple TB of data definitely using less than 10GB of RAM for static memory. That said, you may reduce it by not storing info that you don't need. For example by not indexing fields you don't search on.
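As a rough illustration of reading those numbers, the Python sketch below sums the static components for one node of an already-parsed _nodes/stats response. The field names are the ones I recall from the node stats output (they can differ or report zero depending on the version, so check them against your own response), and the sample values are made up.

```python
def static_heap_breakdown(node_stats):
    """Return (parts, total) in bytes for one node entry of a _nodes/stats response."""
    indices = node_stats.get("indices", {})
    parts = {
        "segments": indices.get("segments", {}).get("memory_in_bytes", 0),
        "fielddata": indices.get("fielddata", {}).get("memory_size_in_bytes", 0),
        "query_cache": indices.get("query_cache", {}).get("memory_size_in_bytes", 0),
        "request_cache": indices.get("request_cache", {}).get("memory_size_in_bytes", 0),
    }
    return parts, sum(parts.values())


# Made-up numbers for illustration only; in practice, pass one entry of
# the "nodes" object from the parsed _nodes/stats JSON.
sample_node = {
    "indices": {
        "segments": {"memory_in_bytes": 2_000_000_000},
        "fielddata": {"memory_size_in_bytes": 500_000_000},
        "query_cache": {"memory_size_in_bytes": 300_000_000},
        "request_cache": {"memory_size_in_bytes": 50_000_000},
    }
}

parts, total = static_heap_breakdown(sample_node)
for name, size in parts.items():
    print(f"{name:>13}: {size / 1e9:.2f} GB")
print(f"{'total':>13}: {total / 1e9:.2f} GB")
```

Plotting this total over time next to heap usage gives a feel for how much of the heap is static and how much is left for transient work.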
Transient memory will mainly depend on your queries: how often they run and how expensive they are. One-off expensive queries tend to be more dangerous, so avoid using too many levels of aggregations, massive size values, or queries that expand to too many terms (wildcards, fuzzy...). To accommodate those, you simply need heap. How much? It's really a matter of monitor-and-adjust.
Side-note: I don't agree with the general suggestion that you should stay under 32GB at all costs. With Java 11+ and G1GC, I've seen deployments with over 100GB of heap that run just fine. The overhead of uncompressed oops is not 10-20GB at every 30GB, like the docs suggest; that's an extrapolation of a worst-case scenario. In my experience, it's more like a few GB every 30GB, something like 10% for many deployments. This doesn't mean you have to use 100GB of heap, it's just that if you need a lot of heap in your cluster, you don't have to have hundreds of nodes (you can have fewer, bigger ones).
Speaking of GC, it may fall behind if you run many queries that aren't terribly expensive. And then you'd run out of heap, even if you have plenty. Monitoring should tell you this, as a full GC will eventually clean up the heap with a big pause (read: cluster instability). Here, Java 11 with G1GC and a low -XX:GCTimeRatio (e.g. 3) should fix the issue.
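For intuition about what a low -XX:GCTimeRatio means: HotSpot documents the GC time goal as 1 / (1 + GCTimeRatio) of total time. The Python sketch below just evaluates that formula. The value 12 is included only as a comparison point (I believe it is the G1 default, but check your JVM), and how strictly the collector follows the goal is up to the JVM.

```python
def gc_time_goal(gc_time_ratio):
    """Fraction of total time the JVM may spend in GC: 1 / (1 + GCTimeRatio)."""
    return 1 / (1 + gc_time_ratio)


for ratio in (3, 12):
    print(f"-XX:GCTimeRatio={ratio}: up to {gc_time_goal(ratio):.1%} of time in GC")
```

A ratio of 3 therefore lets the collector use up to a quarter of the time, which is why it helps GC keep up with many cheap queries.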

These pages give a good overview of heap sizing and memory management, and should let you answer the question yourself.
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/_limiting_memory_usage.html

Related

Redis vs memcached vs Scylla Cache - Which one to choose?

I'm designing an application where I want to cache a million items, each around 10 KB. I did some analysis and I'm on the fence between using Redis, memcached, and Scylla as the cache. Can some experts suggest which might best suit my needs?
Highly performant
High availability
High Throughput
Low pricing?
Full disclosure - I work on the Scylla project.
I think it is a question of latency and HA vs cost. As a RAM-based system, Redis will be the lowest latency. If you need < 1 millisecond response, then Redis or memcached are the choice.
Scylla is a disk-based system. Those values that are in Scylla's RAM will be low latency, but those that need to pull from disk will be slower. So your 99p latency is likely to be slower. How slow? Depends on your disk. NVME can be 99p 3-5 ms. SSD, maybe 5-10 ms. If this is an acceptable latency, then Scylla will be much less expensive, as even NVME is much cheaper than RAM.
As for HA - Redis and memcached are intended as a cache. While there are some features and frameworks that you can use to replicate data around, these are all bolt-ons and increase complexity. Scylla is a distributed system by design. So the replication to allow for multiple layers of HA is built-in (node, rack and DC-availability)
Redis (and to a lesser extent, memcached) are phenomenal caches. But, depending upon your use case, Scylla might be the right choice.
All three options you mentioned are open-source software, so the pricing is the same: zero :-) However, both Scylla and Redis are written and backed by companies (ScyllaDB and RedisLabs, respectively), so if your use case is mission-critical you may choose to pay these companies for enterprise-level support. You can ask them about their prices.
The more interesting difference between the three is in the technology.
You described a use case where you have 10 GB of data in the cache. This amount can be easily held in memory, so a completely in-memory database like Memcached or Redis is a natural choice. However, there are still questions you need to ask yourself, which may lead you to a distributed database, such as Scylla depending on your answers:
Would you be using powerful many-core machines? If so, you should probably rule out Memcached. My experience (and others'; see "Can memcached make full use of multi-core?") suggests that it does not scale well with many cores. On an 8-core machine you will not get anywhere close to 8 times the performance of a one-core machine.
Redis is also not really meant for multi-core use - https://redis.io/topics/benchmarks says that Redis "is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed.". Scylla, on the other hand, thrives on multi-core machines. You should probably test the performance of all three products on your use case before making a decision.
How much of a disaster would it be to suddenly lose the entire content of your cache? In some use cases, it just means you would need to query some slightly-slower backend server, so suddenly losing the cache on reboot is acceptable. In such cases, a memory-only cache like Memcached or Redis is probably exactly what you need. However, in other cases, there may be a big penalty for starting from scratch with an empty cache: the backend server might be very slow, or maybe the original content is stored on a far-away server with a slow and expensive WAN. In such a case you would want a disk-backed cache, so if the memory cache is lost, you can still refresh it from disk and not from the backend server. Redis has a disk backing option, and in Scylla disk backing is the main way.
You mentioned a working set of 10 GB, which can easily fit in the memory of a single server. But is it possible this will grow and in a year you'll find yourself needing to cache 100 GB or 1 TB, which no longer fits in the memory of a single server? In memcached you'll be out of luck. Redis used to have a "virtual memory" solution for this purpose, but it is deprecated and https://redis.io/topics/virtual-memory now states that Redis is "without considering at least for now the support for databases bigger than RAM". Scylla handles this issue in two ways. First, your cache would be stored on disk, which can be much larger than memory (and whatever amount of memory you have will be used to further speed up that cache, but the data doesn't need to fit in memory). Second, Scylla is a distributed server. It can distribute a 100 GB working set to 10 different nodes. Redis also has "replication", but it copies the entire data to all nodes, while Scylla can optionally store different subsets of the data on different nodes.
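The sizing arithmetic behind these questions can be sketched in a few lines of Python. The helper names and the copies parameter are hypothetical, decimal units are assumed (1 GB = 1,000,000 KB), and per-item overhead in any of the three systems is ignored.

```python
def working_set_gb(item_count, item_size_kb):
    """Raw payload size in GB, ignoring per-item overhead."""
    return item_count * item_size_kb / 1_000_000


def per_node_gb(total_gb, node_count, copies=1):
    """Data held by each node when the set is spread evenly, with `copies` replicas."""
    return total_gb * copies / node_count


print(working_set_gb(1_000_000, 10))   # the question: one million items of ~10 KB
print(per_node_gb(100, 10))            # a 100 GB set distributed over 10 nodes
print(per_node_gb(100, 10, copies=3))  # same set, assuming three copies of each item
```

One million 10 KB items come to about 10 GB, and a 100 GB set spread over 10 nodes is about 10 GB per node before replication.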
In-memory is actually a bad thing since RAM is expensive and not persistent.
So Scylla will be a better option for K/V or columnar workloads.
Scylla also has a limited Redis API with good results [1]; using the CQL API will give better results.
[1] https://medium.com/@siddharthc/redis-on-nvme-with-scylladb-5e12afd38dbc

How to determine what causes ES's query API instability

Normally, my ES query API takes less than 1s. But sometimes these queries get slow.
The cluster consists of three machines with 32 GB of RAM each (16 GB allocated to ES). The index has 20 primary shards and 1 replica, about 303,000,000 documents, 500 GB of primary storage, and 1 TB of total storage.
Here's kibana's monitoring data:
Personally, I think it's the result of GC. I want to add machines, but I need to find a reason to convince my leader.
Yes it could be a GC problem. But can you be more specific? What do you mean by slow?
Anyway, it seems the allocated heap is way too large for your needs. A collection happens when the heap reaches 12 GB (75% of 16 GB), and it goes back to 5 GB every time. This generates huge garbage collections.
You should try lowering the heap to something like 10 GB and check the impact on performance, GC count, and GC duration.
I recommend you also read this article, https://www.elastic.co/blog/a-heap-of-trouble, especially the "Together We Can Prevent Forest Fires" part.

Understanding elasticsearch jvm heap usage

Folks,
I am trying to reduce the memory usage of my elasticsearch deployment (single-node cluster).
I can see 3GB JVM heap space being used.
To optimize I first need to understand the bottleneck.
I have a limited understanding of how JVM heap usage is split.
Field data looks to consume 1.5GB, and the filter cache and query cache combined consume less than 0.5GB, which adds up to 2GB at most.
Can someone help me understand where elasticsearch uses the remaining 1GB?
I can't tell for your exact setup, but in order to know what's going on in your heap, you can use the jvisualvm tool (bundled with the jdk) together with marvel or the bigdesk plugin (my preference) and the _cat APIs to analyze what's going on.
As you've rightly noticed, the heap hosts three main caches, namely:
the fielddata cache: unbounded by default, but can be controlled with indices.fielddata.cache.size (in your case it seems to be around 50% of the heap, probably due to the fielddata circuit breaker)
the node query/filter cache: 10% of the heap
the shard request cache: 1% of the heap but disabled by default
There is a nice mindmap available here (Kudos to Igor Kupczyński) that summarizes the roles of caches. That leaves more or less ~30% of the heap (1GB in your case) for all other object instances that ES needs to create in order to function properly (see more about this later).
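The split for the numbers in the question can be checked with a short Python sketch. The function is only illustrative; the 3 GB heap, 1.5 GB of field data and 0.5 GB upper bound for the filter and query caches come from the question.

```python
def heap_breakdown(heap_gb, observed_gb):
    """Return (unaccounted_gb, shares) where shares maps each consumer to its heap fraction."""
    rest = heap_gb - sum(observed_gb.values())
    shares = {name: gb / heap_gb for name, gb in observed_gb.items()}
    shares["everything else"] = rest / heap_gb
    return rest, shares


rest, shares = heap_breakdown(3.0, {"fielddata": 1.5, "filter + query caches": 0.5})
for name, share in shares.items():
    print(f"{name:>22}: {share:.0%}")
print(f"unaccounted for by caches: {rest:.1f} GB")
```

With those inputs the caches account for about two thirds of the heap, leaving roughly 1 GB (about a third) for everything else.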
Here is how I proceeded on my local env. First, I started my node fresh (with -Xmx1g) and waited for green status. Then I started jvisualvm and hooked it onto my elasticsearch process. I took a heap dump from the Sampler tab so I could compare it later on with another dump. My heap looks like this initially (only 1/3 of max heap allocated so far):
I also checked that my field data and filter caches were empty:
Just to make sure, I also ran /_cat/fielddata and as you can see there's no heap used by field data yet since the node just started.
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id                     host       ip            node    total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler     0
This is the initial situation. Now, we need to warm this all up a bit, so I started my back- and front-end apps to put some pressure on the local ES node.
After a while, my heap looks like this, so its size has more or less increased by 300 MB (139MB -> 452MB, not much but I ran this experiment on a small dataset)
My caches have also grown a bit to a few megabytes:
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id                     host       ip            node      total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424
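If you want to track this number over time rather than eyeball it, the _cat output is easy to parse. The Python sketch below works on the text shown above; it assumes the header row produced by the v flag and node names without spaces, and the function name is just an illustration.

```python
CAT_OUTPUT = """\
id host ip node total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424
"""


def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with a header row) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_table(CAT_OUTPUT)
total_bytes = sum(int(row["total"]) for row in rows)
print(f"fielddata across {len(rows)} node(s): {total_bytes / 2**20:.1f} MiB")
```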
At this point I took another heap dump to gain insights into how the heap had evolved, I computed the retained size of the objects and I compared it with the first dump I took just after starting the node. The comparison looks like this:
Among the objects that increased in retained size, the usual suspects are maps, of course, and any cache-related entities. But we can also find the following classes:
NIOFSDirectory that are used to read Lucene segment files on the filesystem
A lot of interned strings in the form of char arrays or byte arrays
Doc values related classes
Bit sets
etc
As you can see, the heap hosts the three main caches, but it is also where all the other Java objects reside that the Elasticsearch process needs and that are not necessarily cache-related.
So if you want to control your heap usage, you obviously have no control over the internal objects that ES needs to function properly, but you can definitely influence the sizing of your caches. If you follow the links in the first bullet list, you'll get a precise idea of what settings you can tune.
Also tuning caches might not be the only option, maybe you need to rewrite some of your queries to be more memory-friendly or change your analyzers or some fields types in your mapping, etc. Hard to tell in your case, without more information, but this should give you some leads.
Go ahead and launch jvisualvm the same way I did here and learn how your heap is growing while your app (searching+indexing) is hitting ES and you should quickly gain some insights into what's going on in there.
Marvel only plots some of the heap consumers that need monitoring, such as the caches in this case.
The caches represent only a portion of the total heap usage. Many other objects occupy heap memory, and those may not be plotted directly in the Marvel interface.
Hence, not all of the heap used by ES is taken by the caches.
In order to clearly understand the exact usage of heap by different instances, you should take heap dump of the process and then analyze it using a Memory Analyzer tool which can provide you with the exact picture.

How should PostgreSQL be configured for this setup?

I would like to tweak my PostgreSQL server but even after reading a few tutorials online I am not getting good performance out of the database.
I've got a server with the following specs:
Windows Server 2012 R2 Datacenter
Intel CPU E5-2670 v2 @ 2.50 GHz
64-bit Operating System
512 GB RAM
PostgreSQL 9.3
I would like to use postgres as a data storage / aggregation system for the following tasks:
Read data from various data sources (mostly flat files) (volumes between 100GB and 1TB)
Pre-process / clean data
Aggregate data
Feed aggregated or sampled data into R or python for modelling
Up to 10 concurrent users only
This means, I do not really care about the following:
Update speeds (I only bulk-load data)
Failure resistance (in the unlikely event that things break, I can always reload everything from my input files)
Currently, load speeds are fine, but creating indexes and aggregating data takes very long and barely uses any memory.
Here is my current postgresql.conf: http://pastebin.com/KpSi2zSd
I think the obvious step here is to increase the work_mem and maintenance_work_mem considerably, with the fine detail being "how much"?
If you have control over how many aggregation queries and/or index creations are running at a time then you can be pretty aggressive with these, but you face the risk that with 10 concurrent users and a 30GB setting you could be putting your server under memory pressure.
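To put numbers on that risk, here is a small Python sketch of the worst case. It assumes the 30GB figure is a per-operation memory setting such as work_mem, which PostgreSQL applies per sort or hash operation, so one query can use it more than once. The function name and the operations_per_query parameter are illustrative only.

```python
def worst_case_memory_gb(concurrent_users, per_operation_gb, operations_per_query=1):
    """Upper bound if every user runs a query that uses its full allowance at once."""
    return concurrent_users * per_operation_gb * operations_per_query


TOTAL_RAM_GB = 512  # from the server specs in the question

for ops in (1, 2):
    need = worst_case_memory_gb(10, 30, ops)
    verdict = "fits" if need < TOTAL_RAM_GB else "exceeds RAM"
    print(f"{ops} memory-hungry operation(s) per query: {need} GB ({verdict})")
```

With one such operation per query the worst case is 300 GB, but two per query already exceed the 512 GB in the machine, which is why the execution plans matter.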
It would really benefit you to get some execution plans for the slow running queries, as they will tell you that you need so-much memory for "Sort Method: external merge Disk" for example, and you can then adjust your settings while keeping an eye on the total memory usage on the server.
I wouldn't rule out that you have to re-jig your loads so that the most resource-intensive operations run on their own, while less resource-intensive ones run at the same time.
However, I think at the moment you are lacking some of the hard metrics that will let you make a good choice on memory allocation.

Logstash/Elasticsearch/Kibana resource planning

How to plan resources (I suspect, elasticsearch instances) according to load:
By load I mean ≈500K events/min, each containing 8-10 fields.
What are the configuration knobs I should turn?
I'm new to this stack.
500,000 events per minute is 8,333 events per second, which should be pretty easy for a small cluster (3-5 machines) to handle.
The problem will come with keeping 720M daily documents open for 60 days (43B documents). If each of the 10 fields is 32 bytes, that's 13.8TB of disk space (nearly 28TB with a single replica).
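The arithmetic above can be reproduced with a short Python sketch. The 32 bytes per field and the 60-day retention are the assumptions made in this answer, decimal terabytes are used, and index overhead or compression is not modeled.

```python
EVENTS_PER_MINUTE = 500_000
FIELDS_PER_EVENT = 10
BYTES_PER_FIELD = 32   # assumption from the answer, not a measured value
RETENTION_DAYS = 60
REPLICAS = 1

events_per_second = EVENTS_PER_MINUTE / 60
docs_per_day = EVENTS_PER_MINUTE * 60 * 24
docs_retained = docs_per_day * RETENTION_DAYS
primary_bytes = docs_retained * FIELDS_PER_EVENT * BYTES_PER_FIELD
total_bytes = primary_bytes * (1 + REPLICAS)

print(f"{events_per_second:,.0f} events/s")
print(f"{docs_per_day:,} documents per day")
print(f"{docs_retained:,} documents retained")
print(f"{primary_bytes / 1e12:.1f} TB primary, {total_bytes / 1e12:.1f} TB with replica")
```

Changing BYTES_PER_FIELD or RETENTION_DAYS shows quickly how sensitive the disk estimate is to those two guesses.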
For comparison, I have 5 nodes at the max (64GB of RAM, 31GB heap), with 1.2B documents consuming 1.2TB of disk space (double with a replica). This cluster could not handle the load with only 32GB of RAM per machine, but it's happy now with 64GB. This is 10 days of data for us.
Roughly, you're expecting to have 40x the number of documents and 10x the disk space compared to my cluster.
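The rough ratios can be checked from the figures quoted in this answer. This is a minimal Python sketch using the rounded numbers only (43.2B documents and 13.8 TB planned, 1.2B documents and 1.2 TB in the existing cluster).

```python
planned_docs, planned_tb = 43.2e9, 13.8    # estimate for the planned 60-day retention
existing_docs, existing_tb = 1.2e9, 1.2    # the existing 5-node cluster, primaries only

doc_ratio = planned_docs / existing_docs
disk_ratio = planned_tb / existing_tb

print(f"documents: about {doc_ratio:.0f}x, disk: about {disk_ratio:.1f}x")
```

That gives about 36x the documents and about 11.5x the disk, in line with the rough 40x and 10x above.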
I don't have the exact numbers in front of me, but our pilot project for using doc_values is giving us something like a 90% heap savings.
If all of that math holds, and doc_values is that good, you could be OK with a similar cluster as far as actual bytes indexed were concerned. I would solicit additional information on the overhead of having so many individual documents.
We've done some amount of elasticsearch tuning, but there's probably more that could be done as well.
I would advise you to start with a handful of 64GB machines. You can add more as needed. Toss in a couple of (smaller) client nodes as the front-end for index and search requests.
