Heap memory usage in Elasticsearch

Currently I'm working on a project which uses Elasticsearch 1.4.1.
The cluster consists of 22 nodes, 20 of which are data nodes, and each node has 16 GB of heap memory.
The thing is, when I run massive queries, a few of the nodes (2 or 3) consume 70% of their heap, while the rest use less than 10%.
So I'm wondering: is this because most of the queries go to those 2 or 3 nodes?
If not, what can I do to get better performance?
Thanks!
Update:
I just ran curl -XGET 'localhost:9200/_cat/shards?v' and it returned:
index shard prirep state docs store ip node
....
mm 2 r STARTED 2248969 293.6mb 10.2.4.117 Mark Todd
mm 2 p STARTED 2248969 293.6mb 10.2.4.129 Saint Elmo
mm 19 r STARTED 30172116 3.5gb 10.2.4.126 Fixer
mm 19 p STARTED 30172116 3.5gb 10.2.4.123 Loki
....
I'm wondering what store means here. If it is the actual size of the documents, can I load all of them into memory?

This could be because the documents matching that query are concentrated on just those 2 or 3 machines.
That is, if there are 20 million matches, chances are 8 million of them live on one machine, another 8 million on a second machine, and only 2 million are spread across the remaining 18 machines.
I am guessing you are also using aggregations in the process, which build up the field data cache on those nodes.
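If that is what's happening, the cat APIs can confirm it. Something along these lines (both endpoints exist in 1.4, though the exact column set can vary slightly by version) shows per-node heap usage and how much of that heap is field data:

# Per-node heap usage: a couple of nodes sitting near 70% while the rest idle
# would show up directly here.
curl -XGET 'localhost:9200/_cat/nodes?v&h=host,name,heap.percent,heap.max'

# Per-node field data cache size; aggregations and sorting on high-cardinality
# fields build this cache up on whichever nodes hold the matching shards.
curl -XGET 'localhost:9200/_cat/fielddata?v'

If the field data numbers track the heap imbalance, that points to the aggregations rather than to uneven query routing.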

Related

Scaling up Elasticsearch

We have a problem with scaling up Elasticsearch.
We have a simple structure with multiple indexes; each index has 2 shards, 1 primary and 1 replica.
Right now we have ~3000 indexes, which means 6000 shards, and we think we will hit shard limits soon. (We are currently running 2 nodes with 32 GB of RAM and 4 cores each, at a top usage of 65%; we are working on moving to 3 nodes so each index can have 1 primary shard and 2 replicas.)
Refresh interval is set to 60s.
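(For reference, this is applied per index with an update-settings call roughly like the one below; the index name is just an example.)

curl -XPUT 'localhost:9200/client_index_0001/_settings' -H 'Content-Type: application/json' -d '
{
  "index": { "refresh_interval": "60s" }
}'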
Some indexes have 200 documents, others have 10 million; most have fewer than 200k.
The total number of documents is about 40 million (and it can grow quickly).
Our search requests hit multiple indexes at the same time (we might search in 50 indexes or 500, and in the future we may need to search across all of them).
Searching needs to be fast.
Currently we synchronize all documents daily via bulk requests, in chunks of 5000 documents (~7 MB), because our tests showed that works best: ~2.3 seconds per request of 5000 documents / ~7 MB, done by 10 async workers.
Sometimes several workers hit the same index at the same time and the bulk request takes longer, stretching to ~12.5 seconds per request of 5000 documents / ~7 MB.
The current synchronization process takes about 1 hour for 40 million documents.
Documents are stored by UUIDs (we use them to fetch documents directly from Elasticsearch). Document values can change daily; sometimes we only change synchronization_hash, which determines which documents were modified. After the synchronization process we delete the documents that still have the old synchronization_hash.
Another issue is that we think our data architecture is broken. We have x clients (~300, and the number can increase), and each client is only allowed to search in y indexes (from 50 to 500). Indexes can overlap between clients (client x has 50 indexes, client y has 70, and x and y often need access to the same documents), which is why we store data in separate indexes: so we don't have to update every index where a given document appears.
To increase indexing speed we are considering moving to 4 nodes (with each index having 2 primaries and 2 replica copies), or moving to 4 nodes with each index keeping just 2 shards (1 primary, 1 replica), but we need to test things to figure out what works best for us. We may also need to double the number of documents in the next few months.
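For reference, the two layouts we are weighing are just different index settings, roughly as below (the index names are examples, and I am reading "2 primary, 2 duplicates" as 2 primary shards each with 1 replica):

# Option A: 2 primaries, each with 1 replica (4 shard copies per index)
curl -XPUT 'localhost:9200/client_index_a' -H 'Content-Type: application/json' -d '
{
  "settings": { "number_of_shards": 2, "number_of_replicas": 1 }
}'

# Option B: keep 1 primary with 1 replica and rely on more nodes to spread load
curl -XPUT 'localhost:9200/client_index_b' -H 'Content-Type: application/json' -d '
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
}'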
What do you think can be changed to increase indexing speed without reducing search speed?
What can be changed in our data architecture?
Is there another way our data should be organized that would allow fast searching and faster indexing?
I have tried many chunk sizes for the synchronization, but haven't tried changing the architecture.
We are trying to achieve higher indexing speed without reducing search speed.

Creating high throughput Elasticsearch cluster

We are in the process of implementing Elasticsearch as a search solution in our organization. For the POC we implemented a 3-node cluster (each node with 16 vCores, 60 GB RAM and 6 x 375 GB SSDs), with all nodes acting as master, data and coordinating nodes. As it was a POC, indexing speed was not a consideration; we were just trying to see whether it would work or not.
Note: we did try to index 20 million documents on our POC cluster and it took about 23-24 hours, which is pushing us to take the time to design the production cluster with proper sizing and settings.
Now we are trying to implement a production cluster (in Google Cloud Platform) with emphasis on both indexing speed and search speed.
Our use case is as follows:
We will bulk index 7 million to 20 million documents per index (we have 1 index for each client and there will be only one cluster). This bulk indexing is a weekly process, i.e. we index all the data once and query it for the whole week before refreshing it. We are aiming for an indexing throughput of 0.5 million documents per second.
We are also looking for a strategy to scale horizontally as we add more clients. I have described the strategy in subsequent sections.
Our data model has a nested document structure and a lot of queries on nested documents, which in my view are CPU, memory and IO intensive. We are aiming for sub-second query times for the 95th percentile of queries.
I have done quite a bit of reading around this forum and other blogs where companies have high performing Elasticsearch clusters running successfully.
Following are my learnings:
Have dedicated master nodes (always an odd number to avoid split-brain). These machines can be medium-sized (16 vCores and 60 GB RAM).
Give 50% of RAM to the ES heap, but do not exceed roughly 31 GB of heap so the JVM can keep using compressed object pointers. We are planning to set it to 28 GB on each node.
Data nodes are the workhorses of the cluster, so they have to be strong on CPU, RAM and IO. We are planning on 64 vCores, 240 GB RAM and 6 x 375 GB SSDs per node.
Have coordinating nodes as well to take the bulk index and search requests.
Now we are planning to begin with the following configuration:
3 masters - 16 vCores, 60 GB RAM and 1 x 375 GB SSD
3 coordinators - 64 vCores, 60 GB RAM and 1 x 375 GB SSD (compute-intensive machines)
6 data nodes - 64 vCores, 240 GB RAM and 6 x 375 GB SSDs
We plan to add 1 data node for each incoming client.
Now that the hardware is out of the way, let's focus on the indexing strategy.
A few best practices that I've collated are as follows:
A lower number of shards per node is good in most scenarios, but data should be well distributed across all the nodes so the load is balanced. Since we are planning to start with 6 data nodes, I'm inclined to use 6 shards for the first client to utilize the cluster fully.
Have 1 replica to survive the loss of a node.
Next is the bulk indexing process. We have a full-fledged Spark installation and are going to use the elasticsearch-hadoop connector to push data from Spark to our cluster.
During indexing we set the refresh_interval to 1m to make sure refreshes are less frequent.
We are using 100 parallel Spark tasks, with each task sending a 2 MB bulk request. So at any given time there is 2 x 100 = 200 MB of bulk requests in flight, which I believe is well within what ES can handle. We can definitely alter these settings based on feedback or trial and error.
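For reference, the per-task bulk size is driven by the elasticsearch-hadoop connector settings; a job submission might look roughly like the sketch below (the class, jar and node names are placeholders, and the values simply mirror the figures above):

# Hypothetical spark-submit: the spark.es.* keys are elasticsearch-hadoop
# connector options (the spark. prefix is stripped by the connector), and
# 25 executors x 4 cores gives the 100 parallel tasks mentioned above.
spark-submit \
  --class com.example.BulkIndexJob \
  --conf spark.executor.instances=25 \
  --conf spark.executor.cores=4 \
  --conf spark.es.nodes=es-coord-1,es-coord-2,es-coord-3 \
  --conf spark.es.batch.size.bytes=2mb \
  --conf spark.es.batch.write.refresh=false \
  bulk-index-job.jar

Here es.batch.size.bytes caps each task's bulk request at ~2 MB, and es.batch.write.refresh=false stops the connector from forcing a refresh after every bulk, which would otherwise work against the 1m refresh_interval.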
I've read more about the cache percentage, thread pool size and queue size settings, but we are planning to keep them at the smart defaults to begin with.
We are open to using either CMS or G1GC for garbage collection but would need advice on this. I've read the pros and cons of both and am in a dilemma about which one to use.
Now to my actual questions:
Is sending bulk indexing requests to a coordinating node a good design choice, or should we send them directly to the data nodes?
We will be sending query requests via the coordinating nodes. My question is: since each data node has 64 cores, each node has a search thread pool of size 64 and a queue size of 200. Assume that during a search the data node thread pools and queues are completely exhausted; will the coordinating nodes keep accepting and buffering search requests at their end until their own queues also fill up, or will one thread on a coordinator also be blocked per query request?
Say a search request arrives at a coordinating node; it blocks 1 thread there and sends requests to the data nodes, which in turn block threads on whichever data nodes hold the relevant data. Is this assumption correct?
While bulk indexing is going on (assuming we do not run indexing for all clients in parallel but schedule them sequentially), how do we best design things so that query times do not take much of a hit during the bulk indexing?
References
https://thoughts.t37.net/designing-the-perfect-elasticsearch-cluster-the-almost-definitive-guide-e614eabc1a87
https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
We did try to index 20 million documents on our POC cluster and it took about 23-24 hours
That is surprisingly little — like less than 250 docs/s. I think my 8GB RAM laptop can insert 13 million docs in 2h. Either you have very complex documents, some bad settings, or your bottleneck is on the ingestion side.
About your nodes: I think you could easily get away with less memory on the master nodes (like 32GB should be plenty). Also the memory on data nodes is pretty high; I'd normally expect heap in relation to the rest of the memory to be 1:1 or for lots of "hot" data maybe 1:3. Not sure you'll get the most out of that 1:7.5 ratio.
CMS vs G1GC: If you have a current Elasticsearch and Java version, both are an option, otherwise CMS. You're generally trading throughput for (GC) latency, so if you benchmark be sure to have a long enough timeframe to properly hit GC phases and run as close to production queries in parallel as possible.
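For reference, on a current 5.x/6.x install this boils down to a couple of lines in config/jvm.options; a minimal sketch, with the heap simply mirroring the 28 GB mentioned in the question:

# heap (same min and max, per the usual recommendation)
-Xms28g
-Xmx28g
# the collector Elasticsearch ships with by default:
-XX:+UseConcMarkSweepGC
# to try G1 instead, comment out the CMS line above and enable:
# -XX:+UseG1GC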
Is sending bulk indexing requests to a coordinating node a good design choice, or should we send them directly to the data nodes?
I'd say the coordinator is fine. Unless you use a custom routing key and the bulk only contains data for that specific data node, 5/6 of the documents would need to be forwarded to other data nodes anyway (if you have 6 data nodes). And you can offload the bulk processing and coordination handling to non-data nodes.
However, overall it might make more sense to have 3 additional data nodes and skip the dedicated coordinating node. Though this is something you can only say for certain by benchmarking your specific scenario.
Now my question is: since each data node has 64 cores, each node has a search thread pool of size 64 and a queue size of 200. Assume that during a search the data node thread pools and queues are completely exhausted; will the coordinating nodes keep accepting and buffering search requests at their end until their own queues also fill up, or will one thread on a coordinator also be blocked per query request?
I'm not sure I understand the question. But have you looked into https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster, which might shed some more light on this topic?
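Either way, it is worth watching where the back-pressure actually builds up; the cat thread pool API shows active, queued and rejected counts per node, for example:

# Active/queued/rejected per thread pool and node; watch the search pool
# (and the bulk pool, called "write" in newer versions) during load.
curl -XGET 'localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected'

Rejections showing up on the data nodes but not on the coordinators (or the other way around) tell you which side is saturating first.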
While bulk indexing is going on (assuming we do not run indexing for all clients in parallel but schedule them sequentially), how do we best design things so that query times do not take much of a hit during the bulk indexing?
While there are different queues for different query operations, there is otherwise no clear separation of tasks (like "only use 20% of the resources for indexing"). Maybe go a little more conservative on the parallel bulk requests to avoid overloading the nodes.
If you are not reading from an index while it's being indexed (ideally you flip an alias once done): You might want to disable the refresh rate entirely and let Elasticsearch create segments as needed, but do a force refresh and change the setting once done. Also you could try running with 0 replicas while indexing, change replicas to 1 once done, and then wait for it to finish — though I'd benchmark if this is helping overall and if it's worth the added complexity.
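Roughly, that toggle could look like the following (the index name and the restored refresh interval are illustrative):

# before the weekly bulk load: no refreshes, no replicas
curl -XPUT 'localhost:9200/client_index/_settings' -H 'Content-Type: application/json' -d '
{ "index": { "refresh_interval": "-1", "number_of_replicas": 0 } }'

# ... run the Spark bulk job ...

# after the load: restore the settings and make the data searchable
curl -XPUT 'localhost:9200/client_index/_settings' -H 'Content-Type: application/json' -d '
{ "index": { "refresh_interval": "1s", "number_of_replicas": 1 } }'
curl -XPOST 'localhost:9200/client_index/_refresh'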

Elasticsearch stats API taking a long time to respond

We have an ES cluster at AWS running with the following setup:
(I know, I need a minimum of 3 master nodes)
1 Coordinator
2 Data nodes
1 Master Node
Data node specs:
CPU: 8 cores
RAM: 20 GB
Disk: 1 TB SSD, 4000 IOPS
Problem:
The ES endpoints for search, delete, backup, cluster health and insert are working fine.
Since yesterday, some endpoints like /_cat/indices, /_nodes/_local/stats etc. have started taking too long to respond (more than 4 minutes), and consequently our Kibana is in a red state (timeout after 30000 ms).
Useful info:
All shards are OK (3500 in total)
The cluster is in a green state
X-Pack disabled
Average of 1 GB per shard
500k document count
Requests are made from localhost within AWS
CPU, DISK, RAM, IOPS all fine
Any ideas?
Thanks in advance :)
EDIT/SOLUTION 1:
After a few days I found out what the problem was, but first a little context...
We use Elasticsearch for storing user audit messages and mobile error messages. At first (obviously in a rush to deliver new microservices and remove load from our MongoDB cluster) we designed the Elasticsearch indices by day, so every day a new index was created, and by the end of the day that index held around 6-9 GB of data.
Six months and almost 180 indices later, with 720 open primary shards, we bumped into this problem.
Then I read this again (the basics!):
https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html
After talking to the team responsible for this microservice, we redesigned our indices into monthly indices, and guess what? Problem solved!
Now our cluster is much faster than before, and this simple API saved me some sweet nights of sleep:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
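The consolidation itself is essentially one _reindex call per month, along these lines (the index names are illustrative):

# roll all daily indices of a month into a single monthly index
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": "audit-2018.05.*" },
  "dest":   { "index": "audit-2018.05" }
}'
# once the document counts match, the daily indices can be deleted
curl -XDELETE 'localhost:9200/audit-2018.05.*'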
Thanks!

What would be a reasonable benchmark for Elasticsearch reads, writes and updates?

We are using Elasticsearch (version 5.6.0) for data updates of around 13M documents, with each document in a nested structure having at most 100 key-value pairs; it takes around 34 minutes to update 99 indices. The hardware is as follows:
5 M4-4xlarge machines (32 GB RAM and 8 cores each)
500 GB disk
So, what would be an ideal update time for Elasticsearch in this scenario?
What optimizations can I do to get good performance?

Correcting improper usage of Cassandra

I have a similar question that was unanswered (but had many comments):
How to make Cassandra fast
My setup:
Ubuntu Server
AWS instances - Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80 GHz, 4 GB RAM.
2 nodes of Cassandra DataStax Community Edition (2.1.3).
PHP 5.5.9 with the DataStax php-driver.
I come from a MySQL background, with very basic hands-on NoSQL experience: Elasticsearch for search and MongoDB for document storage.
When I read about how to use Cassandra, here are the points I understood:
It is distributed
You can have replicated rings to distribute data
You need to establish partition keys for maximum efficiency
Rethink your queries rather than relying on secondary indices
Model according to queries and not data
Deletes are bad
You can only sort starting from the second key of your primary key set
Cassandra has "fast" writes
I have a PHP Silex framework API that receives batch JSON data, which is inserted into a minimum of 4 tables and a maximum of 6 (mainly due to the different kinds of sorting I need).
At first I only had two Cassandra nodes. I ran Apache Bench to test. Then I added a third node, and it barely shaved off a fraction of a second at the higher batch sizes and concurrency.
Concurrency  Batch size  Avg. time (ms) - 2 nodes  Avg. time (ms) - 3 nodes
1            5           288                       180
1            50          421                       302
1            400         1,298                     1,504
25           5           1,993                     2,111
25           50          3,636                     3,466
25           400         32,208                    21,032
100          5           5,115                     5,167
100          50          11,776                    10,675
100          400         61,892                    60,454
The batch size is the number of entries (across the 4-6 tables) made per call.
So a batch of 5 means it inserts 5 x (4-6) tables' worth of data. At higher batch sizes / concurrency the application times out.
There are 5 columns in each table, with relatively small data (mostly ints, with text no longer than roughly 10 characters).
My keyspace is the following:
user_data | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
My "main" question is: what did I do wrong? This seems like a relatively small data set, considering that Cassandra was modeled after Bigtable and built for very high write speeds.
Do I add more nodes beyond 3 in order to speed up?
Do I change my replication factor and tune read/write consistency (quorum etc.), hunting for the sweet spot with the DataStax docs: http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html ?
Do I switch frameworks, e.g. go to Node.js for higher concurrency?
Do I rework my tables? I have no good example of how to use column families effectively; I need some hints for this one.
For the table question:
I'm tracking a user's history. A user has an event and is associated with a media id, and there is some extra metadata too.
So the columns are: event_type, user_id, time, media_id, extra_data.
I need to sort them in different ways, therefore I made different tables for them (that is how I understood Cassandra data modeling should work; I may be wrong). So I'm duplicating the same data across various tables.
Help?
EDIT:
The application also has Redis and MySQL attached for other CRUD points of interest, such as retrieving user data and caching it for faster reads.
So far, on average, I see 72 ms once Redis kicks in and 180 ms on MySQL before Redis.
The first problem is you're trying to benchmark the whole system without knowing what any individual component can do. Are you trying to see how fast an individual operation is, or how many operations per second you can do? They're different values.
I typically recommend you start by benchmarking Cassandra. Modern Cassandra can typically do 20-120k operations per second per server. With RF=3, that means somewhere between 5k and 40k reads / second or writes/second. Use cassandra-stress to make sure cassandra is doing what you expect, THEN try to loop in your application and see if it matches. If you slow way down, then you know the application is your bottleneck, and you can start thinking about various improvements (different driver, different language, async requests instead of sync, etc).
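For example, a first pass with the bundled tool could look like the commands below (node addresses are placeholders); compare the reported op rate with what your PHP path achieves:

# pure writes: 1M rows, 50 client threads, against both nodes
cassandra-stress write n=1000000 -rate threads=50 -node 10.0.0.1,10.0.0.2

# then a mixed workload to get closer to production traffic
cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -rate threads=50 -node 10.0.0.1,10.0.0.2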
Right now, you're doing too much and analyzing too little. Break the problem into smaller pieces. Solve the individual pieces, then put the puzzle together.
Edit: Cassandra 2.1.3 is getting pretty old. It has some serious bugs. Use 2.1.11 or 2.2.3. If you're just starting development, 2.2.3 may be OK (and let's assume you'll actually go to production with 2.2.5 or so). If you're ready to go prod tomorrow, use 2.1.x instead.

Resources