optimize elasticsearch / JVM - performance

I have a classifieds website. It runs Elasticsearch, Postgres and Rails on the same dedicated Ubuntu 14.04 server, with 256GB of RAM and 20 cores (40 threads).
I have 10 indexes in Elasticsearch, each with the default number of shards (5). They hold between 1,000 and 400,000 classifieds, depending on the index.
The site handles approximately 5,000 requests per minute, about two thirds of which make an Elasticsearch request.
According to htop, the JVM is using around 500% CPU.
I tried different options: I reduced the number of shards per index, and I also tried changing JAVA_OPTS as follows:
#JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
#JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
but it doesn't seem to change anything.
So, two questions:
When you change a setting in Elasticsearch and then restart, should the improvement (if any) be visible immediately, or can it arrive a bit later because of caching or anything else?
Can anyone help me find a good configuration for the JVM / Elasticsearch so that it does not take so many resources?

First, it's a horrible idea to run your web server, database and Elasticsearch server all on the same box. Each of these should be given its own box, at least. In the case of Elasticsearch, it's actually recommended to have at least 3 servers, or nodes. That way you end up with a load-balanced cluster that won't run into split-brain issues.
Further, sharding only makes sense in a cluster. If you only have one node, then all the shards reside on the same node. This causes two performance problems. First, you get the hit that sharding always adds. For every query, Elasticsearch must query each shard individually (each is a separate Lucene index). Then, it must combine and process the results from all the shards to produce the final result. That's a not insignificant amount of overhead. Second, because all the shards reside on the same node, you're I/O-locked. The shards all contend for the same disks and CPUs instead of being queried in parallel on separate machines. Optimally, you should have one shard per node; however, since you can't create more shards without reindexing, it's common to have a few extra hanging around for future horizontal scaling. In that scenario, the cost of reindexing what could be hundreds of gigabytes of data or more outweighs a little bit of performance bottleneck. However, if you've got 5 shards running on one node, that's probably a large part of your performance problems right there.
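To make the per-query overhead concrete, here is a toy model of the "query each shard, then merge" step. It is only a sketch: the shard contents, scores and function names are made up for illustration, it is not how Elasticsearch is implemented internally, and the shards are visited in a plain loop purely for simplicity.

```python
import heapq

def search_shard(shard, term, k):
    """Return the top-k (score, doc_id) hits from one shard, best first."""
    hits = [(score, doc_id) for doc_id, (text, score) in shard.items() if term in text]
    return heapq.nlargest(k, hits)

def search_index(shards, term, k):
    """Query every shard, then merge the per-shard results into a global top-k."""
    per_shard = [search_shard(shard, term, k) for shard in shards]
    merged = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(k, merged)

# Hypothetical index split into three shards: doc_id -> (text, score).
shards = [
    {"a1": ("red bike", 2.0), "a2": ("blue car", 1.0)},
    {"b1": ("red car", 3.0)},
    {"c1": ("red sofa", 1.5), "c2": ("red lamp", 0.5)},
]

top = search_index(shards, "red", k=2)
print(top)
```

Every shard does its own top-k search and the results are merged afterwards, so the amount of merge work grows with the number of shards even when they all sit on the same machine.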
Finally, and again, with Elasticsearch in particular, swapping is a huge no-no. Most of what makes Elasticsearch efficient is its cache, which resides entirely in RAM. If swaps occur, it jacks with the cache in sometimes unpredictable ways. As a result, it's recommended to turn off swapping completely on the box your node(s) run on, and set Elasticsearch/JVM to have a min and max memory consumption of roughly half the available RAM of the box. That's virtually impossible to achieve if you have other things running on it like a web server or database. Databases in particular aggressively consume RAM in order to increase throughput, which is why those should likewise reside on their own servers.

Related

optimization on old indexes collecting logs from my apps

I have an Elasticsearch cluster with 3 nodes (each with 6 CPUs, 31GB heap and 64GB RAM) collecting 25GB of logs per day. After 3 months I realized my dashboards have become very slow when checking stats for past weeks. Please advise whether there is an option to improve the indexes' read performance so that my dashboard stats are calculated faster.
Thanks!
I would suggest you try increasing the number of shards.
With more shards, Elasticsearch splits your data across them, so it can send multiple parallel requests, each searching a smaller set of data.
For the number of shards, you could size it based on your heap memory:
No matter what actual JVM heap size you have, the upper bound on the maximum shard count should be 20 shards per 1 GB of heap configured on the server.
ElasticSearch - Optimal number of Shards per node
https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
https://opster.com/elasticsearch-glossary/elasticsearch-choose-number-of-shards/
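As a quick worked example of the 20-shards-per-GB rule quoted above, applied to the cluster in the question (3 nodes, 31GB heap each). The function name is made up for illustration, and the result is an upper bound to stay under, not a target to aim for.

```python
SHARDS_PER_GB_HEAP = 20  # upper bound quoted in the answer above

def max_shards(heap_gb_per_node, nodes):
    """Upper bound on the total shard count for the whole cluster."""
    return SHARDS_PER_GB_HEAP * heap_gb_per_node * nodes

limit = max_shards(heap_gb_per_node=31, nodes=3)
print(limit)
```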
It seems that the amount of data that you accumulated and use for your dashboard is causing performance problems.
A straightforward option is to increase your cluster's resources but then you're bound to hit the same problem again. So you should rather rethink your data retention policy.
Chances are that you are really only interested in most recent data. You need to answer the question what "recent" means in your use case and simply discard anything older than that.
Elasticsearch has tools to automate this; look into Index Lifecycle Management (ILM).
What you probably need is to create an index template and apply a lifecycle policy to it. Elasticsearch will then handle automatic rollover of indices, eviction of old data, even migration through data tiers in hot-warm-cold architecture if you really want very long retention periods.
All this will lead to a more predictable performance of your cluster.
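As a rough sketch of what such a lifecycle policy could look like, the snippet below only builds the JSON body you would send to the ILM policy endpoint (PUT _ilm/policy/<name>); it does not talk to a cluster. The rollover age of one day and the 30-day retention are assumptions picked for illustration, so choose values that match your own definition of "recent".

```python
import json

def retention_policy(rollover_after="1d", delete_after="30d"):
    """Build an ILM policy body: roll over regularly, delete old indices."""
    return {
        "policy": {
            "phases": {
                # Write to the current index until it is old enough to roll over.
                "hot": {
                    "actions": {"rollover": {"max_age": rollover_after}},
                },
                # Drop an index once it has been rolled over for delete_after.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }

body = retention_policy()
print(json.dumps(body, indent=2))
```

The policy is then referenced from the index template so that every new log index picks it up automatically.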

How to add another shard to production for tarantool Database, without downtime?

We use the Tarantool database (sharded using vshard) in production. We started with 4 shards and now want to increase to 6 without downtime. But after adding two more shards, the rebalancer kicks in and doesn't allow reads/writes to happen. Is there any way for rebalancing to proceed while still supporting all kinds of operations? We can afford a longer operation time, but it has to succeed. What is the best practice for adding a shard to Tarantool with minimum inconvenience on the product front?
Currently, the only solution we can think of is to go into maintenance mode and let the rebalance finish in the minimum time possible.
You cannot write to a bucket that is being transferred right now, but you can write to other buckets (so it's not like the whole shard is locked up).
Moreover, you can mitigate the effect by
- making buckets smaller (increase bucket_count)
- making rebalancing slower so that fewer buckets are transferred simultaneously (rebalancer config).
Suppose you have 16384 buckets and your dataset is 75GB. That means the average bucket size is around 5MB. If you decrease the rebalancer_max_receiving parameter to 10, you'll have only 10 buckets (about 50MB) being transferred simultaneously, and therefore locked for writes.
This way, rebalancing will be pretty slow, but, given that your clients can perform retries and the network between shards is fast enough, the 'write-lock' effect should go unnoticed.
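The arithmetic behind those numbers, as a small sketch. The helper names are made up for illustration, and the data is assumed to be spread evenly across buckets.

```python
def avg_bucket_mb(dataset_gb, bucket_count):
    """Average bucket size in MB, assuming data is spread evenly."""
    return dataset_gb * 1024 / bucket_count

def in_flight_mb(dataset_gb, bucket_count, max_receiving):
    """Data locked for writes at any moment while rebalancing."""
    return avg_bucket_mb(dataset_gb, bucket_count) * max_receiving

bucket_mb = avg_bucket_mb(75, 16384)
locked_mb = in_flight_mb(75, 16384, max_receiving=10)
print(bucket_mb, locked_mb)
```

The exact values come out slightly under the rounded 5MB and 50MB used above; the point is that only a tiny fraction of the 75GB dataset is write-locked at any one time.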

Can Circuit break exception be avoided using horizontal scaling?

I am using crate 1.0.2 which internally uses elasticsearch. So my question is applicable for both. For certain queries I get circuit break exception.
CircuitBreakingException: [parent] Data too large, data for [collect: 0] would be larger than limit of [11946544332/11.1gb]
These queries are mainly GROUP BY on multiple columns. I have billions of documents indexed and have 16GB of RAM allocated as the crate heap size. I have multiple such nodes connected together in a cluster. Will adding more nodes to the cluster help in getting rid of this error, and will the same queries then run fine? Or must I increase the heap to 30GB? My worry is that if I increase it to 30GB and keep adding data, someday that query will hit the circuit breaker again. So I wanted to solve it by scaling horizontally, i.e. adding more nodes. Would that be the wiser decision?
Short answer: Usually horizontal scaling helps.
Your error seems to be caused by group by queries.
The GROUP BY operations are executed in a distributed fashion. So more nodes
will generally split the load and therefore also the memory usage. (Make sure
there are enough shards so that they're spread among all nodes)
There is a catch though: Eventually the data needs to be merged together on the
node you sent the initial query to. This is generally fine because the data
arrives pre-aggregated, but if the cardinality is too high (e.g. GROUP BY on a
primary key), the whole data set has to fit into memory on this coordinator
node.
If your nodes have enough memory to go up to 30 GB (with still having enough to
spare for the file system cache), I'd personally tend to increase the HEAP size
first, before adding new nodes.
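To illustrate the catch with high cardinality, here is a toy model of a distributed GROUP BY with a count. It is only a sketch of the idea, not CrateDB's actual execution engine, and the rows and function names are made up.

```python
from collections import Counter

def partial_group_count(rows, key):
    """Runs on each node: aggregate only the rows stored locally."""
    return Counter(row[key] for row in rows)

def merge_on_coordinator(partials):
    """Runs on the node that received the query: combine all partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical rows held by two nodes.
node_rows = [
    [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}],
    [{"id": 3, "city": "Paris"}, {"id": 4, "city": "Paris"}],
]

by_city = merge_on_coordinator(partial_group_count(r, "city") for r in node_rows)
by_id = merge_on_coordinator(partial_group_count(r, "id") for r in node_rows)
print(len(by_city), len(by_id))
```

Grouping by city leaves the coordinator with one entry per distinct city, while grouping by the primary key leaves it with one entry per row, no matter how many nodes produced the partial results.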
Update:
Recent versions (2.1.X) also contain some fixes regarding the circuit-breaker behaviour. So if it's possible to update, that'd be recommended as well.
Update2:
Note that there are different cases in which a circuit breaker can trip. In
your case it's caused by a GROUP BY using up quite a lot of memory. But it can
also be tripped if a single request is too large. For example if the bulk size
is too large. In such a case more nodes wouldn't help. You'd have to reduce the
bulk size.

Elasticsearch with different java heaps, does it matter?

Say I've got 2 servers, one of which has -Xmx and -Xms set to 4G and the other to 2G.
Will ElasticSearch handle those performance differences in the balancing mode? Or will both the servers be called purely based on indices, resulting in a (much more) likely OOM for the latter than the former?
By the way, I've set the properties indices.fielddata.cache.size, indices.breaker.fielddata.limit, indices.breaker.request.limit, and indices.breaker.total.limit on both servers as ElasticSearch is suggesting
This is important, to me, because if it does, I'd have to change the index sharding on guessed index strain, which will be a hassle (if not impossible)
Elasticsearch treats every node the same and balances the documents equally between them. This means that Elasticsearch won't readjust based on hardware to get you the optimal performance.
One thing to remember here is that a herd of bulls is only as fast as its slowest bull. The same applies here. But if the load is small enough that it does not eat up all the hardware of the 2GB machine, then you should not see any issue. Otherwise you should see a difference in memory-intensive operations like aggregations.

Elasticsearch fuzzy matching optimization for huge server/server cluster

I've got an index with quite complex queries running on it. The main slowdown are the fuzzy queries which are run against a field containing 2-5 words for each record. I mainly have to find rows with 1-3 differing characters.
On my 4-core (with HT), 8GB RAM machine, my queries execute in about 1-2s each.
On a server with 12 cores (with HT) and 72GB RAM, the query executes in 0.3-0.5 seconds. This doesn't seem to me like reasonable scaling for the hardware provided. I'm sure there should be some hidden options I can tune to adjust the query performance.
I've looked through the Elasticsearch guide but couldn't find anything there which would help me tune the performance based on the number of CPUs or RAM, or tune Elasticsearch specifically for fuzzy queries.
Another question: how does it scale if I add another server like this? Will the query time be roughly halved?
There are a couple of possibilities here. The first is that your query is I/O bound. In this case, just adding another server might help because two nodes will be retrieving data from two disks. Another possibility is that your query is CPU bound. To a large degree, a search against a single shard is a single-threaded process. Assuming that your index was created with default settings, it has 5 shards. So, your query cannot significantly benefit from running on more than 5 CPUs. In this case, adding another node would only slow things down because of network overhead. Instead, you need to recreate the index with more shards.
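A back-of-the-envelope sketch of the CPU-bound case, under the simplifying assumption stated above that one query uses at most one thread per shard. The function name is made up, and the thread counts are the two machines from the question (4 and 12 cores, with HT).

```python
def busy_threads(shards, hardware_threads):
    """Threads a single query can keep busy if each shard is searched by one thread."""
    return min(shards, hardware_threads)

small = busy_threads(shards=5, hardware_threads=8)    # 4 cores with HT
large = busy_threads(shards=5, hardware_threads=24)   # 12 cores with HT
more_shards = busy_threads(shards=24, hardware_threads=24)
print(small, large, more_shards)
```

Under this model the bigger server offers a single query no more parallelism than the small one until the index is recreated with more shards.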
