Is it possible to set up Elasticsearch on Ceph block storage - elasticsearch

Is it possible to use Ceph as the storage backend for Elasticsearch?
It seems to me that Elasticsearch only supports writing to local disk, but I want my Ceph cluster to serve as the storage backend for the Elasticsearch cluster.

We have been using it at my company. We have 3 Elasticsearch nodes with Ceph as storage. It usually ingests about 20,000 records per second (5-6 million per 5 minutes), with the load average around 1.5-2.5 on all nodes.
You need to make sure you have a very fast local network, though.
My setup:
Number of primary shards: 3
Number of replicas: 1
https://nayarweb.com/blog/2017/high-load-on-one-of-elasticsearch-node-on-ceph/
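For anyone wiring this up, the usual pattern is to map a Ceph RBD image on each Elasticsearch node, format and mount it, then point the data path at the mount — a minimal sketch, where the pool name, image name, and mount point are all placeholder assumptions:

```yaml
# On each node, after mapping and mounting an RBD image, e.g.:
#   rbd map es-pool/es-data-node1        (pool/image names are placeholders)
#   mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt/ceph-es

# elasticsearch.yml -- point the data path at the Ceph-backed mount
path.data: /mnt/ceph-es/data
path.logs: /var/log/elasticsearch
```

Each node should get its own RBD image; sharing one filesystem between nodes is not supported by Elasticsearch.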

Related

Controlling where shards are allocated

My setup:
Two zones: fast and slow, with 5 nodes each.
The fast nodes have ephemeral storage, whereas the slow nodes are NFS-based.
Running Elasticsearch OSS v7.7.1. (I have no control over the version)
I have the following cluster setting: cluster.routing.allocation.awareness.attributes: zone
My index has 2 replicas, so 3 shard instances (1x primary, 2x replica)
I am trying to ensure the following:
1 of the 3 shard instances to be located in zone fast.
2 of the 3 shard instances to be located in zone slow (because it has persistent storage)
Queries to run against the shard in zone fast where available.
Inserts to only return as written once they have been replicated.
Is this setup possible?
Link to a related question: How do I control where my primary and replica shards are located?
EDIT to add extra information:
Both fast and slow nodes run on a PaaS offering where we are not in control of hardware restarts, meaning there can technically be non-graceful shutdowns/restarts at any point.
I'm worried about unflushed data and/or index corruption, so I want multiple replicas on the slow-zone nodes backed by NFS to reduce the likelihood of data loss, even though this will "overload" the slow zone with redundant data.
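For reference, the zone awareness mentioned above needs each node to advertise its zone via a custom attribute — a minimal sketch of the two config pieces, with the attribute values mirroring the fast/slow zones in the question:

```yaml
# elasticsearch.yml on each node in the fast zone (slow-zone nodes use "slow")
node.attr.zone: fast

# cluster-wide setting, as quoted in the question
cluster.routing.allocation.awareness.attributes: zone
```

Note that plain awareness only spreads the 3 shard copies across the 2 zones; it does not let you pin exactly which zone holds 2 of them, so the 1-fast/2-slow split is not guaranteed by this setting alone.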

High CPU usage on elasticsearch nodes

We have been using a 3-node Elasticsearch (v7.6) cluster running in Docker containers. I have been experiencing very high CPU usage on 2 nodes (97%) and moderate CPU load on the other node (55%). The hardware is m5.xlarge servers.
There are 5 indices, each with 6 shards and 1 replica. Update operations take around 10 seconds, even for updating a single field; it is similar with deletes. Querying, however, is quite fast. Is this because of the high CPU load?
2 out of the 5 indices continuously undergo update and write operations as they consume from a Kafka stream. The indices are 15GB and 2GB in size; the rest are around 100MB.
You need to provide more information to find the root cause:
Are all the ES nodes running in Docker containers on the same host, or on different hosts?
Do you have resource limits on your ES Docker containers?
How much heap is assigned to ES, and is it 50% of the host machine's RAM?
Do the nodes with high CPU hold the 2 write-heavy indices you mentioned?
What is the refresh interval of the indices that receive high indexing requests?
What is the segment count and size of your 15 GB index? Use https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-segments.html to get this info.
What have you debugged so far, and is there any interesting info you want to share to help find the issue?
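Two of the checklist items above (refresh interval and segment info) map onto concrete API calls — a sketch of the request body and path involved, where the index name `kafka-events` is a made-up placeholder:

```python
import json

# Placeholder name for one of the write-heavy indices.
INDEX = "kafka-events"

# Lengthening the refresh interval reduces segment churn on write-heavy
# indices; this body would be sent as: PUT /kafka-events/_settings
settings_body = {"index": {"refresh_interval": "30s"}}

# The segment info comes from the cat segments API linked above:
cat_segments_path = f"/_cat/segments/{INDEX}?v"

print(json.dumps(settings_body))
print(cat_segments_path)
```

The `30s` value is only an example; the right interval depends on how fresh search results need to be.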

Improving speed elasticsearch cluster

I have set up a 3-node cluster (24 cores each).
When I run a query, it returns results slowly.
But when I created a standalone server and ran the same query, it returned results quickly.
My cluster and data details:
I have created 12 indices with around 3 shards each.
Total data size: 60GB.
Elasticsearch version: 5.3.
Server disk: 500GB each.
So kindly guide me on how to achieve better speed.
Thanks in advance.
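One thing worth checking with these numbers: 12 indices at ~3 shards each on a 3-node cluster means a lot of small shards — a back-of-the-envelope sketch, assuming the Elasticsearch 5.x default of 1 replica (the replica count is not stated in the question):

```python
indices = 12
primaries_per_index = 3
replicas = 1  # assumption: 5.x default, not stated in the question
nodes = 3

total_shards = indices * primaries_per_index * (1 + replicas)
shards_per_node = total_shards / nodes
avg_primary_size_gb = 60 / (indices * primaries_per_index)  # 60GB total data

print(total_shards)                    # 72
print(shards_per_node)                 # 24.0
print(round(avg_primary_size_gb, 2))   # 1.67
```

Dozens of ~1.7GB shards is on the small side; fewer, larger shards often query faster, though the right shard count depends on the workload.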

Indexing multiple indexes in elastic search at the same time

I am using Logstash for ETL purposes and have 3 indexes in Elasticsearch. Can I insert documents into my 3 indexes through 3 different Logstash processes at the same time to improve parallelization, or should I insert documents into 1 index at a time?
My elastic search cluster configuration looks like:
3 data nodes
1 client node
3 data nodes - 64 GB RAM, SSD Disk
1 client node - 8 GB RAM
Shards - 20
Replica - 1
Thanks
As always it depends. The distribution concept of Elasticsearch is based on shards. Since the shards of an index live on different nodes, you are automatically spreading the load.
However, if Logstash is your bottleneck, you might gain performance from running multiple processes. Though whether running multiple LS processes on a single machine will make a positive impact is doubtful.
Short answer: Parallelising over 3 indexes won't make much sense, but if Logstash is your bottleneck, it might make sense to run those in parallel (on different machines).
PS: The biggest performance improvement generally is batching requests together, but Logstash does that by default.
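The batching mentioned in the PS maps onto Elasticsearch's `_bulk` API, which takes newline-delimited action/document pairs — a minimal sketch of the kind of body Logstash builds under the hood, with the index name as a placeholder:

```python
import json

def build_bulk_body(docs, index):
    """Build an NDJSON _bulk body: one action line plus one document
    line per doc, terminated by a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# This body would be POSTed to /_bulk with
# Content-Type: application/x-ndjson
body = build_bulk_body([{"msg": "a"}, {"msg": "b"}], index="logs-1")
```

Sending many documents in one `_bulk` request amortizes the per-request overhead, which is why batching usually dwarfs any gain from running extra Logstash processes.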

ElasticSearch configuration for 3-node cluster

Elasticsearch is used as a cache in front of a PostgreSQL database to avoid a lot of joins and speed up my application's selects.
Initially everything is stored on a single large server (32GB RAM): webapp, nginx, PostgreSQL, Celery, Elasticsearch.
Now I have 2 additional smaller nodes which are not used at all (only for additional storage with nbd-server).
So I have:
- 1 Large node with ES. About 12-16GB of RAM is available for ES.
- 2 small nodes with 8 GB RAM each. Everything is free for ES.
All 3 nodes have SSD and same CPU.
Later I will add more 8GB nodes (as storage + ES).
What would be the best way to build an ES cluster on these 3 nodes? Should all of them be data/master nodes? Or would it be better to use the large node as a master and the 2 small ones as data nodes?
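For a 3-node cluster, a common layout is to make all three nodes master-eligible data nodes, so the cluster keeps a quorum if any single node dies — a minimal sketch using the `node.roles` syntax of Elasticsearch 7.9+, with the cluster name and hostnames as placeholders:

```yaml
# elasticsearch.yml on each of the 3 nodes
cluster.name: my-cluster
node.roles: [ master, data ]
discovery.seed_hosts: ["large-node", "small-node-1", "small-node-2"]
cluster.initial_master_nodes: ["large-node", "small-node-1", "small-node-2"]
```

A single dedicated master on the large node would be a single point of failure; with 3 master-eligible nodes the cluster can elect a new master if one goes down.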

Resources