Improving speed of an Elasticsearch cluster - elasticsearch

I have set up a 3-node cluster (24 cores each).
When I run a query on it, results come back slowly.
But when I set up a standalone server and ran the same query, it returned results quickly.
My cluster and data details:
I have created 12 indices with around 3 shards each.
Total data size is 60 GB.
Elasticsearch version 5.3.
Each server has a 500 GB disk.
So kindly guide me on how to achieve better speed.
Thanks in advance.
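
A first thing worth checking is whether those 12 indices (about 3 shards each) are actually spread evenly across the 3 nodes. A minimal sketch using the cluster's REST API from Python (assuming the cluster is reachable at localhost:9200; adjust the host and any security settings for your setup):

    # Sketch: check cluster health and how shards are distributed across the nodes.
    # The host below is an assumption; point it at one of your nodes.
    import requests

    ES = "http://localhost:9200"

    # Overall state: green/yellow/red, node count, unassigned shards.
    print(requests.get(ES + "/_cluster/health?pretty").text)

    # Per-shard view: which node holds each primary/replica shard of each index.
    print(requests.get(ES + "/_cat/shards?v&h=index,shard,prirep,node,store").text)

    # Disk usage per node, to spot a node that is filling up.
    print(requests.get(ES + "/_cat/allocation?v").text)

If the shards (or all the replicas) are piled onto one node, or one node is near its disk watermark, that alone can explain why the cluster feels slower than the standalone server.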

Related

How to know Elasticsearch cluster capacity?

Recently I've been working on a project which requires figuring out Elasticsearch capacity, as we will be inserting a lot more messages into the ES system per second.
We have 3 types of nodes in the ES cluster: master, data, and client.
How do we know the maximum insert count per second our client nodes can handle? Do we need to care about the bandwidth of the client nodes?
As per the above comments, you need to benchmark your cluster hardware and settings with your proposed data structure, using a tool like https://esrally.readthedocs.io/en/stable/
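
If you only want a rough docs-per-second figure before setting up Rally, you can time _bulk requests yourself. A minimal sketch (Python with requests, writing to a throwaway index named ingest_test on localhost:9200; both names are placeholders):

    # Rough bulk-indexing throughput probe; not a substitute for a proper Rally benchmark.
    import json, time, requests

    ES = "http://localhost:9200"
    BATCH = 1000     # documents per _bulk request
    ROUNDS = 50      # number of bulk requests to send

    # Build one _bulk body of BATCH small documents (reused for every round).
    lines = []
    for i in range(BATCH):
        lines.append(json.dumps({"index": {"_index": "ingest_test", "_type": "doc"}}))
        lines.append(json.dumps({"message": "payload %d" % i}))
    body = "\n".join(lines) + "\n"

    start = time.time()
    for _ in range(ROUNDS):
        r = requests.post(ES + "/_bulk", data=body,
                          headers={"Content-Type": "application/x-ndjson"})
        r.raise_for_status()
    elapsed = time.time() - start
    print("indexed %d docs in %.1fs -> %.0f docs/sec"
          % (BATCH * ROUNDS, elapsed, BATCH * ROUNDS / elapsed))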

Indexing multiple indexes in Elasticsearch at the same time

I am using Logstash for ETL purposes and have 3 indices in Elasticsearch. Can I insert documents into my 3 indices through 3 different Logstash processes at the same time to improve parallelization, or should I insert documents into 1 index at a time?
My Elasticsearch cluster configuration looks like:
3 data nodes - 64 GB RAM, SSD disks
1 client node - 8 GB RAM
Shards - 20
Replicas - 1
Thanks
As always, it depends. The distribution concept of Elasticsearch is based on shards: since the shards of an index live on different nodes, you are automatically spreading the load.
However, if Logstash is your bottleneck, you might gain performance from running multiple processes, though it is doubtful that running multiple Logstash processes on a single machine will make a positive impact.
Short answer: parallelising over 3 indexes won't make much sense, but if Logstash is your bottleneck, it might make sense to run those processes in parallel (on different machines).
PS: The biggest performance improvement generally comes from batching requests together, but Logstash does that by default.
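
If you do want to test whether writing the three indices in parallel helps in your environment, here is a minimal sketch of the idea, with Python threads standing in for separate Logstash processes (the index names logs-a/logs-b/logs-c and the host are placeholders):

    # Sketch: one worker per index, all bulk-indexing at the same time.
    import json, requests
    from concurrent.futures import ThreadPoolExecutor

    ES = "http://localhost:9200"
    INDICES = ["logs-a", "logs-b", "logs-c"]

    def bulk_into(index, n_docs=5000, batch=500):
        """Send n_docs small documents to one index in _bulk batches."""
        for start in range(0, n_docs, batch):
            lines = []
            for i in range(start, start + batch):
                lines.append(json.dumps({"index": {"_index": index, "_type": "doc"}}))
                lines.append(json.dumps({"value": i}))
            r = requests.post(ES + "/_bulk", data="\n".join(lines) + "\n",
                              headers={"Content-Type": "application/x-ndjson"})
            r.raise_for_status()
        return index

    # One worker per index, all running concurrently.
    with ThreadPoolExecutor(max_workers=len(INDICES)) as pool:
        for done in pool.map(bulk_into, INDICES):
            print("finished", done)

Whether this beats a single writer depends on where the bottleneck actually is (Logstash CPU, network, or the data nodes), which is exactly the point of the answer above.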

How much data can my Hadoop cluster handle?

I have a 4-node cluster configured with 1 NameNode and 3 DataNodes. I'm performing a TPC-H benchmark and I would like to know how much data you think my cluster can handle without affecting query response times. My total available HD size is about 700 GB, and each node has an 8-core CPU and 16 GB of RAM.
I saw some calculations that can be done to find the volume limit, but I didn't understand them. If someone could explain in a simple way how to calculate the data volume a cluster can handle, it would be very helpful.
Thank you
You can use 70 to 80% of the space in your cluster to store data; the rest will be used for processing and for storing intermediate results.
This way performance will not be impacted.
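
As a rough worked example with the numbers from the question (assuming the HDFS default replication factor of 3; check dfs.replication in your hdfs-site.xml):

    # Back-of-the-envelope HDFS capacity estimate. The 700 GB figure comes from the
    # question above; the replication factor of 3 is the HDFS default and an assumption.
    raw_disk_gb = 700           # total raw disk across the datanodes
    usable_fraction = 0.75      # keep ~70-80% for HDFS data, rest for temp/intermediate output
    replication_factor = 3      # each block is stored 3 times by default

    hdfs_space_gb = raw_disk_gb * usable_fraction           # ~525 GB of HDFS blocks
    unique_data_gb = hdfs_space_gb / replication_factor     # ~175 GB of actual data
    print("roughly %.0f GB of unique data" % unique_data_gb)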
As you mentioned, you have already configured your 4-node cluster. You can check the NameNode web UI --> Configured Capacity section to find out the storage details. Let me know if you run into any difficulties.

Is it possible to set up Elasticsearch on Ceph block storage?

Is it possible to use Ceph as a storage backend for Elasticsearch?
It seems to me that Elasticsearch only supports local disk writes, but I want my Ceph cluster to be used as the storage backend for the Elasticsearch cluster.
We have been using it at my company. We have 3 Elasticsearch nodes with Ceph as storage. The cluster usually ingests about 20,000 records per second (5-6 million per 5 minutes), with the load averages sitting around 1.5 - 2.5 on all nodes.
You need to make sure you have a very fast local network though.
My setup:
No. of primary shards: 3
No. of replicas: 1
https://nayarweb.com/blog/2017/high-load-on-one-of-elasticsearch-node-on-ceph/

Performance degrades after adding Solr nodes

I'm having an odd issue where I set up a DSE 4.0 cluster with 1 Cassandra node and 1 Solr node (using DseSimpleSnitch), and performance is great. If I add additional nodes to have 3 Cassandra nodes and 3 Solr nodes, then the performance of my Solr queries goes downhill dramatically. Does anyone have any idea what I might be doing wrong? I have basically all default options for DSE and have tried wiping all data and recreating everything from scratch several times, with the same result. I've also tried creating the keyspace with replication factors of 1 and 2, with the same results.
Maybe my use case is a bit odd, but I'm using Solr for OLTP-type queries (via SolrJ with binary writers/readers), which is why the performance is critical. With a very light workload of, say, 5 clients making very simple Solr queries, the response times go up about 50% from a single Solr node to 3 Solr nodes, with only a few hundred small documents seeded for my test (~25 ms to ~50 ms). The response times get about 2 to 3 times slower with 150 clients against 3 nodes compared to a single node. The response times for Cassandra are unchanged; it's only the Solr queries that get slower.
Could there be something with my configuration causing this?
Solr queries need to fan out to cover the full range of keys for the column family. So, when you go from one node to three nodes, it should be no surprise that the total query time rises to roughly three times that of a query that can be satisfied by a single node.
You haven't mentioned the RF for the Search DC.
For more complex queries, the fan out would give a net reduction in query latency since only a fraction of the total query time would occur on each node, while for a small query the overhead of the fanout and aggregation of query results dwarfs the time to do the actual Solr core query.
Generally, Cassandra queries tend to be much simpler than Solr queries, so they are rarely comparable.
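
To make the fan-out argument concrete, here is a toy latency model; the numbers are invented for illustration, not measurements from DSE:

    # Toy model: a distributed query pays its fixed per-query cost plus the work that
    # actually parallelises, plus fan-out/merge overhead per extra node. Invented numbers.
    def latency_ms(parallel_work_ms, nodes, fixed_cost_ms=20, overhead_per_node_ms=8):
        coordination = 0 if nodes == 1 else overhead_per_node_ms * nodes
        return fixed_cost_ms + parallel_work_ms / nodes + coordination

    # Tiny OLTP-style query: almost all cost is fixed, so fanning out only adds overhead.
    print(latency_ms(5, 1))    # ~25 ms
    print(latency_ms(5, 3))    # ~46 ms

    # Heavy analytical query: parallelisable work dominates, so fanning out pays off.
    print(latency_ms(900, 1))  # ~920 ms
    print(latency_ms(900, 3))  # ~344 ms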
Problem solved. After noticing that the documentation mentions not to use virtual nodes for Solr nodes (without saying why), I checked my configuration and noticed I was using virtual nodes. I changed my configuration to not use virtual nodes and the performance issue disappeared. I also upgraded from 4.0.0 to 4.0.2 at the same time, but I'm pretty sure it was the virtual nodes causing the problem.
