composite aggregation vs nested terms aggregation


Hi, I am currently using nested terms aggregations (three or more levels deep) to query Elasticsearch. I just discovered the composite aggregation and would rather use it with 3+ source fields, since it is far more manageable in my opinion. Is this a bad choice performance-wise? Any recommendations?
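For reference, the two request shapes being compared look roughly like the sketch below. The field names (`country`, `city`, `category`), the aggregation names (`by_<field>`, `combos`) and the helper functions are made up for illustration; only the `terms`, `composite`, `sources`, `size` and `after` keys come from the Elasticsearch query DSL. The code only builds the request bodies and does not contact a cluster.

```python
import json


def nested_terms_body(fields, size=10):
    """Build a terms aggregation nested once per field (outermost field first)."""
    body = {}
    # Build from the innermost field outwards so each level wraps the previous one.
    for field in reversed(fields):
        agg = {"terms": {"field": field, "size": size}}
        if body:
            agg["aggs"] = body
        body = {f"by_{field}": agg}
    return {"size": 0, "aggs": body}


def composite_body(fields, size=1000, after=None):
    """Build one composite aggregation with a terms source per field."""
    composite = {
        "size": size,
        "sources": [{field: {"terms": {"field": field}}} for field in fields],
    }
    # To fetch the next page, pass the "after_key" returned by the previous response.
    if after is not None:
        composite["after"] = after
    return {"size": 0, "aggs": {"combos": {"composite": composite}}}


FIELDS = ["country", "city", "category"]  # hypothetical keyword fields
print(json.dumps(nested_terms_body(FIELDS), indent=2))
print(json.dumps(composite_body(FIELDS), indent=2))
```

Note that the two are not strictly equivalent. Nested terms returns the top `size` buckets at each level, ordered by document count by default. Composite pages through every combination of values in key order, using `after` to continue from the previous page.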

Related

How does Elasticsearch/Lucene achieve such performance when querying multiple fields?

According to the answer given here, Elasticsearch doesn't seem to use compound indexes for querying multiple fields, and instead queries multiple indexes and then intersects the results.
My question is: how does it achieve such high performance? Surely a composite index is faster, since it leads you straight to the desired data, rather than querying multiple indexes, each of which returns more data, and then intersecting the results?
I get the advantages of the multiple indexes, regarding the field order, etc., but in terms of performance, surely it's inferior...
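The "query each index, then intersect" step described above can be pictured with a simplified sketch. This is not Lucene's actual code: real postings lists are compressed and can skip ahead rather than stepping one entry at a time. The two per-field indexes and their doc IDs are invented for illustration. The point is that intersecting two sorted lists of doc IDs is a single linear pass.

```python
def intersect_postings(a, b):
    """Intersect two ascending lists of doc IDs with a two-pointer walk."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Doc ID present in both lists: it matches both field conditions.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical per-field inverted indexes: term -> sorted doc IDs.
color_index = {"red": [1, 4, 7, 9, 12]}
size_index = {"large": [2, 4, 9, 10, 12, 15]}

matches = intersect_postings(color_index["red"], size_index["large"])
print(matches)  # [4, 9, 12]
```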

Hold Elasticsearch document frequency constant as index changes

I'm using Elasticsearch to retrieve XML documents by terms. I have multiple indexes, one for each day. I have a large collection of documents that is, in some sense, representative. The document frequency of several terms varies from day to day.
The matching I'm doing depends on the inverse document frequency (IDF) of terms. I'd like to avoid using the IDF of the indices I'm searching, and instead use the IDF computed from the large, representative set. Is there a straightforward way to do this without writing custom scoring functions for large, complex queries?

There is no other way.
FWIW, to access and use IDF you need to write a custom script engine in Elasticsearch, and probably use a script based on that engine for sorting.

elasticsearch parent/children aggregation performance

I am just an elasticsearch newbie. According to the following elasticsearch document,
join datatype
The join field shouldn’t be used like joins in a relation database. In Elasticsearch the key to good performance is to de-normalize your data into documents. Each join field, has_child or has_parent query adds a significant tax to your query performance.
has_child query
Note that the has_child is a slow query compared to other queries in the query dsl due to the fact that it performs a join.
has_parent query
Note that the has_parent is a slow query compared to other queries in the query dsl due to the fact that it performs a join.
I understand that these query types are slow and should be avoided. But what about the parent and children aggregations? I cannot find any documentation or performance test results saying whether these aggregations are slow or acceptable.
I will have to test it myself, but can someone give me some advice?

Parent and Child Aggregations are definitely slower compared to other Aggregations. I have tested it in my applications and found it much slower than normal ones.

Performance wise, should we favor nested documents or joining over multiple collections

Performance wise, what is the best option - use nested documents ("_childDocuments") or joins over multiple collections?

Does Solr support sorting while creating the index?

In my test environment, there are nearly 130,000,000 documents on each server. Searches are fast without sorting by date, but extremely slow when sorting is enabled.
I think that if Solr could sort an indexed field while creating the index, searching would be more efficient. So, how do I configure Solr to sort some fields while indexing?

The initial query would be slower, but all subsequent queries should be fast.
Solr should be able to use the Filter Query Cache for sorting.
You can also warm the sort fields.
Also check whether the overhead really comes from sorting alone, and not from querying and scoring.
