Elasticsearch Aggregations. What are they? - elasticsearch

can someone please in simple sentences explain what is elasticsearch aggregation exactly?
I searched but everywhere there are some explanations about how to use it, and about syntax.
but I can't understand the reason why they exist. what is their main purpose.
what kind of query they've build for ?
I need trend tags (suggest terms in search as you type ) in my search system
I faced elastic aggs , and I have no Idea what they are.

If you know SQL, elasticsearch aggregations are kind of group by clause of elasticsearch.
You can aggregate(group by) on field you want, can have document count on that field, can also have all the documents in that group, can have nested aggs(group by).
For suggest terms in search as you type aggs will not work .. you need to read about analysis document ... or read about fuzzy query in elasticsearch.

Related

Performance of match vs term query in elasticsearch?

I've been using a lot of match queries in my project. Now, I have just faced with term query in Elasticsearch. It seems the term query is more faster in case that keyword of your query is specified.
Now I have a question there..
Should I refactor my codes (it's a lot) and use term instead of match?
How much is the performance of using term better than match?
using term in my query:
main_query["query"]["bool"]["must"].append({"term":{object[..]:object[...]}})
using match query in my query:
main_query["query"]["bool"]["must"].append({"match":{object[..]:object[...]}})
Elastic discourages to use term queries for text fields for obvious reasons (analysis!!), but if you know you need to query a keyword field (not analyzed!!), definitely go for term/terms queries instead of match, because the match query does a lot more things aside from analyzing the input and will eventually end up executing a term query anyway because it notices that the queried field is a keyword field.
As far as I know when you use the match query it means your field is mapped as "text" and you use an analyzer. With that, your indexed word will generate tokens and when you run the query you go through an analyzer and the correspondence will be made for each of them.
Term will do the exact match, that is, it does not go through any analyzer, it will look for the exact term in the inverted index.
Because of this I believe that by not going through analyzers, Term is faster.
I use Term match to search for keywords like categories, tag, things that don't make sense use an analyzer.

ElasticSearch: given a document and a query, what is the relevance score?

Once a query is executed on ElasticSearch, a relevance _score is calculated for each retrieved document.
Given a specific document (e.g. by doc ID) and a specific query, I would like to see what is its _score?
One way is perhaps to query ES, retrieve all the hit documents, and look up the desired document out of all the retrieved documents to see its score.
I assume there should be a more efficient way to do this. Given a query and a document ID, what is its _score?
I'm using ElasticSearch 7.x
PS: I need this for a learning-to-rank scenario (to create my judgment list). I have in fact a complex query that was created from various should and must over different fields. My major requirement was to get the score value for each individual sub-query, which seems there is no solution for it. I want to understand which part of this complex query is more useful and which one is less. The only way I've come up with is to execute each sub-query separately to get the score but I do not want to actually execute that query just asking for what is the score of a specific document for that sub-query.
Scoring of the document is not only related to just the document and all other documents in the index, but it also depends on various factor like:
_score is calculated per shard basis not on an index basis by default, although you can change this behavior by using DFS Query Then Fetch param in your query. More info on this official blog.
Is there is any boost applied at index or query time(index time is deprecated from 5.X).
Any custom scoring function is used in addition to the default ES scoring algorithm(tf/idf in old versions) and BM25 in the latest versions.
Edit: Based on the comments from the other respected community members, rephrasing the below statement:
To answer your question, Using the _explain API, you can understand how Elasticsearch computes a score explanation for a query and a specific document. This can give useful feedback on whether a document matches or didn’t match a specific query.

how can I find related keywords with elasticsearch?

I am pretty new to elasticsearch and already love it.
Right know I am interested in understanding on how I can let elasticsearch make suggestions for similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I dont need similar documents but only related / similar keywords.
So lets say I have an index of documents about movies and I start a query about "godfather". Then elasticsearch should suggest related keywords - e.g. "al pacino" or "Marlon Brando" because they are likely to occur in the same documents.
any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could possibly do, is to write a program, that will query Elastic, return matched documents, then you will get the _source data, or just retrieve it from your original datasource (like DB or file), later you will need to calculate TF-IDF for each term in the retrieved ones and somehow combine everything all together to get top K terms out of all returned terms.

How to use lucene analyzers with Elasticsearch java API

I want to build elastisearch queries using JAVA API. I want to know how to can use Lucene analyzers in elasticsearch java programs. I have checked QueryBuilders and tried to use analyzers directly as below.
QueryBuilder builder = QueryBuilders.matchQuery(searchString, fields).analyzer("porterstem");
But, it turned out to be wrong. If any one tried it, could you please give me some information?
You should define your analyzer in mapping.
So the analyzer will be used at index time and at query time.
ANALYZERS are used to analyze the documents that your are indexed. Analysis means it Ll split,the text in to tokens, normalize it, and also Lower case your indexed doc text. This analysis process Ll b more helpful while you search and searching will be faster..
You can mention analyzer while you query . But analyze the stored documents during query time. Ll b expensive. So analyze the document during indexing time. ES will analysis the doc during indexed and query time will b less and faster result.
So mention analyzers in mapping and searching efficiently..
For more information about analyzer refer
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/Analyzer.html

does kibana support max in queries?

I am hoping to find some information on the syntax of kibana queries. I want to be able to have a query that returns the max value of a field. Is this possible I have seen some stuff on facets but not sure if it apply's?
I know that max is an option for the histogram but i would like to use it elsewhere.
Since Kibana queries use the Lucene query syntax or RegEx, currently its queries seem to return matched records only (no aggregation).
I believe that aggregation (Max, for example) is only possible in Kibana Panels such as the Histogram.

Resources