How to use multiple query strings with aggregation in elasticsearch - elasticsearch

How to use multiple query strings with aggregate functions in elasticsearch?
For example:
if a>0 AND a<1, then {"low":count(aggregate count of records within 0 to 1)}
else if a > 1 AND a < 100, then {"normal":count(aggregate count of records within 1 to 100)}
else {"high":count(aggregate count of records after 100)}
How to achieve this using Request Body Query string?
Thank you in advance.

Assuming that a is a field that you search on, I think the easiest way for you to do that is using the range aggregation with buckets for each of your use-cases (low, normal, high).
You cannot bind aggregations to conditions of your query; that is something you would have to do in your own code. But if you use the range aggregation, you can define your buckets like this:
POST /_search
{
"aggs" : {
"a_ranges" : {
"range" : {
"field" : "a",
"ranges" : [
{ "to" : 1 },
{ "from" : 1, "to" : 100 },
{ "from" : 100 }
]
}
}
}
}
Depending on your query, some of these buckets may remain empty, but this should give you the result you want.
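To get the counts back under the names low, normal and high, the range aggregation accepts a "key" per range and a "keyed" flag. Below is a minimal Python sketch under those assumptions. The function names and the sample response are made up for illustration, and the thresholds are taken from the question. Note that "from" is inclusive and "to" is exclusive.

```python
import json

def build_range_body(field="a"):
    """Build a search body that counts documents per named range of `field`."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "a_ranges": {
                "range": {
                    "field": field,
                    "keyed": True,  # return buckets as an object keyed by name
                    "ranges": [
                        {"key": "low", "to": 1},
                        {"key": "normal", "from": 1, "to": 100},
                        {"key": "high", "from": 100},
                    ],
                }
            }
        },
    }

def counts_from_response(response):
    """Reduce a keyed range-aggregation response to {name: doc_count}."""
    buckets = response["aggregations"]["a_ranges"]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}

# Hand-written sample response (illustrative values, not real data).
sample = {"aggregations": {"a_ranges": {"buckets": {
    "low": {"to": 1.0, "doc_count": 4},
    "normal": {"from": 1.0, "to": 100.0, "doc_count": 12},
    "high": {"from": 100.0, "doc_count": 3},
}}}}

print(json.dumps(build_range_body(), indent=2))
print(counts_from_response(sample))
```

If values at or below 0 should not count as low, adding a "from" to the first range would be one way to exclude them.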

Elasticsearch "size" value not working in terms aggregation with partitions

I am trying to paginate over a specific field using the terms aggregation with partitions.
The problem is that the number of returned terms for each partition is not equal to the size parameter that I set.
These are the steps that I am doing:
Retrieve the number of different unique values for the field with "cardinality" aggregation.
In my data, the result is 21.
From the web page, the user wants to display a table with 10 items per page.
if unique_values % page_size != 0:
    partitions_number = (unique_values // page_size) + 1
else:
    partitions_number = unique_values // page_size
Then I run this simple query:
POST my_index/_search?pretty
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"match": {
"field_to_paginate": "foo"
}
}
]
}
},
"aggs": {
"by_pchostname": {
"terms": {
"size": 10,
"field": "field_to_paginate",
"include": {
"partition": 0,
"num_partitions": 3
}
}
}
}
}
I am expecting to retrieve 10 results. But if I run the query I have only 7 results.
What am I missing here? Do I need to use a different solution here?
As a side note, I can't use composite aggregation because I need to sort results by doc_count over the whole dataset.
Partitions in a terms aggregation divide the unique values into roughly equal chunks.
In your case num_partitions is 3, so each partition holds about 21 / 3 = 7 terms.
Partitions are meant for fields with a large number of unique values, on the order of thousands.
You may be able to leverage the shard_size parameter. My suggestion is to read the part of the manual that covers it and experiment with shard_size.
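The arithmetic behind the 7 results can be checked with a small Python sketch. The helper names are hypothetical, and the second function assumes the terms are spread evenly over the partitions, which only holds approximately in practice.

```python
import math

def partitions_for(unique_values, page_size):
    """Smallest partition count whose average partition fits in one page."""
    return math.ceil(unique_values / page_size)

def average_terms_per_partition(unique_values, num_partitions):
    """Average number of terms per partition, assuming an even spread."""
    return unique_values / num_partitions

num_partitions = partitions_for(21, 10)
print(num_partitions)
print(average_terms_per_partition(21, num_partitions))
```

So "size": 10 is only an upper bound per partition. With 21 unique values and 3 partitions, each partition has about 7 terms to return.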
Terms aggregation does not allow pagination. Use composite aggregation instead (requires ES >= 6.1.0). Below is the quote from reference docs:
If you want to retrieve all terms or all combinations of terms in a
nested terms aggregation you should use the Composite aggregation
which allows to paginate over all possible terms rather than setting a
size greater than the cardinality of the field in the terms
aggregation. The terms aggregation is meant to return the top terms
and does not allow pagination.
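For reference, paging with the composite aggregation could be sketched in Python as below. The aggregation name "pages", the source name "term_value" and the sample response are assumptions for illustration. Each response carries an after_key that is passed back as "after" to fetch the next page.

```python
def composite_page_body(field, page_size, after_key=None):
    """Build a search body for one page of a composite aggregation."""
    composite = {
        "size": page_size,
        "sources": [{"term_value": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"pages": {"composite": composite}}}

def next_after_key(response):
    """Return the key to pass as `after`, or None when no key was returned."""
    return response["aggregations"]["pages"].get("after_key")

# Hand-written sample response (illustrative values, not real data).
sample = {"aggregations": {"pages": {
    "after_key": {"term_value": "host-10"},
    "buckets": [{"key": {"term_value": "host-10"}, "doc_count": 5}],
}}}

first_page = composite_page_body("field_to_paginate", 10)
second_page = composite_page_body("field_to_paginate", 10, next_after_key(sample))
print(second_page)
```

Keep in mind that composite buckets come back ordered by their source values, not by doc_count, which is exactly the limitation mentioned in the question.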

How to perform a distinct count query in Elasticsearch

I have an index with a host field. I am trying to retrieve the count of documents by distinct host name.
IE:
Host1:
Count: 72
Host2:
Count: 33
Host3:
Count: 153
Each document has a host field and it is a string. I assume I need to do something involving terms and cardinality, but I can't quite nail the syntax.
How to get all possible values for field host?
curl -XGET 'http://localhost:9200/articles/_search?pretty' -H 'Content-Type: application/json' -d '
{
"aggs" : {
"whatever_you_like_here" : {
"terms" : { "field" : "host", "size":10000 }
}
},
"size" : 0
}'
Notes:
The result will contain a doc_count for each unique value.
"size": 10000 inside the aggregation returns at most 10000 unique values. The default is 10.
"size": 0 at the top level suppresses the hits. By default, "hits" contains 10 documents, and we don't need them.
By default, the buckets are ordered by doc_count in decreasing order.
Reference: bucket terms aggregation
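Once the response comes back, turning the buckets into a host-to-count mapping is a one-liner. Here is a small Python sketch. The sample response is hand-written using the counts from the question, and the function name is hypothetical.

```python
def host_counts(response, agg_name="whatever_you_like_here"):
    """Map each bucket key (host name) to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

# Hand-written sample response (counts taken from the question).
sample = {"aggregations": {"whatever_you_like_here": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {"key": "Host3", "doc_count": 153},
        {"key": "Host1", "doc_count": 72},
        {"key": "Host2", "doc_count": 33},
    ],
}}}

counts = host_counts(sample)
print(counts)
print(len(counts))  # number of distinct hosts returned
```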

How to normalize ElasticSearch scores?

For my project I need to find out which search results count as "good" matches. Currently, the scores vary wildly depending on the query, hence the need to normalize them somehow. Normalizing the scores would make it possible to select the results above a given threshold.
I found a couple of solutions for Lucene:
how do I normalise a solr/lucene score?
http://wiki.apache.org/lucene-java/ScoresAsPercentages
How would I go ahead and apply the same technique to ElasticSearch? Or perhaps there is already a solution that works with ES for score normalization?
As far as I could find, there is no built-in way to get a normalized score out of Elasticsearch. You will have to work around it by making two queries. The first is a pilot query (preferably with size 1, but with all other attributes the same) that fetches the max_score. Then you can run your actual query and use function_score to normalize the score: pass the max_score from the pilot query in params to function_score and divide every score by it.
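A rough Python sketch of the two request bodies follows. It assumes the max_score is read from hits.max_score of the pilot response, that the script is written with the "source" key (older versions used "inline"), and that boost_mode "replace" is wanted so the script result becomes the final score. The function names and the sample query are hypothetical.

```python
def pilot_body(original_query):
    """Pilot request: same query, one hit, only used to read hits.max_score."""
    return {"size": 1, "query": original_query}

def normalized_body(original_query, max_score, size=10):
    """Wrap the query so every score is divided by the pilot's max_score."""
    if not max_score or max_score <= 0:
        # max_score is null when the pilot query matches nothing.
        raise ValueError("max_score must be a positive number")
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": original_query,
                "script_score": {
                    "script": {
                        "source": "_score / params.max_score",
                        "params": {"max_score": max_score},
                    }
                },
                "boost_mode": "replace",  # use the script result as the score
            }
        },
    }

query = {"match": {"title": "elasticsearch"}}  # hypothetical example query
body = normalized_body(query, max_score=7.5)
print(body)
```

With this, the top hit gets a score of 1.0 and the others fall between 0 and 1, so a threshold is at least comparable within one query. Whether it is comparable across different queries is a separate question.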
It's a bit late.
We needed to normalise the ES score for one of our use cases. So, we wrote a plugin that overrides the ES Rescorer feature.
It supports min-max and z-score normalization.
Github: https://github.com/bkatwal/elasticsearch-score-normalizer
Usage:
Min-max
{
"query": {
... some query
},
"from" : 0,
"size" : 50,
"rescore" : {
"score_normalizer" : {
"normalizer_type" : "min_max",
"min_score" : 1,
"max_score" : 10
}
}
}
Usage z-score:
{
"query": {
... some query
},
"from" : 0,
"size" : 50,
"rescore" : {
"score_normalizer" : {
"normalizer_type" : "z_score",
"min_score" : 1,
"factor" : 0.6,
"factor_mode" : "increase_by_percent"
}
}
}
For complete documentation check the Github repository.
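For readers unfamiliar with the two terms, here is a generic Python sketch of min-max scaling and z-scores over a list of scores. These are the textbook formulas, not the plugin's code, and the plugin's exact behaviour (including the "factor" options) may differ.

```python
import statistics

def min_max(scores, new_min=0.0, new_max=1.0):
    """Linearly rescale scores so the lowest maps to new_min and the highest to new_max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: nothing to spread, so map everything to new_max.
        return [new_max for _ in scores]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (s - lo) * scale for s in scores]

def z_scores(scores):
    """Express each score as a number of standard deviations from the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]

raw = [2.0, 4.0, 6.0]  # made-up scores for illustration
print(min_max(raw, 1, 10))
print(z_scores(raw))
```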

Is it possible to sort buckets in Terms aggregation response on a non-term field?

I need to sort the buckets in the result of an Elasticsearch terms aggregation.
Below is one of the indexed records in Elasticsearch:
{"personId":"10","Salary":10000, "Age":20, "personName":"xyz"}
I am using Terms aggregation over the field Salary. Below is the Terms aggregated ElasticSearch query:
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "Salary"
}
}
}
}
This query returns the buckets on the basis of Salary values. These buckets can be sorted by the Salary value using order, as in the query below:
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "Salary",
"order" : { "_term" : "asc" }
}
}
}
}
But I need to sort the buckets on the field Age (a non-terms field). Is there any way to do it?
The whole point of aggregations is to "dispatch" the documents into buckets, each of which is defined by the declared field of the terms aggregation, in your case Salary.
The buckets you get in the response are not documents anymore. For instance, in the bucket 10000, you'll get the count of documents which have Salary: 10000, and you'll have as many buckets as different Salary values there are in all your documents (by default only 10 buckets, though).
So, since buckets are not documents, and since a bucket can aggregate documents with different Age values, it's not clear how you'd like the Salary buckets to be sorted by Age.
Maybe, one way out of this could be to add a terms sub-aggregation on the Age field, so you get top Salary buckets and below that you get Age buckets. Then you can sort your Salary/Age bucket pairs any way you want.
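The sub-aggregation idea could look like the following Python sketch. The aggregation names "salaries" and "ages", the helper names and the sample response are assumptions for illustration. The sort happens on the client, after flattening the nested buckets into (Salary, Age, count) tuples.

```python
def salary_age_body():
    """Terms aggregation on Salary with a terms sub-aggregation on Age."""
    return {
        "size": 0,
        "aggs": {
            "salaries": {
                "terms": {"field": "Salary"},
                "aggs": {"ages": {"terms": {"field": "Age"}}},
            }
        },
    }

def pairs_sorted_by_age(response):
    """Flatten nested buckets into (salary, age, doc_count), ordered by age."""
    pairs = []
    for salary_bucket in response["aggregations"]["salaries"]["buckets"]:
        for age_bucket in salary_bucket["ages"]["buckets"]:
            pairs.append(
                (salary_bucket["key"], age_bucket["key"], age_bucket["doc_count"])
            )
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))

# Hand-written sample response (illustrative values, not real data).
sample = {"aggregations": {"salaries": {"buckets": [
    {"key": 10000, "doc_count": 3, "ages": {"buckets": [
        {"key": 20, "doc_count": 2}, {"key": 35, "doc_count": 1}]}},
    {"key": 20000, "doc_count": 3, "ages": {"buckets": [
        {"key": 25, "doc_count": 3}]}},
]}}}

print(pairs_sorted_by_age(sample))
```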

Elastic Search Query for multiple conditions

I want to build a query in Elasticsearch that has three sub-conditions.
1. It must match at least one of a list of provided values.
2. After 1, condition 2 must be satisfied, and then the 3rd condition.
(1 must be satisfied; 2 and 3 must also be satisfied, but only after 1 is satisfied.)
1 is a list of values, so matching any one of them will suffice.
Please give an outline of how to frame the Elasticsearch query using boolean parameters.
Thanks in advance.
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" :[{"term":{"sessionId":"-ShAwL2KlnVeo6nMMNX3ycVlc0kdikOWPC8vShyvpRpdmOQJkbBo-FiLJymsuZp36gcQs1I"}}],
"should" : [
{ "term" : {"visitorId": "b090606f-968d-fef4-33e3-3341f3a04265"}},
{ "term" : {"clientIp": "192.168.8.100"}}
]
}
}
}
}
}
Documents must match every term listed under must.
Of the terms listed under should, any one can be matched.
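As a sketch of the same idea for newer versions, where the filtered query no longer exists (it was removed in 5.0), the bool query can sit directly under "query". The Python helper below is hypothetical. It sets minimum_should_match to 1 explicitly so that at least one of the should clauses has to match, and it reuses the field names and values from the example above.

```python
def build_bool_query(required, any_of):
    """Build a bool query: all `required` terms plus at least one of `any_of`."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {field: value}} for field, value in required],
                "should": [{"term": {field: value}} for field, value in any_of],
                "minimum_should_match": 1,  # at least one should clause must match
            }
        }
    }

body = build_bool_query(
    required=[
        ("sessionId", "-ShAwL2KlnVeo6nMMNX3ycVlc0kdikOWPC8vShyvpRpdmOQJkbBo-FiLJymsuZp36gcQs1I"),
    ],
    any_of=[
        ("visitorId", "b090606f-968d-fef4-33e3-3341f3a04265"),
        ("clientIp", "192.168.8.100"),
    ],
)
print(body)
```

Further mandatory conditions are just more entries in the required list.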