Elasticsearch: sorting integer desc

When sorting by an integer field in Elasticsearch (version 1.1.2) using this query:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "cubicCapacity": {
        "order": "asc",
        "ignore_unmapped": true
      }
    }
  ],
  "from": 0,
  "size": 150
}
The result is correct and the documents are sorted in natural order (1, 2, 5, 10).
But when I run the same query with "desc":
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "cubicCapacity": {
        "order": "desc",
        "ignore_unmapped": true
      }
    }
  ],
  "from": 0,
  "size": 150
}
The result is incorrect: the documents come back in a strange order, whereas I expected (10, 5, 2, 1).
So why doesn't sorting with "desc" give the correct natural order?
P.S. When the field is a string type, "asc" gives (1, 10, 2, 5) and "desc" works correctly, giving (5, 2, 10, 1).

You should add leading zeros to your numbers, so that you have [0001, 0010, 0002, 0005] instead of [1, 10, 2, 5]. This matters when the values are indexed as strings, because strings are compared character by character rather than numerically.
The number of leading zeros to add depends on the maximum value you expect to have.
E.g. if you think the values will stay under 10 billion, you should store 0000000005 (9 zeros) instead of 5 and 0000000010 (8 zeros) instead of 10.
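To see why the padding helps, here is a minimal Python sketch (plain Python, not Elasticsearch code) that compares the ordering of plain and zero-padded strings. The helper name `pad` is made up for this illustration, and the width of 10 follows the "under 10 billion" example above.

```python
def pad(n: int, width: int = 10) -> str:
    """Left-pad a non-negative integer with zeros to a fixed width."""
    return str(n).zfill(width)


values = [1, 10, 2, 5]

# Plain strings compare character by character, so "10" sorts next to "1".
plain_desc = sorted((str(v) for v in values), reverse=True)

# Zero-padded strings all have the same length, so string order
# matches numeric order.
padded_desc = sorted((pad(v) for v in values), reverse=True)

print(plain_desc)                     # ['5', '2', '10', '1']
print([int(s) for s in padded_desc])  # [10, 5, 2, 1]
```

The padded values only sort correctly as long as no value needs more digits than the chosen width.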

Related

Elasticsearch sort by filtered value

I'm using Elasticsearch 7.12, upgrading to 7.17 soon.
The following description of my problem has had the confusing business logic for my exact scenario removed.
I have an integer field in my document named 'Points'. It will usually contain 5-10 values, but may contain more, probably not more than 100 values. Something like:
Document 1:
{
"Points": [3, 12, 34, 60, 1203, 70, 88]
}
Document 2:
{
"Points": [16, 820, 31, 60]
}
Document 3:
{
"Points": [93, 20, 55]
}
My search needs to return documents with values within a range, such as between 10 and 19 inclusive. That part is fine. However I need to sort the results by the values found in that range. From the example above, I might need to find values between 30-39, sorted by the value in that range ascending - it should return Document 2 (containing value of 31) followed by Document 1 (containing value of 34).
Due to the potential range of values and searches I can't break this field down into fields like 0-9, 10-19 etc. to search on them independently - there would be many thousands of fields.
The documents themselves are otherwise quite large and there are a large number of them, so I have been advised to avoid nested fields if possible.
Can I apply a filter to a sort? Do I need a script to achieve this?
Thanks.
There are several ways of doing this:
Histogram aggregation
Aggregate your documents using a histogram aggregation with "hard bounds". Example query:
POST /my_index/_search?size=0
{
  "query": {
    "constant_score": { "filter": { "range": { "Points": { "gte": "30", "lte" : "40" } } } }
  },
  "aggs": {
    "points": {
      "histogram": {
        "field": "Points",
        "interval": 10,
        "hard_bounds": {
          "min": 30,
          "max": 40
        }
      },
      "aggs" : {"top" : {"top_hits" : {}}}
    }
  }
}
This will aggregate all the documents that fall in that range, and the first bucket in the results will contain the document that you want.
Terms aggregation with an include list
If the range you want is relatively small, like the "30 - 39" you mentioned, a simple terms aggregation on the results, with an include list covering all the numbers in that range, will also give you the desired result.
Example query:
POST /my_index/_search?size=0
{
  "query": {
    "constant_score": { "filter": { "range": { "Points": { "gte": "30", "lte" : "40" } } } }
  },
  "aggs": {
    "points": {
      "terms": {
        "field": "Points",
        "include" : ["30","31"....,"39"]
      },
      "aggs" : {"top": {"top_hits" : {}}}
    }
  }
}
Each bucket in the terms aggregation results will contain the documents that have that particular "Point" occurring at least once. The first document in the first bucket has what you want.
The third option involves building a runtime field that trims the points down to only those within your range, and then sorting in ascending order on that field. But that will be slower.
HTH.
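The logic behind the runtime-field option can be sketched outside Elasticsearch. The Python below only illustrates the intended behaviour and is not a Painless script: for each document it keeps the points inside the range and sorts by the smallest one. The function name `sort_by_value_in_range` and the `id` key are assumptions made for this example.

```python
from typing import Dict, List


def sort_by_value_in_range(docs: List[Dict], low: int, high: int) -> List[Dict]:
    """Return docs with a point in [low, high], ordered by the smallest such point."""
    keyed = []
    for doc in docs:
        in_range = [p for p in doc["Points"] if low <= p <= high]
        if in_range:  # documents with no point in the range are dropped
            keyed.append((min(in_range), doc))
    keyed.sort(key=lambda pair: pair[0])
    return [doc for _, doc in keyed]


docs = [
    {"id": 1, "Points": [3, 12, 34, 60, 1203, 70, 88]},
    {"id": 2, "Points": [16, 820, 31, 60]},
    {"id": 3, "Points": [93, 20, 55]},
]

result = sort_by_value_in_range(docs, 30, 39)
print([d["id"] for d in result])  # [2, 1]
```

With the three sample documents from the question, the range 30-39 returns Document 2 (value 31) before Document 1 (value 34), which is the ordering the question asks for.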

What is the equivalent of group by and collect as list in Elasticsearch?

I have an Elasticsearch index:
{class_id: 1, student_id: 10}
{class_id: 2, student_id: 20}
{class_id: 1, student_id: 30}
{class_id: 2, student_id: 40}
I want an aggregation that returns:
{class_id: 1, student_ids: [10,30]}
{class_id: 2, student_ids: [20,40]}
I'm not sure how to go about it.
You simply need to use two terms aggregations, like this:
{
  "size": 0,
  "aggs": {
    "classes": {
      "terms": {
        "field": "class_id"
      },
      "aggs": {
        "students": {
          "terms": {
            "field": "student_id"
          }
        }
      }
    }
  }
}
You'll get one bucket for each class and inside each class bucket, you'll get one bucket per student in that class.
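To turn those nested buckets into the list shape from the question, the response can be flattened on the client side. The sketch below assumes the response has already been parsed into a Python dict; the `sample` response is hand-written for illustration, and `collect_students` is a hypothetical helper name.

```python
from typing import Dict, List


def collect_students(response: Dict) -> List[Dict]:
    """Flatten nested terms buckets into one row per class."""
    rows = []
    for class_bucket in response["aggregations"]["classes"]["buckets"]:
        student_ids = [b["key"] for b in class_bucket["students"]["buckets"]]
        rows.append({
            "class_id": class_bucket["key"],
            "student_ids": sorted(student_ids),
        })
    return rows


# Hand-written sample in the shape of a nested terms aggregation response.
sample = {
    "aggregations": {
        "classes": {
            "buckets": [
                {"key": 1, "doc_count": 2, "students": {"buckets": [
                    {"key": 10, "doc_count": 1}, {"key": 30, "doc_count": 1}]}},
                {"key": 2, "doc_count": 2, "students": {"buckets": [
                    {"key": 20, "doc_count": 1}, {"key": 40, "doc_count": 1}]}},
            ]
        }
    }
}

print(collect_students(sample))
```

Note that a terms aggregation returns only the top 10 buckets by default, so classes with more students would need a larger "size" on the inner aggregation.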

elasticsearch range aggregation with fixed buckets values

I need to create an aggregation, like "range" but I need to specify the "where" clause of each bucket.
For example, if we aggregate on an "age" field, what the range aggregation offers is:
bucket 1: to 10
bucket 2: from 10 to 50
bucket 3: from 50
what I need is:
bucket 1: [5,4334,211 and 76]
bucket 2: [66 and 435]
bucket 3: [5455, 7968, 1, 443 and 765]
I don't want to create 3 "terms" aggregations with the "include" property; what I need is one aggregation with 3 buckets (just like range offers).
Any ideas or alternatives?
Only the first bucket would cause an issue, since its range is discontinuous; all the others can be specified easily with a from/to constraint in range buckets ("from" is inclusive, "to" is exclusive). I suggest something like this:
{
  "aggs" : {
    "age_ranges" : {
      "range" : {
        "field" : "age",
        "ranges" : [
          { "from": 5, "to": 6 },    <--- only 5
          { "from": 10, "to": 13 },  <--- 10, 11, 12
          { "from": 13, "to": 15 },  <--- 13, 14
          { "from": 20, "to": 26 }   <--- 20, 21, 22, 23, 24, 25
        ]
      }
    }
  }
}
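The half-open behaviour of the ranges above (lower bound included, upper bound excluded) can be checked with a small Python sketch. It is only an emulation of the bucketing rule; the bucket labels produced by `range_key` are made up and are not the keys Elasticsearch itself returns.

```python
from typing import List, Optional, Tuple

# Same bounds as in the range aggregation above.
RANGES: List[Tuple[int, int]] = [(5, 6), (10, 13), (13, 15), (20, 26)]


def range_key(value: int) -> Optional[str]:
    """Return the label of the range containing value, or None if there is none."""
    for low, high in RANGES:
        if low <= value < high:  # "from" is inclusive, "to" is exclusive
            return f"{low}-{high}"
    return None


for age in (5, 6, 12, 13, 25, 26):
    print(age, range_key(age))
```

Because the upper bound is excluded, 13 lands in the 13-15 range rather than 10-13, and 6 and 26 fall outside every range.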
There is another way to get a similar result: a terms aggregation with scripting. The script looks up each age in a map passed as a parameter (the map keys are strings, so the value is converted before the lookup).
"aggs": {
  "age_ranges": {
    "terms": {
      "script": {
        "inline": "def key = doc['age'].value.toString(); if (ageMap.containsKey(key)) { ageMap.get(key) } else { '<unmapped>' }",
        "params": {
          "ageMap": {
            "1": "bucket-3",
            "5": "bucket-1",
            "66": "bucket-2",
            "76": "bucket-1",
            "211": "bucket-1",
            "435": "bucket-2",
            "443": "bucket-3",
            "765": "bucket-3",
            "4334": "bucket-1",
            "5455": "bucket-3",
            "7968": "bucket-3"
          }
        }
      }
    }
  }
}
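The lookup-table idea behind this script can be illustrated in plain Python. This is a sketch of the bucketing logic only, not of the Elasticsearch scripting API; the mapping reuses the values from the question, while the `ages` list and the names `AGE_TO_BUCKET` and `bucket_for` are invented for the example.

```python
from collections import Counter

# Value-to-bucket mapping taken from the question's three buckets.
AGE_TO_BUCKET = {
    1: "bucket-3", 5: "bucket-1", 66: "bucket-2", 76: "bucket-1",
    211: "bucket-1", 435: "bucket-2", 443: "bucket-3", 765: "bucket-3",
    4334: "bucket-1", 5455: "bucket-3", 7968: "bucket-3",
}


def bucket_for(age: int) -> str:
    """Return the bucket name for an age, or '<unmapped>' if it is not listed."""
    return AGE_TO_BUCKET.get(age, "<unmapped>")


# Hypothetical field values, one per document.
ages = [5, 66, 1, 4334, 42]
counts = Counter(bucket_for(a) for a in ages)
print(dict(counts))
```

Each document is assigned exactly one bucket name, so a single aggregation yields one bucket per name plus a catch-all for values that are not in the mapping.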

Range aggregation on multiple field

I have two fields in the search document such as salary_from and salary_to and want the aggregation of salary ranges such as 0 - 10 , 10 - 20, etc.
Is there any way to set multiple fields on the Elasticsearch range aggregation? (I can set one field by using the setField function.)
I just need to get the aggregated count of salary ranges or slabs by considering the two fields salary_from and salary_to.
Please help me.
If I understand your question correctly, below is what you need.
{
  "size": 0,
  "aggs": {
    "salary_ranges": {
      "terms": {
        "script": "doc['salary_from'].value + ' to ' + doc['salary_to'].value",
        "size": 0
      }
    }
  }
}
It basically uses a script in a terms aggregation.
Say you have 3 documents with salary_from set to 3 and salary_to set to 5, and 2 documents with salary_from set to 10 and salary_to set to 25. The result of the query above will then look something like this:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "salary_ranges": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "3 to 5",
          "doc_count": 3
        },
        {
          "key": "10 to 25",
          "doc_count": 2
        }
      ]
    }
  }
}
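The effect of the script is simply to build one string key per document and count identical keys. A minimal Python sketch of that counting, using the five example documents described above (plain Python, not an Elasticsearch client call):

```python
from collections import Counter

# The five example documents: three with 3-5 and two with 10-25.
docs = (
    [{"salary_from": 3, "salary_to": 5}] * 3
    + [{"salary_from": 10, "salary_to": 25}] * 2
)

# Build the same "<from> to <to>" key that the script produces.
key_counts = Counter(f"{d['salary_from']} to {d['salary_to']}" for d in docs)

# Order by document count, highest first, like the buckets in the response.
buckets = [{"key": k, "doc_count": c} for k, c in key_counts.most_common()]
print(buckets)
```

Note that this groups documents by their exact from/to pair; it does not assign them to fixed slabs such as 0 - 10 or 10 - 20.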

Ignore filter on range aggregation

Scenario: I'm searching for cars, and my first search happens without any filtering.
I have two range aggregations:
Price: *-5000, 5000-10000, 10000-*
Seats: 2-*, 4-*, 5-*, 7-*, 9-* (I want to see how many cars have at least, say, 5 seats)
I also have several aggregations on boolean terms, e.g.:
Aircon
Natural gas
If I apply the filtering for aircon everything's good:
"aggregations": {
  "price_ranges" : {
    "buckets": [
      {
        "to": 5000,
        "doc_count": 2
      },
      {
        "from": 5000,
        "to": 10000,
        "doc_count": 4
      },
      {
        "from": 10000,
        "doc_count": 4
      }
    ]
  }
}
But let's say that in the main query I apply a price filter, between 5000 and 10000. Now the other values are (correctly) zeroed, as the aggregations only apply to the docs retrieved:
"aggregations": {
  "price_ranges" : {
    "buckets": [
      {
        "to": 5000,
        "doc_count": 0
      },
      {
        "from": 5000,
        "to": 10000,
        "doc_count": 4
      },
      {
        "from": 10000,
        "doc_count": 0
      }
    ]
  }
}
Questions:
Is there a way to still get the totals from the first example (2, 4, 4) even when filtering on price, without building and executing a second query?
Even making multiple queries, I'd have to make a distinct call with all filters except price to get all cars price ranges as in question 1, and a second one with all filters except seats (i.e. including price) to show those values as required. Is there any way around this?

Resources