Elasticsearch: how to know which field the results are sorted by? - elasticsearch

In Elasticsearch, is there any way to check which field the results are sorted by? I want something like inner-hits for sort clause.
Imagine that your documents have this kind of form:
{"numerals" : [ // nested
{"key": "point", "value": 30},
{"key": "points", "value": 200},
{"key": "score", "value": 20},
{"key": "scores", "value": 40}
]
}
and you sort the results by:
{"numerals.value": {
"nested_path": "numerals",
"nested_filter": {
"match": {
"numerals.key": "score"}}}}
Now I have no idea how to know the field by which the results are actually sorted: it's probably scores at this document, but is perhaps score at the others? There are 2 problems - 1. You cannot use inner-hits nor highlight for the nested fields. and - 2. Even if you can, it doesn't solve the issue if there are multiple matching candidates.

The question is about sorting by fields that are inside nested objects.
So this is what the documention
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-sorting.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_nested_sorting_example
says:
Elasticsearch will first restrict the nested documents by the "nested_filter"-query and then sort on the same way as for multi-valued fields:
Exactly the way as if there would be only the filtered nested documents as inner objects aka as if there would be only the root document with a multi-valued field which contains exactly all value which belong to the filtered nested objects
( in your example there will only one value remain: 20).
If you want to be sure about the sort order insert a "mode" parameter:
"min", "max", "sum", "avg" or "median"
If you do not specify the "mode" parameter according to the corresponding issue the min-value will be picked for "asc" and the max-value will be picked for "desc"-order:
By default when sorting on a multi-valued field the lowest or highest
value will be picked from the field values depending on the sort
order.

Related

Elasticsearch 5 : sort by price of the closest wholesaler

I have a product nested document containing a list of prices associated to different wholesalers.
Here is a document example :
{
"sku": "065879",
"name": "My product",
"price": [
{
"wholesaler": "1",
"location": "drm3btev3",
"price": "12.34"
},
{
"wholesaler": "2",
"location": "gbsuv7ztq",
"price": "45.67"
},
]
}
Given a customer's geo point, what is the correct query to get a list of documents sorted by price, using only the closest price for each document ?
Thanks by advance !
It's not a real answer but the global approach is to use a nested sort. Nested sort will allow you to filter the nested document on which you want to apply your sorting.
Then you should in the nested sort filter add a script query that will determine the closest wholesaler. The problem is that you cant work with geohash in painless. But if you convert your geohash to geopoint data type in, you will be able to use script distance features ( example here )
Then you could compute the minimal distance by iterating on all nested document and only match the one with the minimal distance.
But I have no idea of the performance impact and detailed implementation.
Good luck !

Elasticsearch performance impact on choosing mapping structure for index

I am receiving data in a format like,
{
name:"index_name",
status: "good",
datapoints: [{
paramType: "ABC",
batch: [{
time:"timestamp1<epoch in sec>",
value: "123"
},{
time:"timestamp2<epoch in sec>",
value: "123"
}]
},
{
paramType: "XYZ",
batch: [{
time:"timestamp1<epoch in sec>",
value: "123"
},{
time:"timestamp2<epoch in sec>",
value: "124"
}]
}]
}
I would like to store the data into elasticsearch in such a way that I can query based on a timerange, status or paramType.
As mentioned here, I can define datapoints or batch as a nested data type which will allow to index object inside the array.
Another way, I can possibly think is by dividing the structure into separate documents. e.g.
{
name : "index_name",
status: "good",
paramType:"ABC",
time:"timestamp<epoch in sec>",
value: "123"
}
which one will be the most efficient way?
if I choose the 2nd way, I know there may be ~1000 elements in the batch array and 10-15 paramsType array, which means ~15k documents will be generated and 15k*5 fields (= 75K) key values pair will be repeated in the index?
Here this explains about the advantage and disadvantage of using nested but no performance related stats provided. in my case, there won't be any update in the inner object. So not sure which one will be better. Also, I have two nested objects so I would like to know how can I query if I use nested for getting data between a timerange?
Flat structure will perform better than nested. Nested queries are slower compared to term queries ; Also while indexing - internally a single nested document is represented as bunch of documents ; just that they are indexed in same block .
As long as your requirements are met - second option works better.

Elasicsearch sort by inner field

I have documents that one of their field looks like the following -
"ingredients": [{
"unit": "MG",
"value": 123,
"key": "abc"
}]
And I would like to sort the different records according to the ascending value of specific ingredient. That is if I have 2 records which have use ingredient with key "abc", one with value 1 and one with value 2. The one with ingredient value 1 should appear first.
Each of those records may have more than on ingredient.
Thank you in advance!
The search query to sort will be:
{
"sort":{
"ingredients.value":{
"order":"asc"}
}}

Terms aggregation on first three octets of IP

I'm doing a faceted search UI, and one of the facets I want to add is for the first three octets of an IP field.
So for example, given documents with IPs "192.168.1.1", "192.168.1.2", "192.168.2.1", I would want to display the facets "192.168.1 (2)" and "192.168.2 (1)".
Is there an aggregation I can use for this? As far as I can tell, range aggregations require me to predefine the ranges, and term aggregations only take a field.
Obviously the alternative is for me to index the first three octets as a separate field, but of course I would prefer to avoid that.
Thanks!
You can add a path hierarchy tokenizer with delimeter of '.' and a custom analyzer with the tokenizer set to the tokenizer you just made.
See this question for the syntax:
Elasticsearch - using the path hierarchy tokenizer to access different level of categories
Then you can aggregate terms and you will get results grouped by each number group
{
"key": "192",
"doc_count": 10
},
{
"key": "192.168",
"doc_count": 10
},
...
In the linked answer there is a way to exclude certain aggregations levels. The following should exclude all results except ones that have 3 levels of numbers.
"aggs": {
"ipaddr": {
"terms": {
"field": "your_ip_addr",
"exclude": ".*",
"include": ".*\\..*\\..*"
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html

Grouping non null fields together in Kibana

Given the following three User entries in an ElasticSearch index:
"user": [
{
"userId": "100",
"hobby": "chess"
}
"user": [
{
"userId": "200",
"hobby": "music"
}
"user": [
{
"userId": "300",
"hobby": ""
}
I want to create a vertical bar chart to compare the number of users who have a hobby as opposed to those who do not. Individual hobbies should not be shown separately, but grouped together.
If split along the Y axis, one block would take up two thirds of the height (the two users with hobbies) and one block one third of the height (the one user with no hobbies).
How could one achieve this grouping in Kibana?
Thanks
You'll need to choose Split Bars and then Filters aggregation. Once you have that selected you should see Query 1 with * in it. Change the * to hobby:*. Next hit Add Filter and put in NOT hobby:*
The filters aggregation lets you bucket things pretty much any way you can search for things.

Resources