Sorting on nested fields containing null values in Elasticsearch

I am trying to sort on a nested field in Elasticsearch, but documents whose nested field is null always appear at the top of the list when sorting in ascending order. I want to sort in both ascending and descending order and have the documents with a null nested field appear at the end of the sorted list.
This is the sorting query I am using:
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"order": "asc",
"script": {
"lang": "painless",
"source": "def val=params['_source'].tags; if(val==null){return '';}else{return params['_source'].tags} "
}
}
}
}
Below is the mapping I have applied related to the nested 'tags' field:
"tags": {
"type": "nested",
"properties": {
"tag": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
Sample payload:
"tags": [
{
"tag": "check"
},
{
"tag": "production"
},
{
"tag": "test"
}
]

Since tags is a nested property, you need to apply a nested sort on it, and to make documents without tags come last in the results list you only need to use the missing option.
Note: we sort on the keyword sub-field of tags.tag, since the sort key needs to be a non-analyzed field.
Here is an example query:
{
"query": {
"match_all": {}
},
"sort": [
{
"tags.tag.keyword": {
"order": "asc",
"missing": "_last",
"nested": {
"path": "tags"
}
}
}
]
}
Since you are sorting on a multi-valued field, I recommend checking the section of the sort documentation on sort modes to better understand how Elasticsearch behaves in this case.
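To make the sort-mode behaviour concrete, here is a small pure-Python model of how a multi-valued nested sort with missing set to _last orders documents. It does not call Elasticsearch; the function name nested_sort and the sample documents are made up for illustration, and it only models the default sort modes (min for asc, max for desc).

```python
def nested_sort(docs, order="asc"):
    """Model of sorting on tags.tag.keyword with "missing": "_last".

    Each doc is a dict with an optional "tags" list of {"tag": str}.
    Assumption: default sort mode, i.e. min for asc and max for desc.
    """
    pick = min if order == "asc" else max
    with_values, without_values = [], []
    for doc in docs:
        values = [t["tag"] for t in (doc.get("tags") or []) if t.get("tag") is not None]
        if values:
            with_values.append((pick(values), doc))
        else:
            without_values.append(doc)
    with_values.sort(key=lambda pair: pair[0], reverse=(order == "desc"))
    # "_last" keeps documents without a sort value at the end in both directions.
    return [doc for _, doc in with_values] + without_values


docs = [
    {"id": 1, "tags": [{"tag": "check"}, {"tag": "production"}, {"tag": "test"}]},
    {"id": 2, "tags": None},
    {"id": 3, "tags": [{"tag": "alpha"}]},
    {"id": 4},
]
print([d["id"] for d in nested_sort(docs, "asc")])
print([d["id"] for d in nested_sort(docs, "desc")])
```

In this model, documents 2 and 4 (no tags) stay at the end for both sort directions, which is the behaviour the question asks for.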
Hope it helps!

Related

Number of nested objects in Elasticsearch

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?
I was thinking of using a runtime field, but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible perhaps using aggregations?
Would also be nice to know if I can sort the results (e.g. all documents with more than XX and sorted desc by XX).
Thanks.
You cannot query this efficiently.
The hack below makes it possible, but I would only use it for one-off fetches, not for a regular use case: it reads params._source and is therefore really slow when you have a lot of docs.
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, equal to the length of the user array, and then excludes from the results all docs whose score is below min_score.
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user.size()
"""
}
}
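Once userCount is populated, the range query mentioned above could look like the following sketch. It only builds the request body as a Python dict; the helper name build_user_count_search is hypothetical, and it assumes userCount ends up mapped as a numeric field.

```python
import json


def build_user_count_search(min_users):
    """Build a search body for docs with more than `min_users` nested users.

    Assumes the `userCount` field from the answer has been indexed as a number.
    """
    return {
        "query": {"range": {"userCount": {"gt": min_users}}},
        # Largest user counts first.
        "sort": [{"userCount": {"order": "desc"}}],
    }


body = build_user_count_search(1)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET my-index-000001/_search, which also covers the "sorted desc by count" part of the question.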

nested terms aggregation on object containing a string field

I would like to run a nested terms aggregation on a string field that is inside an object.
Usually, I use this query
"terms": {
"field": "fieldname.keyword"
}
to enable fielddata
But I am unable to do that for a nested document like this
{
"nested": {
"path": "objectField"
},
"aggs": {
"allmyaggs": {
"terms": {
"field": "objectField.fieldName.keyword"
}
}
}
}
The above query just returns an empty buckets array.
Is there a way to do this without enabling fielddata in the index mapping?
That would use a lot of heap memory, and I have already loaded a large amount of data without it.
Document mapping:
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text"
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Sample document:
{
"productname": [
{
"productlineseqno": "1.58",
"iscontrolleddrug": "false",
"productlinename": "Consultations",
"productlinedescription": "Consultations",
"isprescribable": "false",
"invoiceitemname": "invoice name"
}
]
}
Fixed by changing the mapping to enable fielddata.
A nested query is used to search nested fields; similarly, a nested aggregation is needed to aggregate on nested fields.
{
"aggs": {
"fieldname": {
"nested": {
"path": "objectField"
},
"aggs": {
"fields": {
"terms": {
"field": "objectField.fieldname.keyword",
"size": 10
}
}
}
}
}
}
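As a rough mental model of what the nested terms aggregation computes, the pure-Python sketch below counts values across all nested objects. The helper nested_terms and the sample data (including the "Vaccines" value) are invented for illustration; it does not talk to Elasticsearch.

```python
from collections import Counter


def nested_terms(docs, path, field, size=10):
    """Count values of `field` across all nested objects under `path`.

    Inside a nested aggregation, bucket counts refer to nested objects,
    not to root documents.
    """
    counts = Counter(
        obj[field]
        for doc in docs
        for obj in doc.get(path, [])
        if field in obj
    )
    return counts.most_common(size)


docs = [
    {"productname": [{"productlinename": "Consultations"}, {"productlinename": "Consultations"}]},
    {"productname": [{"productlinename": "Vaccines"}]},
]
print(nested_terms(docs, "productname", "productlinename"))
```

Here "Consultations" gets a count of 2 even though it appears in only one root document, because each nested object is counted separately.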
EDIT1:
If you aggregate on productname.invoiceitemname.keyword, you will get empty buckets because no field with that name exists in your mapping.
You need to define your mapping like this:
{
"mappings": {
"properties": {
"productname": {
"type": "nested",
"properties": {
"productlineseqno": {
"type": "text"
},
"invoiceitemname": {
"type": "text",
"fields":{ --> note
"keyword":{
"type":"keyword"
}
}
},
"productlinename": {
"type": "text"
},
"productlinedescription": {
"type": "text"
},
"isprescribable": {
"type": "boolean"
},
"iscontrolleddrug": {
"type": "boolean"
}
}
}
}
}
}
Fields
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations.
When no mapping is provided explicitly, dynamic mapping creates a keyword sub-field for string fields by default. If you define your own mapping (which you need to do for the nested type), you have to add the keyword sub-fields yourself wherever you intend to use them.
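If many nested text fields need a keyword sub-field, generating the mapping can avoid repetition. The sketch below is one possible way to do that in Python; the helper names text_with_keyword and nested_mapping are hypothetical, and the ignore_above value of 256 simply mirrors what dynamic mapping uses by default.

```python
import json


def text_with_keyword(ignore_above=256):
    """Mapping for a text field with a `keyword` sub-field for sorting/aggregations."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


def nested_mapping(path, text_fields):
    """Build a nested mapping where every listed field gets a keyword sub-field."""
    return {
        "mappings": {
            "properties": {
                path: {
                    "type": "nested",
                    "properties": {name: text_with_keyword() for name in text_fields},
                }
            }
        }
    }


mapping = nested_mapping("productname", ["invoiceitemname", "productlinename"])
print(json.dumps(mapping, indent=2))
```

The printed JSON can be used as the body of an index creation request; existing documents would still need to be reindexed for the new sub-fields to be populated.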

ElasticSearch sort by value

I have Elasticsearch 5 and I would like to sort based on a field's value. Imagine documents with a category field, e.g. genre, which can have values like sci-fi, drama and comedy. When searching, I would like to order the results so that comedies come first, then sci-fi, and drama last. Then, of course, I will order within each group by other criteria. Could somebody point me to how to do this?
Elasticsearch Sort Using Manual Ordering
This is possible in Elasticsearch: you can assign an order based on particular values of a field.
I've implemented what you are looking for using script-based sorting, which makes use of a Painless script. You can refer to the links I've mentioned to learn more about these; the query below should do what you are looking for.
I'm assuming genre and movie are text fields with a raw keyword sub-field, as in the mapping below.
PUT sampleindex
{
"mappings": {
"_doc": {
"properties": {
"genre": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"movie": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
You can use the query below to get what you are looking for:
GET sampleindex/_search
{
"query": {
"match_all": {}
},
"sort": [{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "if(params.scores.containsKey(doc['genre.raw'].value)) { return params.scores[doc['genre.raw'].value];} return 100000;",
"params": {
"scores": {
"comedy": 0,
"sci-fi": 1,
"drama": 2
}
}
},
"order": "asc"
}
},
{ "movie.raw": { "order": "asc"}
}]
}
Note that I've included a sort for both genre and movie. The results are sorted by genre first, and then by movie within each genre.
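The ordering this query produces can be modelled in plain Python, which may help when checking the expected result. The sample titles below are made up, and the code only mirrors the sort logic; it does not call Elasticsearch.

```python
GENRE_ORDER = {"comedy": 0, "sci-fi": 1, "drama": 2}
UNKNOWN_RANK = 100000  # same fallback value as the script above


def sort_key(doc):
    # First key mirrors the _script sort, second key mirrors the movie.raw sort.
    return (GENRE_ORDER.get(doc["genre"], UNKNOWN_RANK), doc["movie"])


movies = [
    {"genre": "drama", "movie": "Delta"},
    {"genre": "comedy", "movie": "Bravo"},
    {"genre": "horror", "movie": "Alpha"},
    {"genre": "sci-fi", "movie": "Echo"},
    {"genre": "comedy", "movie": "Apex"},
]
ordered = sorted(movies, key=sort_key)
print([m["movie"] for m in ordered])
```

Genres that are not listed in the scores map (horror here) get the fallback rank and therefore sort after all known genres.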
Hope it helps.

How to sort ordinal values in elasticsearch?

Say I've got a field 'spicey' with possible values 'hot', 'hotter', 'smoking'.
There's an intrinsic ordering in these values: they're ordinals.
I'd like to be able to sort or filter on them using their intrinsic order. For example: give me all documents where spicey > hot.
Sure, I can translate the values to integers 0, 1, 2, but this requires extra housekeeping on both the index and the query side, which I'd rather avoid.
Is this possible in some way? I have already contemplated using a multi-field mapping, but I'm not sure whether that would help me.
You can sort on string values with a script-based sort that maps each spicey string to a specific numeric value.
curl -XGET 'http://localhost:9200/yourindex/yourtype/_search' -d '
{
"sort": {
"_script": {
"script": "factor.get(doc[\"spicey\"].value)",
"type": "number",
"params": {
"factor": {
"hot": 0,
"hotter": 1,
"smoking": 2
}
},
"order": "asc"
}
}
}'
Another solution could be to create a specific analyzer for spice levels. The idea is to map each level to a discrete value that increases with spiciness.
{
"settings": {
"analysis": {
"char_filter": {
"spices": {
"type": "mapping",
"mappings": [
"mild=>1",
"hot=>2",
"hotter=>3",
"smoking=>4"
]
}
},
"analyzer": {
"spice_synonyms": {
"type": "custom",
"char_filter": "spices",
"tokenizer": "standard",
"filter": [
"standard"
]
}
}
}
},
"mappings": {
"ordinal": {
"properties": {
"spicy": {
"type": "string",
"fields": {
"level": {
"type": "string",
"analyzer": "spice_synonyms"
}
}
}
}
}
}
}
In the above index settings and mappings, the spicy field would contain the plain English word (hot, mild, etc.) while the spicy.level field would contain a discrete value that you can then use in queries and sorting.
For instance, retrieving documents whose spice level is strictly greater than hot, ordered in decreasing order (smoking first), could be done like this:
{
"sort": {
"spicy.level": "desc"
},
"query": {
"query_string": {
"query": "spicy.level:>2"
}
}
}
Alternatively, a range query would work too:
{
"sort": {
"spicy.level": "desc"
},
"query": {
"range": {
"spicy.level": {
"gt": 2
}
}
}
}
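One thing worth checking with this approach: spicy.level is still a string field, so, as far as I understand, the range filter and the sort compare terms lexicographically rather than numerically. That works for single-digit levels like the ones above, but would likely break with ten or more levels. The Python sketch below only illustrates the comparison; the level "10" is hypothetical, and zero-padding the mapped values (e.g. hot=>02) is my own assumption, not part of the answer.

```python
levels = ["1", "2", "3", "10"]  # hypothetical level terms as stored in a string field

# Lexicographic comparison, which is how string terms are ordered.
lexicographic_order = sorted(levels)
greater_than_two = [lv for lv in levels if lv > "2"]
print(lexicographic_order)   # "10" sorts before "2"
print(greater_than_two)      # "10" is not matched

# Zero-padded terms compare the same way as the numbers they represent.
padded = [lv.zfill(2) for lv in levels]
padded_order = sorted(padded)
padded_greater_than_two = [lv for lv in padded if lv > "02"]
print(padded_order)
print(padded_greater_than_two)
```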

Elasticsearch getting the last nested or most recent nested element

We have this mapping:
{
"product_achievement": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"last_purchase": {
"type": "long"
},
"products": {
"type": "long"
}
}
}
}
As you can see, this is nested, and the last_purchase field is a Unix timestamp. Across all nested elements, we would like to find the most recent entry according to the last_purchase field and check whether a given product id is in the products of that entry.
You can achieve this using a nested query with inner_hits. In the query part, you specify the product id you want to match; then, in inner_hits, you sort by decreasing last_purchase timestamp and take only the first hit using size: 1.
{
"query": {
"nested": {
"path": "product_achievement",
"query": {
"term": {
"product_achievement.products": 1
}
},
"inner_hits": {
"size": 1,
"sort": {
"product_achievement.last_purchase": "desc"
}
}
}
}
}
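To see what this query selects, here is a pure-Python model of it next to a stricter check. Both function names and the sample entries are invented for illustration. Note that, as I read it, the query returns the most recent entry among those that contain the product, which is not necessarily the most recent entry overall.

```python
def latest_matching_entry(entries, product_id):
    """Model of the nested query + inner_hits above (size 1, last_purchase desc)."""
    matching = [e for e in entries if product_id in e["products"]]
    return max(matching, key=lambda e: e["last_purchase"], default=None)


def latest_entry_contains(entries, product_id):
    """Stricter check: is product_id in the single most recent entry overall?"""
    latest = max(entries, key=lambda e: e["last_purchase"], default=None)
    return latest is not None and product_id in latest["products"]


entries = [
    {"id": 1, "last_purchase": 1500000000, "products": [1, 2]},
    {"id": 2, "last_purchase": 1600000000, "products": [3]},
]
print(latest_matching_entry(entries, 1)["id"])
print(latest_entry_contains(entries, 1))
```

In this sample, product 1 is found in entry 1 by the first function, while the stricter check reports that the most recent entry (entry 2) does not contain it. If the stricter semantics are needed, comparing the inner hit's last_purchase with the overall maximum on the client side is one option.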

Resources