elasticsearch sort by document id - elasticsearch

I have a simple index in elasticsearch and all my ids are manually added, i.e. I do not add documents with automatic string ids.
Now the requirement is to get list of all documents page by page and sorted by the document id (i.e. _id)
When I tried this with _id, it did not work. I then searched the forums and found that I have to use _uid instead. This actually works, although I have no clue why. But another problem is that the sorting treats the _id as a string. And it actually is a string. But I want the results ordered as if the _id were a number.
So there are two issues here:
Why does sorting not work with _id while it does work with _uid?
Is there a way to get document ids sorted as numbers and not as strings?
For example, if my doc ids are 1, 2, 3, ..., 55
I am getting results in this order:
1, 10, 11, 12, ... , 19, 2, 20, ... so on
While I would like to get the results in this order:
1, 2, 3, ... so on
Any help is highly appreciated!
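The ordering described above is what any lexicographic string comparison produces. A minimal Python sketch (not Elasticsearch code; the variable names are only illustrative) that reproduces both orderings for ids 1 to 55:

```python
# Document ids stored as strings, as in the question (1 .. 55).
ids = [str(n) for n in range(1, 56)]

# Plain string comparison is lexicographic, so "10" sorts before "2".
as_strings = sorted(ids)

# Comparing on the integer value of each id gives numeric order.
as_numbers = sorted(ids, key=int)

print(as_strings[:13])
print(as_numbers[:13])
```

The first list starts 1, 10, 11, ..., 19, 2, 20 and the second starts 1, 2, 3, which matches the two orderings in the question.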

Have the _id indexed:
{
"mappings": {
"some_type": {
"_id": {
"index": "not_analyzed"
}
}
}
}
And use a script:
{
"sort": {
"_script": {
"type": "number",
"script": "doc['_id'].value?.isInteger()?doc['_id'].value.toFloat():null",
"order": "asc"
}
}
}
That said, I strongly recommend, if possible, storing the id as an integer rather than as a string that contains numbers.
And I rather doubt that it worked with _uid, because _uid is a combination of type and id.

For some reason the code above didn't work for me ("dynamic method [java.lang.String, isInteger/0] not found").
However, the script below works (only if your _id values can be converted into integers). I personally needed descending order.
GET ENDPOINT/INDEX/_search
{
"sort": {
"_script": {
"type": "number",
"script": "return Integer.parseInt(doc['_id'].value)",
"order": "desc"
}
}
}
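The two scripts above differ in how they treat ids that are not integers: the first guards the conversion and yields null, the second simply parses and fails. A rough Python analogy, as a sketch only (the function names are made up, and Python's int() is not identical to Java's Integer.parseInt in every edge case):

```python
from typing import Optional


def strict_key(doc_id: str) -> int:
    # Analogous to Integer.parseInt: raises if the id is not an integer string.
    return int(doc_id)


def lenient_key(doc_id: str) -> Optional[float]:
    # Analogous to the guarded script: returns None for non-integer ids.
    try:
        return float(int(doc_id))
    except ValueError:
        return None


print(strict_key("42"))
print(lenient_key("abc"))
```

Which behaviour is preferable depends on whether every _id in the index is guaranteed to be numeric.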

Instead of id, I used id.keyword and it worked. Sample code below:
GET index_name/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"id.keyword": {
"order": "desc"
}
}
]
}

Related

How are the documents ordered in Elasticsearch if the sort value for two documents is same?

I was working with products data, here: link
The search query that sort by keyword field tags using max mode is as follows.
GET product/_doc/_search
{
"size":100,"from":20,"_source":["tags", "name"],
"query": {
"match_all": {}
},
"sort": [
{"tags":{
"order":"desc",
"mode":"max"
}}
]
}
Some documents have the same sort value. I had read somewhere that if the sort value is the same, documents are arranged by internal doc id (_id). However, that does not seem to be the case. See screenshot below:
First _id: 961 followed by _id: 972 (fine). However, then came _id: 114. I do not understand why the order looks random.
Help will be appreciated.
As you have already seen, it's random. To overcome this, you can add a second field to sort on when the value of the first sort field is the same. Since you want to use _id, the query will then be as follows:
{
"size": 100,
"from": 20,
"_source": [
"tags",
"name"
],
"query": {
"match_all": {}
},
"sort": [
{
"tags": {
"order": "desc",
"mode": "max"
}
},
{
"_id": "asc"
}
]
}
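To see what the secondary sort key buys you, here is a small Python sketch of the same idea. The _id values come from the question; the tag values and the fourth document are hypothetical. Note that _id is compared as a string here.

```python
docs = [
    {"_id": "961", "max_tag": "zebra"},
    {"_id": "972", "max_tag": "zebra"},
    {"_id": "114", "max_tag": "zebra"},
    {"_id": "200", "max_tag": "apple"},  # hypothetical extra document
]

# Apply the tie-breaker first (_id ascending), then the primary key
# (tag descending). Python's sort is stable, so documents with equal
# tags keep their _id order from the first pass.
by_id = sorted(docs, key=lambda d: d["_id"])
ordered = sorted(by_id, key=lambda d: d["max_tag"], reverse=True)

ids = [d["_id"] for d in ordered]
print(ids)
```

With the tie-breaker in place the three documents that share a tag always come back as 114, 961, 972, regardless of their input order.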

Sort documents by size of a field

I have documents like below indexed,
1.
{
"name": "Gilly",
"hobbyName" : "coin collection",
"countries": ["US","France","Georgia"]
}
2.
{
"name": "Billy",
"hobbyName":"coin collection",
"countries":["UK","Ghana","China","France"]
}
Now I need to sort these documents based on the array length of the field "countries", such that the result after the sorting would be of the order document2,document1. How can I achieve this using elasticsearch?
You can use script based sorting to achieve this.
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "number",
"script": "doc['countries'].values.size()",
"order": "desc"
}
}
}
I would suggest using the token count type in Elasticsearch.
It can be done with scripts (see here for how to do it using scripts), but then the results won't be perfect.
Scripts mostly use the field data cache, and duplicates are removed in it.
You can read more on how to use the token count type here.
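A minimal Python sketch of both points, using the two documents from the question plus one hypothetical array with a repeated value. It only illustrates the ordering and the effect of duplicate removal described above; it does not call Elasticsearch.

```python
docs = [
    {"name": "Gilly", "countries": ["US", "France", "Georgia"]},
    {"name": "Billy", "countries": ["UK", "Ghana", "China", "France"]},
]

# Sort by array length, longest first (what the script sort aims for).
ordered = sorted(docs, key=lambda d: len(d["countries"]), reverse=True)
names = [d["name"] for d in ordered]
print(names)

# Hypothetical array with a repeated value: if duplicates are dropped
# before the script sees the values, the reported size shrinks.
repeated = ["US", "US", "France"]
stored_len = len(repeated)
distinct_len = len(set(repeated))
print(stored_len, distinct_len)
```

This is why a count computed at index time can differ from a size computed by a script at query time.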

Elasticsearch autocomplete integer field

I am trying to implement an autocomplete feature on a numeric field (its actual type in ES is long).
I am using a jQuery UI Autocomplete widget on the client side, having its source function send a query to Elasticsearch with the prefix term to get a number (say, 5) of autocomplete options.
The query I am using is something like the following:
{
"size": 0,
"query": {
"prefix": {
"myField": "<term>"
}
},
"aggs": {
"myAggregation": {
"terms": {
"field": "myField",
"size": 5
}
}
}
}
Such that if myField has the distinct values: [1, 15, 151, 21, 22], and term is 1, then I'd expect to get from ES the buckets with keys [1, 15, 151].
The problem is this does not seem to work with numeric fields. For the above example, I am getting a single bucket with the key 1, and if term is 15 I am getting a single bucket with key 15, i.e. it only returns exact matches. In contrast, it works perfectly for string typed fields.
I am guessing I need some special mapping for myField, but I'd prefer to have the mapping as general as possible, while having the autocomplete working with minimal changes to the mapping (just to note - the index I am querying might be a general one, external to my application, so I will be able to change the type/field mappings in it only if the new mapping is something general and standard).
What are my options here?
What I would do is create a string sub-field in your integer field, like this:
{
"myField": {
"type": "integer",
"fields": {
"to_string": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Then your query would need to be changed to the one below, i.e. query on the string field, but retrieve the terms aggregations from the integer field
{
"size": 0,
"query": {
"prefix": {
"myField.to_string": "1"
}
},
"aggs": {
"myAggregation": {
"terms": {
"field": "myField",
"size": 5
}
}
}
}
Note that you can also create a completely independent field, not necessarily a sub-field. The key point is that one field needs the integer value to run the terms aggregation on and the other field needs the string value to run the prefix query on.
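The reason a string copy is needed can be sketched in a few lines of Python: prefix matching is a string operation, so it has to run on the text form of each number, while the returned keys stay numeric. The helper name is hypothetical, and the sketch ignores that a terms aggregation orders buckets by document count.

```python
values = [1, 15, 151, 21, 22]  # distinct values of myField from the question


def prefix_matches(values, term, size=5):
    # Match the prefix against the string form, return the numeric values.
    return [v for v in values if str(v).startswith(term)][:size]


print(prefix_matches(values, "1"))
print(prefix_matches(values, "15"))
```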

Sort based on string value in elastic search

I have a field called "rating" in my data. The value of this field will be one of "good", "average" or "bad". What I'm trying to do is sort the documents according to the "rating" field values they possess. Since the field value is a string, how can I sort based on that value?
As far as I understand, you want the values in the "rating" field to be given weights and then sort the documents in descending order of weight.
You can use the following script for that:
"sort": {
"_script": {
"script": "factor.get(doc[\"rating\"].value)",
"type": "number",
"params": {
"factor": {
"good": 2,
"average": 1,
"bad": 0
}
},
"order": "desc"
}
}
This will give the elements in the "rating" array numerical values and then sort the documents in the descending order.
More on this can be found here
You can sort on a text field (I recommend to do that on non-analyzed fields), the documents will be sorted in alphabetical order if sorted in ascending order and in reverse alphabetical order if sorted in descending order.
Example:
POST index/type/_search
{
"query":{
"match_all": {}
},
"sort":{"my_not_analyzed_field":{"order":"asc"} }
}
This code will match every document in index index of type type and will sort them in alphabetical order.
NB: It theoretically works on analyzed fields but the results will be surprising if the contents have been tokenized since the results will not be sorted on the basis of the start of the string but using any word (token) of the string, taking the more "advantageous" one.
I tried Vineeth's answer and ran into "[_script] unknown field [params], parser not found"
I made this change and it worked for me:
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "params.get(doc['rating'].value)",
"params": {
"good": 2,
"average": 1,
"bad": 0
}
},
"order": "asc"
}
}
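Both script variants do the same thing: look up a numeric weight for each rating and sort on that weight. A minimal Python sketch of the idea, with hypothetical sample documents:

```python
# Same weights as the "params" in the scripts above.
factor = {"good": 2, "average": 1, "bad": 0}

# Hypothetical documents, one per rating.
docs = [
    {"id": 1, "rating": "average"},
    {"id": 2, "rating": "bad"},
    {"id": 3, "rating": "good"},
]

# Sort on the looked-up weight, highest first ("order": "desc").
ordered = sorted(docs, key=lambda d: factor[d["rating"]], reverse=True)
ratings = [d["rating"] for d in ordered]
print(ratings)
```

Switching reverse=True to reverse=False corresponds to "order": "asc" in the last script.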

ElasticSearch - sort search results by relevance and custom field (Date)

For example, I have entities with two fields, Text and Date. I want to search these entities and get the results sorted by Date. But if I simply sort by Date, the result is unexpected.
For the search query "Iphone 6", the newest texts containing only "6" are at the top of the results, rather than those containing "iphone 6". Without sorting, the results look fine, but they are not ordered by Date as I want.
How can I write a custom sort function that considers both relevance and Date? Or is there a way to give the Date field a weight that is taken into account in scoring?
In addition, I may want to suppress search results that contain only "6". How can I customize the search to match only on bigrams, for example?
Did you try a bool query like this?
{
"query": {
"bool": {
"must": {
"match": {
"field": "iphone 6"
}
}
}
},
"sort": {
"date": {
"order": "desc"
}
}
}
Or, with your existing query, you can do the following, which I think is the more appropriate way.
Just add this as the sort:
"sort": [
{ "date": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
All matching results are sorted first by date, then by relevance.
The solution is to use _score and the date field both in sort. _score as the first sort order and date field as secondary sort order.
You can use simple match query to perform relevance match.
Try it out.
Data setup:
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-10"
}
POST ecom/prod
{
"name":"iphone 5",
"date":"2019-01-10"
}
POST ecom/prod
{
"name":"iphone 6",
"date":"2019-02-28"
}
POST ecom/prod
{
"name":"6",
"date":"2019-03-01"
}
Query for relevance and date based sorting:
POST ecom/prod/_search
{
"query": {
"match": {
"name": "iphone 6"
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"date": {
"order": "desc"
}
}
]
}
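The order of the sort keys is what matters here. A small Python sketch using the four documents from the data setup above; the score values are hypothetical stand-ins for _score (real scores will differ), chosen so that the two "iphone 6" documents rank highest:

```python
hits = [
    {"name": "iphone 6", "date": "2019-02-10", "score": 2.0},
    {"name": "iphone 5", "date": "2019-01-10", "score": 1.0},
    {"name": "iphone 6", "date": "2019-02-28", "score": 2.0},
    {"name": "6", "date": "2019-03-01", "score": 1.0},
]

# _score first, date second (both descending).
score_then_date = sorted(hits, key=lambda h: (h["score"], h["date"]), reverse=True)

# date first, _score second (both descending).
date_then_score = sorted(hits, key=lambda h: (h["date"], h["score"]), reverse=True)

print([h["name"] for h in score_then_date])
print([h["name"] for h in date_then_score])
```

With the date first, the newest document ("6") comes out on top, which is the behaviour the question complains about. With the score first, both "iphone 6" documents lead and the date only orders documents with equal scores.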
You could definitely use a phrase matching query for this.
It does position-aware matching, so a document is considered a match for your query only if both "iphone" and "6" occur in the searched field AND their occurrences respect this order, i.e. "iphone" shows up before "6".
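As a rough illustration of position-aware matching, here is a simplified Python sketch. It assumes whitespace tokenization and requires the terms to be adjacent and in order; it ignores analyzers and any slop setting, and the function name is made up.

```python
def phrase_match(text: str, phrase: str) -> bool:
    # Tokenize both sides the same (very naive) way.
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    # The phrase matches if its tokens appear consecutively, in order.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(phrase_match("new iphone 6 case", "iphone 6"))
print(phrase_match("6 iphone", "iphone 6"))
print(phrase_match("6", "iphone 6"))
```

A document containing only "6" no longer matches, which addresses the second part of the question.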
It looks like you want to sort first by relevance and then by date. The query below sorts on the date field (pubDate); put _score ahead of it in the sort if relevance should come first.
{ "query" : {
"match" : {
"my_field" : "my query"
}
},
"sort": {
"pubDate": {
"order": "desc",
"mode": "min"
}
}
}
When sorting on fields with more than one value, remember that the
values do not have any intrinsic order; a multivalue field is just a
bag of values. Which one do you choose to sort on? For numbers and
dates, you can reduce a multivalue field to a single value by using
the min, max, avg, or sum sort modes. For instance, you could sort on
the earliest date in each dates field by using the above query.
elasticsearch guide sorting
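The quoted sort modes simply reduce a multi-valued field to one number per document before sorting. A minimal Python sketch of that reduction (the helper name and the sample values are hypothetical):

```python
from statistics import mean

# One reducer per sort mode named in the quote above.
MODES = {"min": min, "max": max, "avg": mean, "sum": sum}


def sort_value(values, mode):
    # Collapse a multi-valued field into the single value used for sorting.
    return MODES[mode](values)


values = [3, 1, 2]
for mode in ("min", "max", "avg", "sum"):
    print(mode, sort_value(values, mode))
```

For dates, the same idea applies to the underlying timestamps, so "mode": "min" picks the earliest date in each document.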
I think your relevance is broken. You should use two different analyzers, one for indexing and another for searching, like this:
PUT /my_index/my_type/_mapping
{
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
Also, you can read more about this here: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html
Once you fix the relevance then sorting should work correctly.

Resources