How to sort by date with ElasticSearch - elasticsearch

I have an index with a date field mapped as follows:
{
"properties": {
"productCreationDate": {
"format": "YYYY-MM-dd'T'HH:mm:ssXXX",
"type": "date"
},
}
}
When I run the following search:
{
"size": 5,
"from": 0,
"sort": [
{
"productCreationDate": {
"order": "desc"
}
}
],
"track_scores": false
}
I get the documents in insertion order rather than in the order of the date field (Elasticsearch 7.9):
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^14",
"_score": null,
"_source": {
"productCreationDate": "2020-08-14T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^28",
"_score": null,
"_source": {
"productCreationDate": "2020-08-28T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^19",
"_score": null,
"_source": {
"productCreationDate": "2020-08-19T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^27",
"_score": null,
"_source": {
"productCreationDate": "2020-08-27T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^26",
"_score": null,
"_source": {
"productCreationDate": "2020-08-26T18:21:51+02:00",
},
"sort": [
1577722911000
]
}
]
}
}
What am I missing?
Edit: Thanks to @zaid warsi and @Yeikel, I changed the format to yyyy, and now I get a new order:
15
26
27
19
28
14
This is even weirder, since I asked for 5 documents.

YYYY is not the correct year pattern for Elasticsearch date formats; use lowercase yyyy.
Try changing your date format to yyyy-MM-dd'T'HH:mm:ssXXX; it should work.
See the Elasticsearch documentation for the valid built-in date formats, or define your own custom format in the mapping.
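One way to see the problem is to decode the sort values from the response in the question: every hit carries the same key, 1577722911000, so the order among the hits is arbitrary. The following is a minimal Python sketch (standard library only) that decodes that value. The reading that uppercase Y denotes the week-based year follows Java date-pattern conventions and is an assumption about how this mapping was parsed, not something the thread confirms.

```python
from datetime import datetime, timezone

# Sort values copied from the five hits in the question's response.
sort_values = [1577722911000] * 5

# Date sort keys are reported as milliseconds since the Unix epoch.
decoded = [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for ms in sort_values]

print(decoded[0].isoformat())            # 2019-12-30T16:21:51+00:00
print(len(set(sort_values)))             # 1 (all hits share one sort key)
print(tuple(decoded[0].isocalendar()))   # (2020, 1, 1): Monday of ISO week 1 of 2020
```

The decoded instant keeps the time of day from the documents (18:21:51+02:00 is 16:21:51 UTC) but lands on 30 December 2019 instead of a day in August 2020. That is consistent with the month and day being dropped in favour of the first day of week-based year 2020, which would explain why all documents compare as equal.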

Related

sort by on elasticsearch not working as expected

I have an ES query like the one below:
{"query":{"bool":{"must":[{"bool":{"should":[{"bool":{"must":[{"bool":{"should":[{"match":{"*login*":{"query":"jyo","operator":"and"}}}]}}],"boost":1.34}}]}}]}},"sort":[{"_uid":"desc"}]}
The output is:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "6",
"_score": null,
"sort": [
"esuser#6"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "5",
"_score": null,
"sort": [
"esuser#5"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "4",
"_score": null,
"sort": [
"esuser#4"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "10003",
"_score": null,
"sort": [
"esuser#10003"
]
}
]
}
}
If it is sorting by _id, then shouldn't 10003 be at the top? I am using Elasticsearch version 1.7. Please help.
Because _id is a string, not a number, "10003" is less than "4", "5", or "6" in a string comparison.
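The same comparison can be reproduced outside Elasticsearch. This is a small Python sketch using only the id values from the response (the actual sort keys there are the _uid strings such as "esuser#6", which share a common prefix and so compare the same way):

```python
ids = ["6", "5", "4", "10003"]

# Strings compare character by character, so "1" < "4" decides the result
# before the remaining digits of "10003" are considered.
print("10003" < "4")                        # True
print(sorted(ids, reverse=True))            # ['6', '5', '4', '10003']

# Comparing as integers gives the order the question expected.
print(sorted(ids, key=int, reverse=True))   # ['10003', '6', '5', '4']
```

To get the numeric order from Elasticsearch itself, the documents would need a numeric field to sort on; the string id alone cannot provide it.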

Elastic Search Sorting

Below is the response after sorting, but this is not the expected output.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "size",
"_id": "AWVVTy-v9pbhY5QtJPGe",
"_score": null,
"_source": {
"e_size": "5"
},
"sort": [
"5"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTmY89pbhY5QtJPGa",
"_score": null,
"_source": {
"e_size": "3"
},
"sort": [
"3"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTxXe9pbhY5QtJPGd",
"_score": null,
"_source": {
"e_size": "100"
},
"sort": [
"100"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTpYJ9pbhY5QtJPGc",
"_score": null,
"_source": {
"e_size": "10-"
},
"sort": [
"10-"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTk1x9pbhY5QtJPGY",
"_score": null,
"_source": {
"e_size": "1-7"
},
"sort": [
"1-7"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTnsm9pbhY5QtJPGb",
"_score": null,
"_source": {
"e_size": "1-6"
},
"sort": [
"1-6"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTjAq9pbhY5QtJPGX",
"_score": null,
"_source": {
"e_size": "1-2"
},
"sort": [
"1-2"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVThkT9pbhY5QtJPGW",
"_score": null,
"_source": {
"e_size": "1"
},
"sort": [
"1"
]
}
]
}
}
Below is the sort used to get the results.
{
"sort": [{
"e_size": {
"order": "desc"
}
}]
}
The e_size type is "string" and its index setting is "not_analyzed".
How can I fix this sort issue? Do we need to use an analyzer for this, or should the e_size data type be different?
It has sorted them correctly. If you sort those strings yourself, you will get the same result.
Example:
["100", "10-"]
Here "-" < "0", so in descending order "10-" comes after "100", and so on. Think of it as the way you look up words in a dictionary.
Either make e_size a number, or use a different series of strings for the e_size values.
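The order in the response can be reproduced with a plain string sort. This Python sketch uses the e_size values from the question; `leading_number` is a hypothetical helper added only to illustrate what sorting on a numeric value would look like, and is not part of any Elasticsearch API.

```python
import re

sizes = ["1", "1-2", "1-6", "1-7", "10-", "3", "100", "5"]

# Descending lexicographic order, comparing code points left to right.
by_string = sorted(sizes, reverse=True)
print(by_string)
# ['5', '3', '100', '10-', '1-7', '1-6', '1-2', '1']

# "-" has a lower code point than "0", so "10-" sorts below "100".
print(ord("-"), ord("0"))  # 45 48


def leading_number(value):
    # Hypothetical helper: numeric prefix of a size label, e.g. "10-" -> 10.
    return int(re.match(r"\d+", value).group())


# Sorting on a number instead of the raw string.
by_number = sorted(sizes, key=leading_number, reverse=True)
print(by_number[:4])  # ['100', '10-', '5', '3']
```

In Elasticsearch the equivalent of the second sort is to index a separate numeric field and sort on that field.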

Elastic search more like this Query score issue in 5.x

We recently upgraded Elasticsearch from version 2.4 to 5.4.
We found an issue with the more_like_this query in version 5.x.
The following query is used to find similar documents by text.
Input query:
POST /test/_search
{
"size": 10000,
"stored_fields": [
"docid"
],
"_source": false,
"query": {
"more_like_this": {
"fields": [
"textcontent"
],
"like": [
{
"_index": "test",
"_type": "object",
"_id": "AV0c9jvZXF-b5U5aNAWB"
}
],
"max_query_terms": 5000,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}
Output of Elasticsearch 2.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 1.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": 0.5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": 0.381224,
"fields": {
"docid": [
"4"
]
}
}
]
}
}
Output of Elasticsearch 5.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 168.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": 164.5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": 132.381224,
"fields": {
"docid": [
"4"
]
}
}
]
}
}
The output is the same in both versions except for the document scores.
Version 5.4 gives higher scores than 2.4.
We depend on the score for our work, so a change in score is a problem for us. Is there a solution for this?
I found the solution: in version 5.0 the default similarity algorithm was changed from classic to BM25, which is the reason for the difference.
Just set the similarity type to classic when creating the index.
If the index already exists, update the setting for all indices by executing the following request:
PUT /_all/_settings?preserve_existing=true
{
"index.similarity.default.type": "classic"
}
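For the "while creating index" case, the request body might look like the sketch below. It is written in Python only to build and print the JSON. The index name `test` is taken from the question, the nested layout is assumed to be equivalent to the flat `index.similarity.default.type` key used above, and nothing here has been run against a cluster.

```python
import json

# Assumed index-creation body: same setting as above, in nested form.
body = {
    "settings": {
        "index": {
            "similarity": {
                "default": {"type": "classic"}
            }
        }
    }
}

# Request line and body as they could be pasted into a REST console.
print("PUT /test")
print(json.dumps(body, indent=2))
```

Documents indexed before the change do not need to be scored differently by hand; the similarity is applied at query time, but whether an existing index accepts the update without being closed first should be checked for the version in use.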

Elasticsearch - Return nested values in format

How can I make Elasticsearch return nested values in the format hits {value1: ..., value2: ..., value3: ..., etc.}?
This is my request:
{
"_source": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"path": "photo",
"query": {
"bool": {
"must": [
{
"match": {
"photo.hello": "true"
}
}
]
}
},
"inner_hits" : {}
}
}
]
}}}
This is the response:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.2231436,
"hits": [
{
"_index": ".3eautiful",
"_type": "profile",
"_id": "6UAaCls5iSgavEtFE2qMX902Xmb2",
"_score": 1.2231436,
"inner_hits": {
"photo": {
"hits": {
"total": 1,
"max_score": 1.2231436,
"hits": [
{
"_index": ".3eautiful",
"_type": "profile",
"_id": "6UAaCls5iSgavEtFE2qMX902Xmb2",
"_nested": {
"field": "photo",
"offset": 0
},
"_score": 1.2231436,
"_source": {
"hello": "true",
"i_am_superCOOL": "true",
"xoxox": "true",
"id": "-KSDRx5BN54JHitoq7Wb"
}
}
]
}
}
}
},
{
"_index": ".3eautiful",
"_type": "profile",
"_id": "KDFbeXrOedf7b6NVRGMO0HDIFgx1",
"_score": 1.2231436,
"inner_hits": {
"photo": {
"hits": {
"total": 2,
"max_score": 1.2231436,
"hits": [
{
"_index": ".3eautiful",
"_type": "profile",
"_id": "KDFbeXrOedf7b6NVRGMO0HDIFgx1",
"_nested": {
"field": "photo",
"offset": 1
},
"_score": 1.2231436,
"_source": {
"alahu": "true",
"hello": "true",
"same": "true",
"smukais": "true",
"id": "-KSDJzyUC_N5je-cR2aT"
}
},
{
"_index": ".3eautiful",
"_type": "profile",
"_id": "KDFbeXrOedf7b6NVRGMO0HDIFgx1",
"_nested": {
"field": "photo",
"offset": 0
},
"_score": 1.2231436,
"_source": {
"hello": "true",
"same": "true",
"selfyyy": "true",
"superSexy": "true",
"id": "-KPn4p7spS8NO7IVSLdF"
}
}
]
}
}
}
}
]
}
}
I am using a two-dimensional dynamic attribute search. The problem with this approach is that there can be 20 results from a single user, but I need to make it propriety based.
In the end I just stuck with the same format.
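If the response format is kept as it is, the nested values can still be flattened on the client. The following Python sketch assumes the response has already been parsed into a dict shaped like the one above; `flatten_inner_hits` and the `parent_id` key are hypothetical names chosen for illustration, not an Elasticsearch feature.

```python
def flatten_inner_hits(response, path="photo"):
    """Collect the _source of every nested inner hit, tagged with its parent _id."""
    rows = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        for nested in inner:
            row = {"parent_id": hit["_id"]}
            row.update(nested["_source"])
            rows.append(row)
    return rows


# Trimmed copy of the response above: one parent with one nested photo.
response = {
    "hits": {
        "hits": [
            {
                "_id": "6UAaCls5iSgavEtFE2qMX902Xmb2",
                "inner_hits": {
                    "photo": {
                        "hits": {
                            "hits": [
                                {"_source": {"hello": "true", "id": "-KSDRx5BN54JHitoq7Wb"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

rows = flatten_inner_hits(response)
print(rows)
```

Each row is then a flat {value1: ..., value2: ...} mapping, and rows from the same parent can be grouped or limited by `parent_id` as needed.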

ElasticSearch Order By String Length

I am using Elasticsearch via NEST (C#). I have a large list of information about people:
{
firstName: 'Frank',
lastName: 'Jones',
City: 'New York'
}
I'd like to filter this list by lastName and also order it by name length, so that people with only 5 characters in their name come at the beginning of the result set, followed by people with 10 characters.
So with some pseudo code I'd like to do something like
list.wildcard("j*").sort(m => lastName.length)
You can do the sorting with script-based sorting.
As a toy example, I set up a trivial index with a few documents:
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Bob"}
{"index":{"_id":2}}
{"name":"Jeff"}
{"index":{"_id":3}}
{"name":"Darlene"}
{"index":{"_id":4}}
{"name":"Jose"}
Then I can order search results like this:
POST /test_index/_search
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": null,
"_source": {
"name": "Bob"
},
"sort": [
3
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
To filter by length, I can use a script filter in a similar way:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['name'].value.length() > 3",
"params": {}
}
}
}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab
P.S.: If any of the last names contain spaces, you might want to use "index": "not_analyzed" on that field.
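As a cross-check of what the two scripts compute, here is the same logic in plain Python, using the four names from the toy index. It is only a client-side analogue of the script sort and script filter, not a replacement for them.

```python
names = ["Bob", "Jeff", "Darlene", "Jose"]

# Equivalent of the sort script doc['name'].value.length(), ascending.
by_length = sorted(names, key=len)
print(by_length)        # ['Bob', 'Jeff', 'Jose', 'Darlene']

# Equivalent of the filter script doc['name'].value.length() > 3.
longer_than_3 = [name for name in by_length if len(name) > 3]
print(longer_than_3)    # ['Jeff', 'Jose', 'Darlene']

# A secondary key makes the order of equal-length names deterministic.
print(sorted(names, key=lambda name: (len(name), name)))
# ['Bob', 'Jeff', 'Jose', 'Darlene']
```

Note that the Elasticsearch responses above list Jose before Jeff: both have length 4, and the script sort alone does not define an order among ties, so a second sort key is needed if a stable order matters.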

Resources