Sort by _uid in Elasticsearch not working as expected

I have an ES query like the one below:
{"query":{"bool":{"must":[{"bool":{"should":[{"bool":{"must":[{"bool":{"should":[{"match":{"*login*":{"query":"jyo","operator":"and"}}}]}}],"boost":1.34}}]}}]}},"sort":[{"_uid":"desc"}]}
The output is:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 33,
"successful": 33,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "6",
"_score": null,
"sort": [
"esuser#6"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "5",
"_score": null,
"sort": [
"esuser#5"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "4",
"_score": null,
"sort": [
"esuser#4"
]
},
{
"_index": "hobbes1.qa_en_19_2",
"_type": "esuser",
"_id": "10003",
"_score": null,
"sort": [
"esuser#10003"
]
}
]
}
}
If it is sorting by _id, shouldn't 10003 be at the top? I am using Elasticsearch version 1.7. Please help.

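Because _uid is a string, not a number (it is the type, "#", and the _id), the values are compared character by character: "esuser#10003" is less than "esuser#4", "esuser#5" or "esuser#6" because "1" comes before "4", so in descending order it comes last.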
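A quick way to see the difference is to sort the four sort values from the response both as plain strings and as numbers. This is a minimal Python sketch, assuming Python's string ordering (by code point) matches Elasticsearch's ordering for these ASCII values; the numeric key only illustrates what a numeric sort field would give.

```python
# The _uid sort values returned in the response above.
uids = ["esuser#6", "esuser#5", "esuser#4", "esuser#10003"]

# String sort, descending: compared character by character, so "1" < "4".
as_strings = sorted(uids, reverse=True)

# Numeric sort, descending: parse the part after "#" as an integer.
as_numbers = sorted(uids, key=lambda uid: int(uid.split("#", 1)[1]), reverse=True)

print(as_strings)  # ['esuser#6', 'esuser#5', 'esuser#4', 'esuser#10003']
print(as_numbers)  # ['esuser#10003', 'esuser#6', 'esuser#5', 'esuser#4']
```

The string order is exactly what Elasticsearch returned; getting 10003 first requires sorting on a field that is indexed as a number.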

Related

How to sort by date with ElasticSearch

I have an index with a date field as follows:
{
"properties": {
"productCreationDate": {
"format": "YYYY-MM-dd'T'HH:mm:ssXXX",
"type": "date"
}
}
}
When I perform a search like this:
{
"size": 5,
"from": 0,
"sort": [
{
"productCreationDate": {
"order": "desc"
}
}
],
"track_scores": false
}
I get the documents in insertion order, not ordered by the field, on Elasticsearch 7.9:
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^14",
"_score": null,
"_source": {
"productCreationDate": "2020-08-14T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^28",
"_score": null,
"_source": {
"productCreationDate": "2020-08-28T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^19",
"_score": null,
"_source": {
"productCreationDate": "2020-08-19T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^27",
"_score": null,
"_source": {
"productCreationDate": "2020-08-27T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^26",
"_score": null,
"_source": {
"productCreationDate": "2020-08-26T18:21:51+02:00",
},
"sort": [
1577722911000
]
}
]
}
}
What do I miss?
Edit: Thanks to @zaid warsi and @Yeikel, I changed the format to yyyy and now get a new order:
15
26
27
19
28
14
This is even weirder, since I asked for only 5 documents.
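YYYY is not the calendar-year pattern in Elasticsearch. In the Java date-time patterns it uses, Y means week-based year, while y is the year of era, so YYYY-MM-dd does not parse these dates the way you expect. That is most likely why every hit above has the same sort value, 1577722911000 (2019-12-30T16:21:51Z), which leaves the sort nothing to order by.
Try changing your date format to yyyy-MM-dd'T'HH:mm:ssXXX; it should work.
Refer to the Elasticsearch documentation for the valid built-in date formats, or define your own format in the mapping.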
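To check what the sort values mean, the sketch below decodes the value Elasticsearch returned for every hit and computes the order the five hits should have once the dates are parsed with a calendar year. It is plain Python (3.7+ for datetime.fromisoformat), not an Elasticsearch API, and it only covers the five hits shown in the question.

```python
from datetime import datetime, timezone

# Sort value Elasticsearch returned for every hit (epoch milliseconds).
sort_value_ms = 1577722911000
decoded = datetime.fromtimestamp(sort_value_ms / 1000, tz=timezone.utc)
print(decoded.isoformat())  # 2019-12-30T16:21:51+00:00

# The _source values, parsed with a calendar year (what "yyyy" means).
docs = {
    "product^14": "2020-08-14T18:21:51+02:00",
    "product^28": "2020-08-28T18:21:51+02:00",
    "product^19": "2020-08-19T18:21:51+02:00",
    "product^27": "2020-08-27T18:21:51+02:00",
    "product^26": "2020-08-26T18:21:51+02:00",
}

# Descending by parsed timestamp, like "order": "desc".
expected = sorted(docs, key=lambda doc_id: datetime.fromisoformat(docs[doc_id]), reverse=True)
print(expected)  # ['product^28', 'product^27', 'product^26', 'product^19', 'product^14']
```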

Elastic Search Sorting

My data looks like this after sorting. Below is the response, but it is not the expected output:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "size",
"_id": "AWVVTy-v9pbhY5QtJPGe",
"_score": null,
"_source": {
"e_size": "5"
},
"sort": [
"5"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTmY89pbhY5QtJPGa",
"_score": null,
"_source": {
"e_size": "3"
},
"sort": [
"3"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTxXe9pbhY5QtJPGd",
"_score": null,
"_source": {
"e_size": "100"
},
"sort": [
"100"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTpYJ9pbhY5QtJPGc",
"_score": null,
"_source": {
"e_size": "10-"
},
"sort": [
"10-"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTk1x9pbhY5QtJPGY",
"_score": null,
"_source": {
"e_size": "1-7"
},
"sort": [
"1-7"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTnsm9pbhY5QtJPGb",
"_score": null,
"_source": {
"e_size": "1-6"
},
"sort": [
"1-6"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTjAq9pbhY5QtJPGX",
"_score": null,
"_source": {
"e_size": "1-2"
},
"sort": [
"1-2"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVThkT9pbhY5QtJPGW",
"_score": null,
"_source": {
"e_size": "1"
},
"sort": [
"1"
]
}
]
}
}
Below is the sort used to get the results.
{
"sort": [{
"e_size": {
"order": "desc"
}
}]
}
The e_size field is of type "string" with "index": "not_analyzed".
How can I fix this sort issue? Do we need an analyzer for this, or should e_size have a different data type?
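Elasticsearch has sorted them correctly. If you sort those strings yourself, you get the same result, because not_analyzed strings are compared character by character, the way words are ordered in a dictionary.
Example:
["100", "10-"]
Here "-" (code point 45) sorts before "0" (code point 48), so "10-" < "100", and with "order": "desc" "10-" comes after "100". The same rule puts "1", a prefix of several other values, last.
Either make e_size a numeric field, or store string values whose character-by-character order is the order you want.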
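The claim that the result is already correctly sorted can be checked locally. This sketch assumes that, for ASCII values like these, Python's code-point string ordering matches the byte-wise order Elasticsearch uses for not_analyzed strings.

```python
# e_size values in the order Elasticsearch returned them ("order": "desc").
returned = ["5", "3", "100", "10-", "1-7", "1-6", "1-2", "1"]

# Python compares str values character by character (by code point).
print(sorted(returned, reverse=True) == returned)  # True

# "-" has a lower code point than "0", so "10-" sorts before "100" ascending.
print(ord("-"), ord("0"))  # 45 48
print(sorted(["100", "10-"]))  # ['10-', '100']
```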

Aggregation in elastic search by field value data

I have the data set below and I want to aggregate it by status. I'm not sure how to compare the status value with Rejected or Success and get the count for each.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2874,
"max_score": 1,
"hits": [
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.one",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Success"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.two",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Success"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.three",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Rejected"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.four",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Rejected"
}
}
]
}
}
Can someone help me achieve this with an Elasticsearch aggregation?
The expected response is something like this:
"aggregations": {
"success_records": 2,
"rejected_records": 2
}
Assuming the status field is of type text, you'll need to update it to a multi-field with a keyword sub-field (status.raw below), because the aggregation needs a keyword field. Then query using:
GET my_index/_search
{
"size": 0,
"aggs": {
"statuses": {
"terms": {
"field": "status.raw"
}
}
}
}
If status is already a keyword field, change status.raw to status in the query above.
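To show what the terms aggregation computes, here is a local Python sketch over the four hits shown in the question. The real aggregation runs over all matching documents (2874 here) and returns one bucket per distinct value with a doc_count; reshaping the buckets into success_records / rejected_records is a client-side step assumed for illustration, not something Elasticsearch returns.

```python
from collections import Counter

# _source of the four hits shown in the question.
hits = [
    {"businessDate": 20171013, "status": "Success"},
    {"businessDate": 20171013, "status": "Success"},
    {"businessDate": 20171013, "status": "Rejected"},
    {"businessDate": 20171013, "status": "Rejected"},
]

# A terms aggregation on a keyword field (without a normalizer) groups
# documents by exact, case-sensitive value and counts each group.
buckets = Counter(hit["status"] for hit in hits)

# Reshape the buckets into the response the question asks for.
result = {
    "success_records": buckets["Success"],
    "rejected_records": buckets["Rejected"],
}
print(result)  # {'success_records': 2, 'rejected_records': 2}
```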

Elastic search more like this Query score issue in 5.x

We recently upgraded Elasticsearch from 2.4 to 5.4.
We found an issue with the more_like_this query in version 5.x.
The following query is used to find similar documents by text.
Input query:
POST /test/_search
{
"size": 10000,
"stored_fields": [
"docid"
],
"_source": false,
"query": {
"more_like_this": {
"fields": [
"textcontent"
],
"like": [
{
"_index": "test",
"_type": "object",
"_id": "AV0c9jvZXF-b5U5aNAWB"
}
],
"max_query_terms": 5000,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}
Output of Elasticsearch 2.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 1.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": .5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": .381224,
"fields": {
"docid": [
"4"
]
}
}
]
}
}
Output of Elasticsearch 5.4
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.5381224,
"hits": [
{
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z9",
"_score": 168.5381224,
"fields": {
"docid": [
"2"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal63Z",
"_score": 164.5381224,
"fields": {
"docid": [
"3"
]
}
}, {
"_index": "test",
"_type": "object",
"_id": "AVzjOOdilllQ-Gyal6Z",
"_score": 132.381224,
"fields": {
"docid": [
"4"
]
}
}
]
}
}
The output is the same in both versions except for the document scores: 5.4 gives much higher scores than 2.4.
Our work depends on the scores, so if they change, that's a problem for us. How can we solve this?
I found the solution: in version 5.0 the default similarity algorithm changed from classic (TF/IDF) to BM25, which is why the scores differ.
Set the similarity type to classic when creating the index.
If the index already exists, update the setting for all indices with the following request (similarity is a static setting, so depending on the version you may need to close the indices before the update and reopen them afterwards):
PUT /_all/_settings?preserve_existing=true
{
"index.similarity.default.type": "classic"
}
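To see why the numbers moved, it helps to compare the two scoring formulas. The block below follows the Lucene scoring documentation for classic TF/IDF and BM25 with boosts omitted; details such as the exact idf formula differ slightly between Lucene versions, so treat it as approximate. Here f_{t,d} is the frequency of term t in document d, N the number of documents, n_t the number of documents containing t, |d| the field length, and avgdl the average field length. Classic scoring multiplies by a query normalization factor, while BM25 has none and simply sums a per-term weight, so a more_like_this query with many terms (max_query_terms is 5000 here) can plausibly add up to scores in the hundreds.

```latex
% Classic (TF/IDF) practical scoring function, boosts omitted
\[
\mathrm{score}_{\text{classic}}(q,d) = \mathrm{queryNorm}(q)\,\mathrm{coord}(q,d)
  \sum_{t \in q} \sqrt{f_{t,d}}\;\mathrm{idf}(t)^2\;\mathrm{norm}(t,d),
\qquad \mathrm{idf}(t) = 1 + \ln\frac{N}{n_t + 1}
\]

% BM25 (default since Elasticsearch 5.0), with k_1 = 1.2 and b = 0.75 by default
\[
\mathrm{score}_{\text{BM25}}(q,d) = \sum_{t \in q}
  \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
\]
```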

ElasticSearch Order By String Length

I am using Elasticsearch via NEST (C#). I have a large list of information about people:
{
firstName: 'Frank',
lastName: 'Jones',
City: 'New York'
}
I'd like to filter and sort this list by lastName, ordering by the length of lastName so that people with only 5 characters in their name come at the beginning of the result set, followed by people with 10 characters.
In pseudocode, I'd like to do something like:
list.wildcard("j*").sort(m => lastName.length)
You can do the sorting with script-based sorting.
As a toy example, I set up a trivial index with a few documents:
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Bob"}
{"index":{"_id":2}}
{"name":"Jeff"}
{"index":{"_id":3}}
{"name":"Darlene"}
{"index":{"_id":4}}
{"name":"Jose"}
Then I can order search results like this:
POST /test_index/_search
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": null,
"_source": {
"name": "Bob"
},
"sort": [
3
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
To filter by length, I can use a script filter in a similar way:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['name'].value.length() > 3",
"params": {}
}
}
}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab
P.S.: If any of the last names contain spaces, you might want to use "index": "not_analyzed" on that field; on an analyzed field, doc['name'].value returns a single token rather than the whole name.
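The two scripts above compute a simple key per document. This Python sketch mirrors them on the toy documents: sorting by name length ascending, then keeping names longer than 3 characters. It is a local illustration, not NEST or Elasticsearch code. Note that Elasticsearch does not guarantee an order for ties (it returned Jose before Jeff), whereas Python's sort is stable; add a secondary sort criterion if the order of equal lengths matters.

```python
# Documents from the toy index, keyed by _id.
docs = {1: "Bob", 2: "Jeff", 3: "Darlene", 4: "Jose"}

# Like the _script sort: key = name length, "order": "asc".
by_length = sorted(docs.items(), key=lambda item: len(item[1]))

# Like the script filter: keep names longer than 3 characters.
longer_than_3 = [(doc_id, name) for doc_id, name in by_length if len(name) > 3]

print([name for _, name in by_length])  # ['Bob', 'Jeff', 'Jose', 'Darlene']
print([name for _, name in longer_than_3])  # ['Jeff', 'Jose', 'Darlene']
```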
