Elasticsearch aggregation with date_histogram gives wrong result for buckets


I have data with a timestamp field and want to run a date_histogram aggregation on it.
When I run the query, it returns a total of 13, which is correct, but it shows one record in the 2014-10-10 bucket, and I can't find that record in the data I have.
curl http://localhost:9200/test/test/_search -X POST -d '{"fields":
["creation_time"],
"query" :
{"filtered":
{"query":
{"match":
{"type": "test.type"}
}
}
},
"aggs":
{"group_by_created_by":
{"date_histogram":
{"field":"creation_time", "interval": "1d"}
}
}
}' | python -m json.tool
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"aggregations": {
"group_by_created_at": {
"buckets": [
{
"doc_count": 12,
"key": 1412812800000,
"key_as_string": "2014-10-09T00:00:00.000Z"
},
{
"doc_count": 1,
"key": 1412899200000,
"key_as_string": "2014-10-10T00:00:00.000Z"
}
]
}
},
"hits": {
"hits": [
{
"_id": "qk5EGDqUSoW-ckZU9bnSsA",
"_index": "test",
"_score": 3.730029,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T16:35:39.535389"
]
}
},
{
"_id": "GnglI_3xRYii_oE5q91FUg",
"_index": "test",
"_score": 3.6149597,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T17:16:55.677919"
]
}
},
{
"_id": "ELP1f_-IS8SJiT4i4Vh6_g",
"_index": "test",
"_score": 2.974081,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T01:21:21.691270"
]
}
},
{
"_id": "ySlIV4vWRvm_q0-9p87dEQ",
"_index": "test",
"_score": 2.974081,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T01:33:51.291644"
]
}
},
{
"_id": "swXVnMmJSsmNW30zeJvCoQ",
"_index": "test",
"_score": 2.974081,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T17:08:45.738821"
]
}
},
{
"_id": "h0j6L-VGTnyChSIevtt2og",
"_index": "test",
"_score": 2.974081,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T22:35:16.908080"
]
}
},
{
"_id": "ANoTEXIgRgml6gLD4YKtIg",
"_index": "test",
"_score": 2.9459102,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T01:25:18.869175"
]
}
},
{
"_id": "FSCPBsogT5OXghBUmKXidQ",
"_index": "test",
"_score": 2.9459102,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T01:42:49.000599"
]
}
},
{
"_id": "VEw6XbIySvW7h7GF7h4ynA",
"_index": "test",
"_score": 2.9459102,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T16:45:51.563595"
]
}
},
{
"_id": "J9NfffAvRPmFxtOBZ6IsCA",
"_index": "test",
"_score": 2.9169223,
"_type": "test",
"fields": {
"creation_time": [
"2014-10-09T01:23:30.546353"
]
}
}
],
"max_score": 3.730029,
"total": 13
},
"timed_out": false,
"took": 4
}
As the output above shows, none of the returned hits is dated 10-10, yet the aggregation reports one record in that bucket.

Aggregations are computed over all matching documents.
You did not set size, so you get the default of 10 documents under hits. Change size to 13 (or more) and your 2014-10-10 document should show up.
When you have more results than you can conveniently check by hand, you can also use top_hits as a sub-aggregation to get a peek at what is in each bucket (it has a size option as well).

If you count your hits, you will see there are only 10 objects. This is because, by default, Elasticsearch returns only the top ten hits.
However, even if they are not present in the hits, all documents matching the query are taken into account when computing your aggregations.
Try updating your query to:
{
"size": 13,
"fields": ["creation_time"],
"query" :
{"filtered":
{"query":
{"match":
{"type": "test.type"}
}
}
},
"aggs":
{"group_by_created_by":
{"date_histogram":
{"field":"creation_time", "interval": "1d"}
}
}
}
And you will see the document which has been created on the 10-10.
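The effect can be reproduced without a cluster. The following is a minimal Python sketch, not Elasticsearch code: the 13 timestamps are made-up stand-ins for the matching documents, the hypothetical search function only imitates the shape of a response, and its size argument plays the role of the request's size parameter.

```python
from collections import Counter

# Hypothetical stand-ins for the 13 matching documents:
# twelve on 2014-10-09 and one on 2014-10-10.
creation_times = [f"2014-10-09T{hour:02d}:00:00" for hour in range(1, 13)]
creation_times.append("2014-10-10T03:00:00")


def search(docs, size=10):
    """Imitate a search response: hits are cut off at size, buckets are not."""
    buckets = Counter(timestamp[:10] for timestamp in docs)  # group by day
    return {"total": len(docs), "hits": docs[:size], "buckets": dict(buckets)}


default = search(creation_times)        # size left at the default of 10
full = search(creation_times, size=13)  # size raised to cover every match

print(default["buckets"])
print(len(default["hits"]), len(full["hits"]))
```

With the default size, the bucket counts cover all 13 documents while only 10 hits are listed, which is the mismatch described in the question.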

Related

Elastic Search Sorting

My data looks like this after sorting. Below is the response after sorting, but this is not the expected output.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "size",
"_id": "AWVVTy-v9pbhY5QtJPGe",
"_score": null,
"_source": {
"e_size": "5"
},
"sort": [
"5"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTmY89pbhY5QtJPGa",
"_score": null,
"_source": {
"e_size": "3"
},
"sort": [
"3"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTxXe9pbhY5QtJPGd",
"_score": null,
"_source": {
"e_size": "100"
},
"sort": [
"100"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTpYJ9pbhY5QtJPGc",
"_score": null,
"_source": {
"e_size": "10-"
},
"sort": [
"10-"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTk1x9pbhY5QtJPGY",
"_score": null,
"_source": {
"e_size": "1-7"
},
"sort": [
"1-7"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTnsm9pbhY5QtJPGb",
"_score": null,
"_source": {
"e_size": "1-6"
},
"sort": [
"1-6"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVTjAq9pbhY5QtJPGX",
"_score": null,
"_source": {
"e_size": "1-2"
},
"sort": [
"1-2"
]
},
{
"_index": "test",
"_type": "size",
"_id": "AWVVThkT9pbhY5QtJPGW",
"_score": null,
"_source": {
"e_size": "1"
},
"sort": [
"1"
]
}
]
}
}
Below is the sort used to get the results.
{
"sort": [{
"e_size": {
"order": "desc"
}
}]
}
The e_size type is "string" and its index setting is "not_analyzed".
How can I fix this sort issue? Do we need to use an analyzer for this, or should e_size have a different data type?

It has sorted them correctly. If you sort those strings yourself, you get the same result.
Example:
["100", "10-"]
Strings are compared character by character, and "-" sorts before "0", so in descending order "10-" comes after "100", and so on. Think of it as looking words up in a dictionary.
Either make e_size a number, or use a string scheme for e_size values whose lexicographic order matches the order you want.
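To check the string ordering by hand, here is a small Python sketch using the e_size values from the question. Python compares strings by code point, which for these ASCII values gives the same order as the response above. The leading_number helper is hypothetical and only illustrates one possible numeric interpretation of values such as "10-".

```python
values = ["1", "1-2", "1-6", "1-7", "10-", "100", "3", "5"]

# Plain string comparison, descending, as in the question's sort.
as_strings = sorted(values, reverse=True)


def leading_number(value):
    """Hypothetical helper: read the leading digits of a value as an int."""
    digits = ""
    for ch in value:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits)


# Numeric comparison on the leading number, descending.
as_numbers = sorted(values, key=leading_number, reverse=True)

print(as_strings)
print(as_numbers)
```

The string sort puts "5" and "3" ahead of "100", while the numeric sort puts "100" first.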

Combine inner hits in elasticsearch?

I currently have a dataset that features a nested datatype in products, these are all listed within different vendors. I have various queries that check for search terms within the nested products array, ideally I want to be able to combine all the inner hits so that I can sort on such things as score rankings and price. At the moment the search results come back on a per document basis. Is it possible to combine inner hits in elasticsearch so that I get just a list of all the matching products?
Example Query
{
"_source": {
"includes": [ "*" ],
"excludes": [ "products" ]
},
"query": {
"nested": {
"path": "products",
"inner_hits": {
"size": 10,
"_source": [
"title"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"products.title" : {
"query": "Dress",
"fuzziness" : 0
}
}
}
]
}
}
}
}
}
Example output
{
"took": 477,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.9072125,
"hits": [
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a806c7af36d28314de953ff",
"_score": 2.9072125,
"_source": {
"name": "Argos",
"locations": [
{
"lon": -2.242797,
"lat": 53.482952
}
]
},
"inner_hits": {
"products": {
"hits": {
"total": 3,
"max_score": 3.0782251,
"hits": [
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a806c7af36d28314de953ff",
"_nested": {
"field": "products",
"offset": 3348
},
"_score": 3.0782251,
"_source": {
"title": "HOME Set of 2 Dress Covers - White"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a806c7af36d28314de953ff",
"_nested": {
"field": "products",
"offset": 2599
},
"_score": 3.0782251,
"_source": {
"title": "Chad Valley Designabear Spotty Dress Outfit"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a806c7af36d28314de953ff",
"_nested": {
"field": "products",
"offset": 771
},
"_score": 2.5651875,
"_source": {
"title": "Melissa and Doug Abby & Emma Magnetic Wooden Dress Up"
}
}
]
}
}
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_score": 2.3227787,
"_source": {
"name": "Superdry",
"locations": [
{
"lon": -2.241703,
"lat": 53.483469
}
]
},
"inner_hits": {
"products": {
"hits": {
"total": 186,
"max_score": 2.378731,
"hits": [
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6420
},
"_score": 2.378731,
"_source": {
"title": "Alexia Off Shoulder Dress"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6417
},
"_score": 2.378731,
"_source": {
"title": "Erin Festival Skater Dress "
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6416
},
"_score": 2.378731,
"_source": {
"title": "Erin Racer Dress "
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6415
},
"_score": 2.378731,
"_source": {
"title": "Alice Knot Dress"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6412
},
"_score": 2.378731,
"_source": {
"title": "Alice Knot Dress"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6389
},
"_score": 2.378731,
"_source": {
"title": "Lagoon Logo Midi Dress"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6388
},
"_score": 2.378731,
"_source": {
"title": "50's Boardwalk Dress "
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6386
},
"_score": 2.378731,
"_source": {
"title": "50's Boardwalk Dress "
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6385
},
"_score": 2.378731,
"_source": {
"title": "Graphic Sweat Dress"
}
},
{
"_index": "shopit",
"_type": "businesses",
"_id": "5a5c3beb734d1d3471839b1d",
"_nested": {
"field": "products",
"offset": 6382
},
"_score": 2.378731,
"_source": {
"title": "Breton Bardot Stripe Dress"
}
}
]
}
}
}
}
]
}
}

Never mind, I should have paid better attention to the Elasticsearch documentation, which states:
Search requests return the whole document, not just the matching
nested documents. Although there are plans afoot to support returning
the best-matching nested documents with the root document, this is
not yet supported.
I think parent-child relationships are probably the way to go with this.
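If restructuring the index is not an option, the inner hits can also be merged on the client side. The Python sketch below is a workaround under stated assumptions, not an Elasticsearch feature: flatten_inner_hits is a hypothetical helper, the sample response is a trimmed copy of the output above, and the merged list can only contain as many products per vendor as the inner_hits size returned.

```python
def flatten_inner_hits(response, path="products"):
    """Merge the inner hits of every top-level hit into one list, best score first."""
    products = []
    for hit in response["hits"]["hits"]:
        vendor = hit["_source"].get("name")
        inner_hits = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        for inner in inner_hits:
            products.append({"vendor": vendor, "score": inner["_score"], **inner["_source"]})
    return sorted(products, key=lambda product: product["score"], reverse=True)


# Trimmed copy of the example output, keeping only the keys the helper reads.
sample = {"hits": {"hits": [
    {"_source": {"name": "Argos"},
     "inner_hits": {"products": {"hits": {"hits": [
         {"_score": 3.0782251, "_source": {"title": "HOME Set of 2 Dress Covers - White"}},
         {"_score": 2.5651875, "_source": {"title": "Melissa and Doug Abby & Emma Magnetic Wooden Dress Up"}},
     ]}}}},
    {"_source": {"name": "Superdry"},
     "inner_hits": {"products": {"hits": {"hits": [
         {"_score": 2.378731, "_source": {"title": "Alexia Off Shoulder Dress"}},
     ]}}}},
]}}

for product in flatten_inner_hits(sample):
    print(product["score"], product["vendor"], product["title"])
```

Sorting by another field, such as price, would only require adding that field to the inner_hits _source list and changing the sort key.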

How to sort by match prioritising the most left words matched

Explanation
Sort the prefix query results by the word they match, prioritising matches in the leftmost words.
Tests I've made
Data
DELETE /test
PUT /test
PUT /test/person/_mapping
{
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {"type": "string"},
"original": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
PUT /test/person/1
{"name": "Berta Kassulke"}
PUT /test/person/2
{"name": "Kaley Bartoletti"}
PUT /test/person/3
{"name": "Kali Hahn"}
PUT /test/person/4
{"name": "Karolann Klein"}
PUT /test/person/5
{"name": "Sofia Mandez Kaloo"}
The mapping was added for the 'sort on original value' test.
Simple query
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": 1,
"_source": {
"name": "Karolann Klein"
}
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": 1,
"_source": {
"name": "Sofia Mandez Kaloo"
}
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": 1,
"_source": {
"name": "Berta Kassulke"
}
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": 1,
"_source": {
"name": "Kaley Bartoletti"
}
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": 1,
"_source": {
"name": "Kali Hahn"
}
}
]
}
}
With sorting
Request
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name": {"order": "asc"}}
}
Result
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"berta"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"kaloo"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"karolann"
]
}
]
}
}
With sort on original value
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name.original": {"order": "asc"}}
}
Result
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"Berta Kassulke"
]
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"Kaley Bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"Kali Hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"Karolann Klein"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"Sofia Mandez Kaloo"
]
}
]
}
}
Intended result
Sorted by name ASC, but prioritising matches on the leftmost words:
Kaley Bartoletti
Kali Hahn
Karolann Klein
Berta Kassulke
Sofia Mandez Kaloo

Good question. One way to achieve this is to combine an edge n-gram filter with a span_first query.
These are my settings:
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase",
"edge_filter",
"asciifolding"
]
}
},
"filter": {
"edge_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"person": {
"properties": {
"name": {
"type": "string",
"analyzer": "my_custom_analyzer",
"search_analyzer": "standard",
"fields": {
"standard": {
"type": "string"
}
}
}
}
}
}
}
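To get a feel for what this analyzer indexes, here is a rough Python imitation. It is only a sketch: whitespace splitting stands in for the standard tokenizer, asciifolding is skipped, and the assumption that every gram keeps the position of the word it came from is mine, made because the span_first approach depends on word positions.

```python
def edge_ngrams(token, min_gram=2, max_gram=8):
    """Prefixes of token with lengths from min_gram up to max_gram."""
    longest = min(max_gram, len(token))
    return [token[:length] for length in range(min_gram, longest + 1)]


def analyze(text):
    """Rough imitation of my_custom_analyzer: lowercase, split, edge n-grams.

    Returns (position, gram) pairs; the position is the index of the word.
    """
    pairs = []
    for position, word in enumerate(text.lower().split()):
        for gram in edge_ngrams(word):
            pairs.append((position, gram))
    return pairs


print(analyze("Kali Hahn"))
print(analyze("Berta Kassulke"))
```

In this imitation "ka" appears at position 0 for "Kali Hahn" but at position 1 for "Berta Kassulke", which is the difference a position-sensitive query can score on.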
After that I inserted your sample documents and wrote the following dis_max query. Notice that the end parameter of the first span_first query is 1, so a match in the leftmost word gets a higher score. I sort first by score and then by name.
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"name": "ka"
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 1
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 2
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"name.standard": {
"order": "asc"
}
}
]
}
The result I get
"hits": [
{
"_index": "esedge",
"_type": "policy_data",
"_id": "2",
"_score": 0.72272325,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
0.72272325,
"bartoletti"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "3",
"_score": 0.72272325,
"_source": {
"name": "Kali Hahn"
},
"sort": [
0.72272325,
"hahn"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "4",
"_score": 0.72272325,
"_source": {
"name": "Karolann Klein"
},
"sort": [
0.72272325,
"karolann"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "1",
"_score": 0.54295504,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
0.54295504,
"berta"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "5",
"_score": 0.2905494,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
0.2905494,
"kaloo"
]
}
]
I hope this helps.
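As a cross-check of the intended ordering, the ranking rule can be written out in plain Python. This is a sketch of the rule itself, not of how Elasticsearch scores: leftmost_match_rank is a hypothetical helper that returns the index of the first word starting with the prefix, and ties are broken by the lowercased name.

```python
def leftmost_match_rank(name, prefix):
    """Index of the first word in name that starts with prefix, or None."""
    for position, word in enumerate(name.lower().split()):
        if word.startswith(prefix.lower()):
            return position
    return None


names = [
    "Berta Kassulke",
    "Kaley Bartoletti",
    "Kali Hahn",
    "Karolann Klein",
    "Sofia Mandez Kaloo",
]

matches = [name for name in names if leftmost_match_rank(name, "ka") is not None]
ranked = sorted(matches, key=lambda name: (leftmost_match_rank(name, "ka"), name.lower()))

for name in ranked:
    print(name)
```

The output order matches the intended result listed in the question.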

Get every Nth result in Elasticsearch

I have a large set of data and I want a sample that I can use in a graph. For this I don't need all of the data; I need every Nth item.
For instance, if I have 4000 results and I only need 800, I want to be able to get every 5th result.
So something like: get, skip, skip, skip, skip, get, skip, skip, skip, ...
Is such a thing possible in Elasticsearch?

You're better off using a scripted filter. Otherwise you're needlessly using the score. Filters are just like queries, but they don't use scoring.
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['unique_counter'].value % n == 0",
"params" : {
"n" : 5
}
}
}
}
}
}
You're also better off not using dynamic scripting in real world usage.
That said, you probably want to take a look at aggregations for graphing analytical information about your data rather than taking an arbitrary sample.
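The modulo test in the script filter is easy to try out in plain Python. This sketch assumes, as the script does, that each document carries a dense sequential counter in a field called unique_counter; the every_nth helper and the generated documents are hypothetical.

```python
def every_nth(docs, n, counter_field="unique_counter"):
    """Keep the documents whose counter is a multiple of n."""
    return [doc for doc in docs if doc[counter_field] % n == 0]


# 4000 hypothetical documents numbered 1..4000.
docs = [{"unique_counter": number} for number in range(1, 4001)]
sample = every_nth(docs, 5)

print(len(sample))
print(sample[:3])
```

If the counter has gaps, for example after deletions, the sample will no longer be exactly every Nth document.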

One way you could do it is with random scoring. It won't give you precisely every nth item according to a rigid ordering, but if you can relax that requirement this trick should do nicely.
To test it I set up a simple index (I mapped "doc_id" to "_id" just so the documents would have some contents, so that part isn't required, in case that's not obvious):
PUT /test_index
{
"mappings": {
"doc": {
"_id": {
"path": "doc_id"
}
}
}
}
Then I indexed ten simple documents:
POST /test_index/doc/_bulk
{"index":{}}
{"doc_id":1}
{"index":{}}
{"doc_id":2}
{"index":{}}
{"doc_id":3}
{"index":{}}
{"doc_id":4}
{"index":{}}
{"doc_id":5}
{"index":{}}
{"doc_id":6}
{"index":{}}
{"doc_id":7}
{"index":{}}
{"doc_id":8}
{"index":{}}
{"doc_id":9}
{"index":{}}
{"doc_id":10}
Now I can pull back three random documents like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.93746644,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.93746644,
"_source": {
"doc_id": 1
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "10",
"_score": 0.926947,
"_source": {
"doc_id": 10
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.79400194,
"_source": {
"doc_id": 5
}
}
]
}
}
Or a different random three like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some other seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.817295,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": 0.817295,
"_source": {
"doc_id": 4
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "8",
"_score": 0.469319,
"_source": {
"doc_id": 8
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.4374538,
"_source": {
"doc_id": 3
}
}
]
}
}
Hopefully it's clear how to generalize this method to what you need. Just take out however many documents you want, in however many chunks make it performant.
Here is all the code I used to test:
http://sense.qbox.io/gist/a02d4da458365915f5e9cf6ea80546d2dfabc75d
EDIT: Actually now that I think about it, you could also use scripted scoring to get precisely every nth item, if you set it up right. Maybe something like,
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "if(doc['doc_id'].value % 3 == 0){ return 1 }; return 0;"
}
}
]
}
}
}
...
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"doc_id": 3
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "6",
"_score": 1,
"_source": {
"doc_id": 6
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "9",
"_score": 1,
"_source": {
"doc_id": 9
}
}
]
}
}

ElasticSearch Order By String Length

I am using Elasticsearch via NEST (C#). I have a large list of information about people:
{
firstName: 'Frank',
lastName: 'Jones',
City: 'New York'
}
I'd like to be able to filter this list by lastName and order the results by its length, so that people with only 5 characters in their name come at the beginning of the result set, followed by people with 10 characters.
In pseudocode, I'd like to do something like:
list.wildcard("j*").sort(m => lastName.length)

You can do the sorting with script-based sorting.
As a toy example, I set up a trivial index with a few documents:
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Bob"}
{"index":{"_id":2}}
{"name":"Jeff"}
{"index":{"_id":3}}
{"name":"Darlene"}
{"index":{"_id":4}}
{"name":"Jose"}
Then I can order search results like this:
POST /test_index/_search
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": null,
"_source": {
"name": "Bob"
},
"sort": [
3
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
To filter by length, I can use a script filter in a similar way:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['name'].value.length() > 3",
"params": {}
}
}
}
},
"sort": {
"_script": {
"script": "doc['name'].value.length()",
"type": "number",
"order": "asc"
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Jose"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Jeff"
},
"sort": [
4
]
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": null,
"_source": {
"name": "Darlene"
},
"sort": [
7
]
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab
P.S.: If any of the last names contain spaces, you might want to use "index": "not_analyzed" on that field.
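The same filter-then-sort logic can be checked in plain Python with the four sample names. This is only a sketch of the expected ordering: by_length is a hypothetical helper, and it breaks ties alphabetically, whereas the responses above show that Elasticsearch makes no such promise for equal sort values ("Jose" came before "Jeff").

```python
names = ["Bob", "Jeff", "Darlene", "Jose"]


def by_length(values, min_length=0):
    """Keep values longer than min_length, shortest first, ties alphabetical."""
    kept = [value for value in values if len(value) > min_length]
    return sorted(kept, key=lambda value: (len(value), value))


print(by_length(names))                # sort only
print(by_length(names, min_length=3))  # filter on length > 3, then sort
```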
