Elasticsearch autocomplete integer field

I am trying to implement an autocomplete feature on a numeric field (its actual type in ES is long).
I am using a jQuery UI Autocomplete widget on the client side, having its source function send a query to Elasticsearch with the prefix term to get a number (say, 5) of autocomplete options.
The query I am using is something like the following:
{
  "size": 0,
  "query": {
    "prefix": {
      "myField": "<term>"
    }
  },
  "aggs": {
    "myAggregation": {
      "terms": {
        "field": "myField",
        "size": 5
      }
    }
  }
}
Such that if myField has the distinct values: [1, 15, 151, 21, 22], and term is 1, then I'd expect to get from ES the buckets with keys [1, 15, 151].
The problem is this does not seem to work with numeric fields. For the above example, I am getting a single bucket with the key 1, and if term is 15 I am getting a single bucket with key 15, i.e. it only returns exact matches. In contrast, it works perfectly for string typed fields.
I am guessing I need some special mapping for myField, but I'd prefer to have the mapping as general as possible, while having the autocomplete working with minimal changes to the mapping (just to note - the index I am querying might be a general one, external to my application, so I will be able to change the type/field mappings in it only if the new mapping is something general and standard).
What are my options here?

What I would do is to create a string sub-field into your integer field, like this:
{
  "myField": {
    "type": "integer",
    "fields": {
      "to_string": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
Then your query would need to change to the one below, i.e. run the prefix query on the string sub-field, but compute the terms aggregation on the integer field:
{
  "size": 0,
  "query": {
    "prefix": {
      "myField.to_string": "1"
    }
  },
  "aggs": {
    "myAggregation": {
      "terms": {
        "field": "myField",
        "size": 5
      }
    }
  }
}
Note that you can also create a completely independent field, not necessary a sub-field, the key point is that one field needs the integer value to run the terms aggregation on and the other field needs the string value to run the prefix query on.
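One note if you are on Elasticsearch 5.x or later: the string type and "index": "not_analyzed" were replaced by the keyword type, so the equivalent sub-field mapping would look like this (a sketch, assuming the same field names as above):
{
  "myField": {
    "type": "integer",
    "fields": {
      "to_string": {
        "type": "keyword"
      }
    }
  }
}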


Elasticsearch sort by filtered value

I'm using Elasticsearch 7.12, upgrading to 7.17 soon.
The following description of my problem has had the confusing business logic for my exact scenario removed.
I have an integer field in my document named 'Points'. It will usually contain 5-10 values, but may contain more, probably not more than 100 values. Something like:
Document 1:
{
  "Points": [3, 12, 34, 60, 1203, 70, 88]
}
Document 2:
{
  "Points": [16, 820, 31, 60]
}
Document 3:
{
  "Points": [93, 20, 55]
}
My search needs to return documents with values within a range, such as between 10 and 19 inclusive. That part is fine. However I need to sort the results by the values found in that range. From the example above, I might need to find values between 30-39, sorted by the value in that range ascending - it should return Document 2 (containing value of 31) followed by Document 1 (containing value of 34).
Due to the potential range of values and searches I can't break this field down into fields like 0-9, 10-19 etc. to search on them independently - there would be many thousands of fields.
The documents themselves are otherwise quite large and there are a large number of them, so I have been advised to avoid nested fields if possible.
Can I apply a filter to a sort? Do I need a script to achieve this?
Thanks.
There are several ways of doing this:
Histogram aggregation
Aggregate your documents using a histogram aggregation with "hard_bounds". Example query:
POST /my_index/_search?size=0
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "Points": { "gte": 30, "lte": 39 }
        }
      }
    }
  },
  "aggs": {
    "points": {
      "histogram": {
        "field": "Points",
        "interval": 10,
        "hard_bounds": {
          "min": 30,
          "max": 40
        }
      },
      "aggs": {
        "top": { "top_hits": {} }
      }
    }
  }
}
This will aggregate all the documents that fall in that range, and the first bucket in the results will contain the documents that you want.
Terms aggregation with an include list
If the range you want is relatively small, e.g. "30-39" as you mentioned, a simple terms aggregation with an include list covering all the numbers in that range will also give you the desired result.
Example Query:
POST /my_index/_search?size=0
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "Points": { "gte": 30, "lte": 39 }
        }
      }
    }
  },
  "aggs": {
    "points": {
      "terms": {
        "field": "Points",
        "include": ["30", "31", "32", "33", "34", "35", "36", "37", "38", "39"]
      },
      "aggs": {
        "top": { "top_hits": {} }
      }
    }
  }
}
Each bucket in the terms aggregation results will contain the documents that have that particular "Point" occurring at least once. The first document in the first bucket has what you want.
The third option involves building a runtime field that trims the points down to only those within your range, and then sorting ascending on that field. But that will be slower.
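A sketch of that third option, using a plain script sort rather than a runtime field definition (this assumes Painless, the Points field from above, and that you want to sort by the smallest in-range value per document; untested):
POST /my_index/_search
{
  "query": {
    "range": { "Points": { "gte": 30, "lte": 39 } }
  },
  "sort": {
    "_script": {
      "type": "number",
      "order": "asc",
      "script": {
        "lang": "painless",
        "source": "long best = Long.MAX_VALUE; for (def v : doc['Points']) { if (v >= params.min && v <= params.max && v < best) { best = v; } } return best;",
        "params": { "min": 30, "max": 39 }
      }
    }
  }
}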
HTH.

Elasticsearch Terms aggregation with unknown datatype

I'm indexing data of unknown schema in Elasticsearch using dynamic mapping, i.e. we don't know the shape, datatypes, etc. of much of the data ahead of time. In queries, I want to be able to aggregate on any field. Strings are (by default) mapped as both text and keyword types, and only the latter can be aggregated on. So for strings my terms aggregations must look like this:
"aggs": {
  "something": {
    "terms": {
      "field": "something.keyword"
    }
  }
}
But other types like numbers and bools do not have this .keyword sub-field, so aggregations for those must look like this (which would fail for text fields):
"aggs": {
  "something": {
    "terms": {
      "field": "something"
    }
  }
}
Is there any way to specify a terms aggregation that basically says "if something.keyword exists, use that, otherwise just use something", and without taking a significant performance hit?
Requiring datatype information to be provided at query time might be an option for me, but ideally I want to avoid it if possible.
If the primary use case is aggregations, it may be worth changing the dynamic mapping for string properties to index as the keyword datatype, with a multi-field sub-field indexed as the text datatype, i.e. in dynamic_templates:
{
  "strings": {
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword",
      "ignore_above": 256,
      "fields": {
        "text": {
          "type": "text"
        }
      }
    }
  }
},
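For context, that fragment sits inside the dynamic_templates array of the index mapping, something like this (index name is a placeholder; a sketch using 7.x-style typeless mappings):
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {
              "text": {
                "type": "text"
              }
            }
          }
        }
      }
    ]
  }
}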

elasticsearch sort by document id

I have a simple index in Elasticsearch and all my ids are added manually, i.e. I do not add documents with automatic string ids.
Now the requirement is to get a list of all documents, page by page, sorted by the document id (i.e. _id).
When I tried this with _id, it did not work. Then I looked on forums and found that I have to use _uid for that. This actually works, although I have no clue how. But another problem is that the sorting is done as if the _id is a string. And it actually is a string. But I want the results as if the _id was a number.
So there are two issues here:
Why does sorting not work with _id while it does work with _uid?
Is there a way to get document ids sorted as numbers and not strings?
E.g. if my doc ids are 1, 2, 3, ..., 55
I am getting results in this order:
1, 10, 11, 12, ... , 19, 2, 20, ... so on
While I would like to get the results in this order:
1, 2, 3, ... so on
Any help is highly appreciated!
Have the _id indexed:
{
  "mappings": {
    "some_type": {
      "_id": {
        "index": "not_analyzed"
      }
    }
  }
}
And use a script:
{
  "sort": {
    "_script": {
      "type": "number",
      "script": "doc['_id'].value?.isInteger() ? doc['_id'].value.toFloat() : null",
      "order": "asc"
    }
  }
}
That said, I strongly recommend, if possible, changing the id to an integer rather than having it as a string that contains numbers.
And I suspect it worked with _uid because _uid is a combination of type and id.
For some reason the code above didn't work for me ("dynamic method [java.lang.String, isInteger/0] not found").
However, the script below works (only if your _id can be converted to an integer):
GET ENDPOINT/INDEX/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": "return Integer.parseInt(doc['_id'].value)",
      "order": "desc" // I personally needed descending
    }
  }
}
Instead of id, I used id.keyword and it worked. Sample code below:
GET index_name/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "id.keyword": {
        "order": "desc"
      }
    }
  ]
}

ElasticSearch - sort search results by relevance and custom field (Date)

For example, I have entities with two fields - Text and Date. I want to search entities with results sorted by Date. But if I do it simply, then the result is unexpected.
For the search query "Iphone 6", the newest texts containing only "6" are at the top of the results, not those with "iphone 6". Without sorting, the results look fine, but they are not ordered by Date as I want.
How do I write a custom sort function that considers both relevance and Date? Or maybe there is a way to give weight to the Date field so it is considered in scoring?
In addition, I may also want to suppress search results matching only "6". How do I customize the search to match only by bigrams, for example?
Did you try a bool query like this?
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "field": "iphone 6"
        }
      }
    }
  },
  "sort": {
    "date": {
      "order": "desc"
    }
  }
}
Or, keeping your query, you can also just add this as the sort, which is a more appropriate way of doing it I guess:
"sort": [
  { "date": { "order": "desc" }},
  { "_score": { "order": "desc" }}
]
This gives all matching results sorted first by date, then by relevance.
The solution is to use both _score and the date field in the sort: _score as the primary sort order and the date field as the secondary sort order.
You can use a simple match query to perform the relevance match.
Try it out.
Data setup:
POST ecom/prod
{
  "name": "iphone 6",
  "date": "2019-02-10"
}
POST ecom/prod
{
  "name": "iphone 5",
  "date": "2019-01-10"
}
POST ecom/prod
{
  "name": "iphone 6",
  "date": "2019-02-28"
}
POST ecom/prod
{
  "name": "6",
  "date": "2019-03-01"
}
Query for relevance and date based sorting:
POST ecom/prod/_search
{
  "query": {
    "match": {
      "name": "iphone 6"
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "date": {
        "order": "desc"
      }
    }
  ]
}
You could definitely use a phrase matching query for this.
It does position-aware matching, so a document will be considered a match for your query only if both "iphone" and "6" occur in the searched field AND their occurrences respect this order, i.e. "iphone" shows up before "6".
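A minimal sketch of such a phrase query, assuming the name field from the data above:
{
  "query": {
    "match_phrase": {
      "name": "iphone 6"
    }
  }
}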
Looks like you want to sort first by relevance and then by date. This query will do it:
{
  "query": {
    "match": {
      "my_field": "my query"
    }
  },
  "sort": {
    "pubDate": {
      "order": "desc",
      "mode": "min"
    }
  }
}
When sorting on fields with more than one value, remember that the
values do not have any intrinsic order; a multivalue field is just a
bag of values. Which one do you choose to sort on? For numbers and
dates, you can reduce a multivalue field to a single value by using
the min, max, avg, or sum sort modes. For instance, you could sort on
the earliest date in each dates field by using the above query.
elasticsearch guide sorting
I think your relevance is broken. You should use two different analyzers, one for indexing and another for searching, like this:
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
You can also read more about this here: https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html
Once you fix the relevance, sorting should work correctly.
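The autocomplete analyzer referenced in that mapping is not defined in the snippet itself; a sketch of one based on edge n-grams, along the lines of the linked guide (analyzer/filter names and gram sizes are assumptions):
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  }
}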

Elastic search filter facet using from and size parameters

I am trying to get, in a facet, the count of filtered documents that appear in the previous pages.
Is that possible?
For example, here I would like my previous_macs facet to count all documents before result 20 that have os_name: "mac".
{
  "from": 20,
  "size": 10,
  "sort": {
    "created_at": "desc"
  },
  "facets": {
    "previous_macs": {
      "filter": {
        "term": {
          "os_name": "mac"
        }
      }
      // something here ? facet_filter maybe?
    }
  }
}
There is no generic way to do it, but if you can formulate a filter that reliably filters out everything except the first 20 records, you can use it as a facet_filter. So, if the created_at field is unique, you can build a range filter that includes everything up to and including the 20th document, and use that range filter as the facet_filter for your facet.
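A sketch of what that could look like, assuming created_at is unique and BOUNDARY is a placeholder for the created_at value of the 20th result (which you would have to fetch first; with a descending sort, earlier pages hold the later dates):
{
  "facets": {
    "previous_macs": {
      "filter": {
        "term": { "os_name": "mac" }
      },
      "facet_filter": {
        "range": {
          "created_at": { "gt": "BOUNDARY" }
        }
      }
    }
  }
}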
