Why can't I get Elasticsearch's completion suggester to sort based on a field? - elasticsearch

I'm trying to get autocomplete suggestions from Elasticsearch, but sorted by an internal popularity score that I supply in the data, so that the most popular ones show at the top. My POST looks like this:
curl "http://penguin:9200/node/_search?pretty" --silent --show-error \
--header "Content-Type: application/json" \
-X POST \
-d '
{
"_source" : [
"name",
"popular_score"
],
"sort" : [ "popular_score" ],
"suggest" : {
"my_suggestion" : {
"completion" : {
"field" : "searchbar_suggest",
"size" : 10,
"skip_duplicates" : true
},
"text" : "f"
}
}
}
'
I get back valid autocomplete suggestions, but they aren't sorted by the popular_score field:
{
...
"suggest" : {
"my_suggestion" : [
{
"text" : "f",
"offset" : 0,
"length" : 1,
"options" : [
{
"text" : "2020 Fact Longlist",
"_index" : "node",
"_type" : "_doc",
"_id" : "245105",
"_score" : 1.0,
"_source" : {
"popular_score" : "35",
"name" : "2020 Fact Longlist"
}
},
{
"text" : "Fable",
"_index" : "node",
"_type" : "_doc",
"_id" : "125903",
"_score" : 1.0,
"_source" : {
"popular_score" : "69.33333333333333333333333333333333333333",
"name" : "Fable"
}
},
{
"text" : "Fables",
"_index" : "node",
"_type" : "_doc",
"_id" : "172986",
"_score" : 1.0,
"_source" : {
"popular_score" : "24",
"name" : "Fables"
}
}
...
]
}
]
}
}
My mappings are:
{
"mappings": {
"properties": {
"nodeid": {
"type": "integer"
},
"name": {
"type": "text",
"copy_to": "searchbar_suggest"
},
"popular_score": {
"type": "float"
},
"searchbar_suggest": {
"type": "completion"
}
}
}
}
What am I doing wrong?
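A possible explanation, offered as an assumption rather than something confirmed in this thread: the top-level sort applies to search hits, not to suggester options, and the completion suggester ranks its options by the weight stored with each input at index time. With copy_to no weight is supplied, which would be consistent with every option above showing a _score of 1.0. The Python sketch below builds a document that writes searchbar_suggest explicitly with a weight derived from popular_score. The helper name build_suggest_doc is hypothetical, and rounding the float to an integer is a choice made here because the weight is expected to be a non-negative integer.

```python
import json


def build_suggest_doc(name: str, popular_score: float) -> dict:
    """Build a document whose completion field carries an explicit weight.

    Assumption: the completion `weight` has to be a non-negative integer,
    so the float popularity score is rounded before it is used.
    """
    weight = max(0, round(popular_score))
    return {
        "name": name,
        "popular_score": popular_score,
        # Written explicitly instead of relying on copy_to from "name".
        "searchbar_suggest": {"input": [name], "weight": weight},
    }


doc = build_suggest_doc("Fable", 69.33333333333333)
print(json.dumps(doc, indent=2))
```

If this approach is used, the copy_to on name would probably need to be dropped so the same input is not added a second time without a weight.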

Related

Is it possible to extract the stored value of a keyword field when _source is disabled in Elasticsearch 7?

I have the following index:
{
"articles_2022" : {
"mappings" : {
"_source" : {
"enabled" : false
},
"properties" : {
"content" : {
"type" : "text",
"norms" : false
},
"date" : {
"type" : "date"
},
"feed_canonical" : {
"type" : "boolean"
},
"feed_id" : {
"type" : "integer"
},
"feed_subscribers" : {
"type" : "integer"
},
"language" : {
"type" : "keyword",
"doc_values" : false
},
"title" : {
"type" : "text",
"norms" : false
},
"url" : {
"type" : "keyword",
"doc_values" : false
}
}
}
}
}
I have a very specific one-time need and I want to extract the stored values from the url field for all documents. Is this possible with Elasticsearch 7? Thanks!
Your index mapping defines the url field as a keyword with "doc_values": false, so you cannot run a terms aggregation on it.
As far as I understand your question, you only need to get the value of the url field across documents. For that you can use an exists query. Note that the example index below leaves _source enabled.
Adding a working example
Index Mapping:
PUT idx1
{
"mappings": {
"properties": {
"url": {
"type": "keyword",
"doc_values": false
}
}
}
}
Index Data:
POST idx1/_doc/1
{
"url":"www.google.com"
}
POST idx1/_doc/2
{
"url":"www.youtube.com"
}
Search Query:
POST idx1/_search
{
"_source": [
"url"
],
"query": {
"exists": {
"field": "url"
}
}
}
Search Response:
"hits" : [
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"url" : "www.google.com"
}
},
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"url" : "www.youtube.com"
}
}
]
Since your mapping has
"_source" : { "enabled" : false }
you can add "store": true to the mapping of the field whose value you want to extract, for example:
PUT indexExample2
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"url": {
"type": "keyword",
"doc_values": false,
"store": true
}
}
}
}
Then index the data (thanks to @ESCoder for the example):
POST indexExample2/_doc/1
{
"url":"www.google.com"
}
POST indexExample2/_doc/2
{
"url":"www.youtube.com"
}
You can extract only the stored field in your search queries, even if _source is disabled.
POST indexExample2/_search
{
"query": {
"exists": {
"field": "url"
}
},
"stored_fields": ["url"]
}
This outputs:
"hits" : [
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"url" : [
"www.google.com"
]
}
},
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"url" : [
"www.youtube.com"
]
}
}
]
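Since the goal is a one-time extraction of every url value, here is a minimal Python sketch of how the "fields" section of such a response could be flattened on the client side. The function name iter_stored_values and the sample data are assumptions for illustration; the sample mirrors the response shape shown above.

```python
from typing import Dict, Iterator, List


def iter_stored_values(hits: List[Dict], field: str) -> Iterator[str]:
    """Yield every value of `field` from hits returned with `stored_fields`."""
    for hit in hits:
        # Stored fields come back as lists under "fields";
        # a hit that lacks the field simply has no entry.
        for value in hit.get("fields", {}).get(field, []):
            yield value


# Sample hits shaped like the response above (trimmed to the relevant keys).
sample_hits = [
    {"_id": "1", "fields": {"url": ["www.google.com"]}},
    {"_id": "2", "fields": {"url": ["www.youtube.com"]}},
]
urls = list(iter_stored_values(sample_hits, "url"))
print(urls)
```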

filter with special character in ElasticSearch 6.0.0

I am trying to filter all documents whose data contains a special character such as '#', '.' or '/', but I have not succeeded.
I want to fetch the cities that contain '#' or a dot ('.'), so I need a query that returns only the documents containing the special character.
I am quite new to Elasticsearch queries, so please help me.
Thanks
Below is the index data:
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "data",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Mirja",
"city" : "pune # bandra",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Rohan",
"city" : "BBSR /. patia",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Diya",
"city" : "pune_bandra",
"contact number" : 9723124343
}
}
]
}
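Check the analyzer on your city field. If it is the standard analyzer, special characters are removed when the tokens are created. Instead, use the mapping below on the city field and search with a regular match query. (The request uses the typeless mapping syntax of Elasticsearch 7; on 6.0.0 the properties have to be nested under the mapping type, here data.)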
PUT test_index
{
"mappings": {
"properties": {
"city": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
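To see why this analyzer helps, here is a rough Python approximation of a whitespace tokenizer followed by a lowercase filter, applied to the city values from the question. It is only a sketch of the tokenization idea, not Elasticsearch's actual implementation, and the function name whitespace_lowercase is made up for illustration.

```python
from typing import List


def whitespace_lowercase(text: str) -> List[str]:
    """Approximate a whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]


cities = ["pune # bandra", "BBSR /. patia", "pune_bandra"]
tokens = {city: whitespace_lowercase(city) for city in cities}
for city, toks in tokens.items():
    print(city, "->", toks)

# Cities where "#" survives as a standalone token and can be matched.
with_hash = [city for city, toks in tokens.items() if "#" in toks]
print(with_hash)
```

Under this approximation "#" is kept as its own token, while "/." stays a single token, so a query for "." alone would presumably not match "BBSR /. patia".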

Elastic Search sort by boolean field

I want to sort my results so that documents whose trusted field is true come first.
I have found that the sort option does not support boolean sorting.
How can I achieve this?
If I understood your issue correctly: I ran a test locally on ES version 7.8 and indexed the following documents:
{
"content": "This is a test",
"trusted": true
}
{
"content": "This is a new test",
"trusted": true
}
{
"content": "This is not a test",
"trusted": false
}
Here is the mapping of the index:
"mappings" : {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"trusted" : {
"type" : "boolean"
}
}
}
Here is the query when "order" : "desc":
{
"sort": [
{
"trusted": {
"order": "desc"
}
}
]
}
The response:
"hits" : [
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "B-YleHQBsTCl1BZvrFdA",
"_score" : null,
"_source" : {
"content" : "This is a test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "CeYleHQBsTCl1BZvtFdJ",
"_score" : null,
"_source" : {
"content" : "This is a new test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "DOYleHQBsTCl1BZvvVfl",
"_score" : null,
"_source" : {
"content" : "This is not a test",
"trusted" : false
},
"sort" : [
0
]
}
]
When "order":"asc", the response is:
"hits" : [
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "DOYleHQBsTCl1BZvvVfl",
"_score" : null,
"_source" : {
"content" : "This is not a test",
"trusted" : false
},
"sort" : [
0
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "B-YleHQBsTCl1BZvrFdA",
"_score" : null,
"_source" : {
"content" : "This is a test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "CeYleHQBsTCl1BZvtFdJ",
"_score" : null,
"_source" : {
"content" : "This is a new test",
"trusted" : true
},
"sort" : [
1
]
}
]
Links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html
Please let me know if I answered incorrectly; I will be glad to help.
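The sort values 1 and 0 in the responses above suggest that true and false are compared as numbers. A small Python sketch of the same ordering, using a hypothetical helper sort_by_trusted on plain dictionaries, is:

```python
from typing import Dict, List


def sort_by_trusted(docs: List[Dict], order: str = "desc") -> List[Dict]:
    """Order documents by a boolean field, treating True as 1 and False as 0."""
    return sorted(docs, key=lambda d: int(d["trusted"]), reverse=(order == "desc"))


docs = [
    {"content": "This is a test", "trusted": True},
    {"content": "This is not a test", "trusted": False},
    {"content": "This is a new test", "trusted": True},
]
desc = sort_by_trusted(docs, "desc")
asc = sort_by_trusted(docs, "asc")
print([d["trusted"] for d in desc])
print([d["trusted"] for d in asc])
```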

Search in nested object

I'm having trouble writing a query on Elasticsearch 7.3.
I create an index like this:
PUT myindex
{
"mappings": {
"properties": {
"files": {
"type": "nested"
}
}
}
}
Then I create three documents:
PUT myindex/_doc/1
{
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2",
"files" : [
{
"filename" : "firstfilename.exe",
"datetime" : 111111111
},
{
"filename" : "secondfilename.exe",
"datetime" : 111111144
}
]
}
PUT myindex/_doc/2
{
"SHA256" : "87ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "thirdfilename.exe",
"datetime" : 111111133
},
{
"filename" : "fourthfilename.exe",
"datetime" : 111111122
}
]
}
PUT myindex/_doc/3
{
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a",
"files" : [
{
"filename" : "fifthfilename.exe",
"datetime" : 111111155
}
]
}
How can I get the two documents with the most recent files, based on datetime (ids 1 and 3)?
I would like the SHA256 values of the two documents with the latest datetime, ordered descending.
I ran dozens of tests, but none of them worked.
I'm not posting the code I tried because I'm still far from a solution.
I would like a result like this, or similar:
{
"SHA256": [
"94ee05933....a2af6a22cc2",
"565e04933....a2af6a22c5a"
]
}
Query:
GET myindex/_search
{
"_source":"SHA256",
"sort": [
{
"files.datetime": {
"mode":"max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
]
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
]
}
]
The sort value of each hit is the maximum datetime among its files. If you also need the file names, add them to _source and use the sort value to pick the matching file.
A slightly more complicated query returns exactly the two matching files through inner_hits:
GET myindex/_search
{
"_source": "SHA256",
"query": {
"bool": {
"must": [
{
"nested": {
"path": "files",
"query": {
"match_all": {}
},
"inner_hits": {
"size":1,
"sort": [
{
"files.datetime": "desc"
}
]
}
}
}
]
}
},
"sort": [
{
"files.datetime": {
"mode": "max",
"order": "desc",
"nested_path": "files"
}
}
],
"size": 2
}
Result:
[
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"SHA256" : "565e049335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22c5a"
},
"sort" : [
111111155
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "3",
"_nested" : {
"field" : "files",
"offset" : 0
},
"_score" : null,
"_source" : {
"filename" : "fifthfilename.exe",
"datetime" : 111111155
},
"sort" : [
111111155
]
}
]
}
}
}
},
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"SHA256" : "94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2"
},
"sort" : [
111111144
],
"inner_hits" : {
"files" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "files",
"offset" : 1
},
"_score" : null,
"_source" : {
"filename" : "secondfilename.exe",
"datetime" : 111111144
},
"sort" : [
111111144
]
}
]
}
}
}
}
]
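To get from this response to the compact result asked for in the question, the hits can be reduced on the client side. The following Python sketch assumes the response shape shown above; the helper name latest_files and the shortened placeholder SHA256 strings in the sample are illustrative only.

```python
from typing import Dict, List, Tuple


def latest_files(hits: List[Dict]) -> List[Tuple[str, str, int]]:
    """Return (SHA256, filename, datetime) for the top inner hit of each document."""
    rows = []
    for hit in hits:
        inner = hit["inner_hits"]["files"]["hits"]["hits"]
        if not inner:
            continue  # no nested file came back for this document
        # The query asks for inner hits sorted by files.datetime desc with size 1.
        newest = inner[0]["_source"]
        rows.append((hit["_source"]["SHA256"], newest["filename"], newest["datetime"]))
    return rows


# Trimmed sample; "doc3-sha" and "doc1-sha" stand in for the real hashes.
sample = [
    {
        "_source": {"SHA256": "doc3-sha"},
        "inner_hits": {"files": {"hits": {"hits": [
            {"_source": {"filename": "fifthfilename.exe", "datetime": 111111155}}
        ]}}},
    },
    {
        "_source": {"SHA256": "doc1-sha"},
        "inner_hits": {"files": {"hits": {"hits": [
            {"_source": {"filename": "secondfilename.exe", "datetime": 111111144}}
        ]}}},
    },
]
rows = latest_files(sample)
print({"SHA256": [sha for sha, _, _ in rows]})
```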

Why are my completion suggester options empty?

I'm currently trying to set up my suggestion implementation.
My index settings / mappings:
{
"settings" : {
"analysis" : {
"analyzer" : {
"trigrams" : {
"tokenizer" : "mesh_default_ngram_tokenizer",
"filter" : [ "lowercase" ]
},
"suggestor" : {
"type" : "custom",
"tokenizer" : "standard",
"char_filter" : [ "html_strip" ],
"filter" : [ "lowercase" ]
}
},
"tokenizer" : {
"mesh_default_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "3"
}
}
}
},
"mappings" : {
"default" : {
"properties" : {
"uuid" : {
"type" : "string",
"index" : "not_analyzed"
},
"language" : {
"type" : "string",
"index" : "not_analyzed"
},
"fields" : {
"properties" : {
"content" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "trigrams",
"fields" : {
"suggest" : {
"type" : "completion",
"analyzer" : "suggestor"
}
}
}
}
}
}
}
}
My query:
{
"suggest": {
"query-suggest" : {
"text" : "som",
"completion" : {
"field" : "fields.content.suggest"
}
}
},
"_source": ["fields.content", "uuid", "language"]
}
The query result:
{
"took" : 44,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "c6b7391075cc437ab7391075cc637a05-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content s<b>om</b>e text with more content you can poke a stick at"
},
"uuid" : "c6b7391075cc437ab7391075cc637a05"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "96e2c6765b6841fea2c6765b6871fe36-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content no text with more content you can poke a stick at"
},
"uuid" : "96e2c6765b6841fea2c6765b6871fe36"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "fd1472555e9d4d039472555e9d5d0386-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content someth<strong>ing</strong> completely different"
},
"uuid" : "fd1472555e9d4d039472555e9d5d0386"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "5a3727b134064de4b727b134063de4c4-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content some<strong>what</strong> strange content"
},
"uuid" : "5a3727b134064de4b727b134063de4c4"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "865257b6be4340c69257b6be4340c603-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content some <strong>more</strong> content you can poke a stick at too"
},
"uuid" : "865257b6be4340c69257b6be4340c603"
}
} ]
},
"suggest" : {
"query-suggest" : [ {
"text" : "som",
"offset" : 0,
"length" : 3,
"options" : [ ]
} ]
}
}
I'm currently using Elasticsearch 2.4.6 and I can't upgrade.
There are 5 documents in my index, and only 4 contain the word "some".
Why do I see 5 hits but no options?
The options are not empty if my suggest text starts with the first word of the field string (e.g. this).
Is my usage of the suggest feature valid when dealing with fields that contain full HTML pages? I'm not sure whether the feature was meant to handle many tokens per document.
I already tried using an ngram tokenizer for my suggestor analyzer, but that did not change the situation. Any hint or feedback would be appreciated.
It seems that the issue I'm seeing is a restriction of completion suggesters:
Matching always starts at the beginning of the text. So, for example, “Smi” will match “Smith, Fed” but not “Fed Smith”. However, you could list both “Smith, Fed” and “Fed Smith” as two different inputs for the one output.
http://rea.tech/implementing-autosuggest-in-elasticsearch/
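The workaround in the quote, listing several inputs for one entry, can be illustrated with a short Python sketch. It generates one input per word position so that prefix matching can start at any word. The helper names suffix_inputs and prefix_matches are hypothetical, and prefix_matches only mimics the prefix-only behaviour described above rather than reproducing the suggester itself.

```python
from typing import List


def suffix_inputs(text: str) -> List[str]:
    """Return the text plus every suffix that starts at a later word."""
    words = text.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def prefix_matches(inputs: List[str], typed: str) -> bool:
    """Mimic prefix-only matching: some input must start with the typed text."""
    typed = typed.lower()
    return any(candidate.lower().startswith(typed) for candidate in inputs)


inputs = suffix_inputs("Fed Smith")
print(inputs)
print(prefix_matches(["Fed Smith"], "Smi"))  # single input, prefix search on a later word
print(prefix_matches(inputs, "Smi"))         # with the extra suffix input
```

For fields holding full HTML pages this would produce one input per word, which may be too many to be practical; that trade-off is not evaluated here.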
