Is it possible to boost suggestions based on the Elasticsearch index - elasticsearch

I have multiple indices that I want suggestions from, but I want to score/order the suggestions based on the index they're from. I've successfully boosted searches based on indices (using indices_boost), but this doesn't seem to work for suggestions. I tried something like:
GET index1,index2/_search
{
"indices_boost" : [
{ "index1" : 9 },
{ "index2" : 1 }
],
"suggest": {
"mySuggest":{
"text":"someText",
"completion": {
"field":"suggestField",
"size":6
}
}
}
}
Is this doable?
At the moment I've resorted to sorting the suggestions in code.

I believe you can use the category boost of the context suggester to achieve the desired behavior. You need to attach a special category field to each suggestion document, which can simply be the index name.
How to use category context to boost suggestions
The mapping may look like this:
PUT food
{
"mappings": {
"properties" : {
"suggestField" : {
"type" : "completion",
"contexts": [
{
"name": "index_name",
"type": "category"
}
]
}
}
}
}
For demonstration purposes I will create another index exactly like the one above, but named movie. (Index names can be arbitrary.)
Let's add the suggest documents:
PUT food/_doc/1
{
"suggestField": {
"input": ["timmy's", "starbucks", "dunkin donuts"],
"contexts": {
"index_name": ["food"]
}
}
}
PUT movie/_doc/2
{
"suggestField": {
"input": ["star wars"],
"contexts": {
"index_name": ["movie"]
}
}
}
Now we can run a suggest query with our boosts set:
POST food,movie/_search
{
"suggest": {
"my_suggestion": {
"prefix": "star",
"completion": {
"field": "suggestField",
"size": 10,
"contexts": {
"index_name": [
{
"context": "movie",
"boost": 9
},
{
"context": "food",
"boost": 1
}
]
}
}
}
}
}
Which will return something like this:
{
"suggest": {
"my_suggestion": [
{
"text": "star",
"offset": 0,
"length": 4,
"options": [
{
"text": "star wars",
"_index": "movie",
"_type": "_doc",
"_id": "2",
"_score": 9.0,
"_source": ...,
"contexts": {
"index_name": [
"movie"
]
}
},
{
"text": "starbucks",
"_index": "food",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": ...,
"contexts": {
"index_name": [
"food"
]
...
Why didn't indices_boost work?
It seems the indices_boost parameter only affects search hits, not suggestions. _suggest used to be a standalone endpoint but was deprecated and folded into _search, which is probably the source of the confusion.
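For completeness, the client-side sort the asker fell back to can be done as a simple re-rank: multiply each suggestion option's _score by a per-index boost before sorting. A minimal Python sketch; the boost values and sample options are illustrative assumptions modeled on the suggest response shape:

```python
# Re-rank completion-suggester options client-side by a per-index
# boost, mimicking what indices_boost does for regular search hits.
# Boost values and sample options below are made up for illustration.

def rerank_suggestions(options, index_boosts, default_boost=1.0):
    """Sort suggest options by _score scaled with a per-index boost."""
    return sorted(
        options,
        key=lambda o: o["_score"] * index_boosts.get(o["_index"], default_boost),
        reverse=True,
    )

options = [
    {"text": "star wars", "_index": "index2", "_score": 5.0},
    {"text": "starbucks", "_index": "index1", "_score": 1.0},
]

# index1 is boosted 9x, so its lower raw score wins: 1.0 * 9 > 5.0 * 1
ranked = rerank_suggestions(options, {"index1": 9, "index2": 1})
```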
Hope that helps!

Related

How to search over all fields and return every document containing that search in elasticsearch?

I have a problem regarding searching in Elasticsearch.
I have an index with multiple documents that have several fields. I want to be able to search across all the fields with a single query and have it return every document containing the queried value. I found that simple_query_string worked well for this; however, it does not return consistent results. In my index I have documents with several date fields. For example:
"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",
Those are just a few examples. However, when I query, for example:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "2008"
}
}
}
It only returns two documents. This is a problem because far more than two documents contain the value "2008" in their fields.
I also have a problem searching file names.
In my index there are fields that contain file names, like this:
"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",
When I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo"
}
}
}
I get no results.
But if I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo.txt"
}
}
}
I get the proper result.
Is there a better way to search across all documents and fields than what I did? I want it to return all the documents matching the query, not just two or zero.
Any help would be greatly appreciated.
Elasticsearch uses the standard analyzer if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt gets tokenized to
{
"tokens": [
{
"token": "demo.txt",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Now when you search for demo it will not return any result, but searching for demo.txt will.
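A toy model of why this happens: term lookup against the inverted index is exact-match, and the analyzer emitted demo.txt as one token. The simplified dict below is only an illustration; real Lucene index structures are far more involved:

```python
# Simplified inverted index: token -> ids of docs containing it.
# The standard analyzer kept "demo.txt" whole and lowercased it.
inverted_index = {
    "testpdf.pdf": [1],
    "demo.pdf": [2],
    "demo.txt": [3],
}

def lookup(term):
    """Exact-match term lookup, as a term-level query would do."""
    return inverted_index.get(term, [])

print(lookup("demo"))      # [] - no stored token equals "demo"
print(lookup("demo.txt"))  # [3] - exact token match succeeds
```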
You can instead use a wildcard query to search for documents that have demo in fileName:
{
"query": {
"wildcard": {
"fileName": {
"value": "demo*"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"fileName": "demo.pdf"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"fileName": "demo.txt"
}
}
]
Since revisionDate, projectSmirCreationDate, changedDate, and dueDate are all of type date, you cannot do a partial search on them.
You can use multi-fields to add an additional sub-field of type text to each of them. Modify your index mapping as shown below:
{
"mappings": {
"properties": {
"changedDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"projectSmirCreationDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"dueDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"revisionDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
}
}
}
}
Index Data:
{
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
{
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
Search Query:
{
"query": {
"multi_match": {
"query": "2008"
}
}
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
}
]

Elasticsearch: index boost with completion suggester

Is it possible to use index boost with the completion suggester in Elasticsearch? I have tried many different ways, but it doesn't seem to work. I haven't found any reference in the documentation stating that it does not work for the completion suggester. Example:
POST index1,index2/_search
{
"suggest" : {
"name_suggest" : {
"text" : "my_query",
"completion" : {
"field" : "name_suggest",
"size" : 7,
"fuzzy" :{}
}
}
},
"indices_boost" : [
{ "index1" : 2 },
{ "index2" : 1.5 }
]
}
The above does not return boosted scores. The scores are the same compared to running it without the indices_boost parameter.
I tried a few options, but none worked directly. Instead, you can define the weight of a document at index time and use it as a workaround to boost documents from one index over another. Below is a complete example.
Index mapping (the same for index1 and index2):
{
"mappings": {
"properties": {
"suggest": {
"type": "completion"
},
"title": {
"type": "keyword"
}
}
}
}
Index doc 1 in index-1 with a weight:
{
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 30
}
}
A similar doc is inserted in index-2 with a different weight:
{
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 10 --> note less weight
}
}
A plain suggest query will now rank the results according to the weights:
{
"suggest": {
"song-suggest": {
"prefix": "nir",
"completion": {
"field": "suggest"
}
}
}
}
And the search result:
{
"text": "Nirvana",
"_index": "index-1",
"_type": "_doc",
"_id": "1",
"_score": 34.0,
"_source": {
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 30
}
}
},
{
"text": "Nirvana",
"_index": "index-2",
"_type": "_doc",
"_id": "1",
"_score": 30.0,
"_source": {
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 10
}
}
}
]

search first element of a multivalue text field in elasticsearch

I want to search the first element of an array field in my Elasticsearch documents, but I can't figure out how.
For testing, I created a new index with fielddata=true, but I still didn't get the response I wanted.
Mapping
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
Values
name : ["John", "Doe"]
My request
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['name'][0]=params.param1",
"params" : {
"param1" : "john"
}
}
}
}
}
}
}
Incoming Response
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
You can use the following script in a search request to return a scripted field:
{
"script_fields": {
"firstElement": {
"script": {
"lang": "painless",
"inline": "params._source.name[0]"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64391432",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"firstElement": [
"John" <-- note this
]
}
}
]
You can use a Painless script to create a script field that returns a customized value for each document in the results of a query.
Note that you need the equality operator == to compare two values in the script (it evaluates to true if the two values are equal and false otherwise); your query used a single =, which is assignment.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings":{
"properties":{
"name":{
"type":"text",
"fielddata":true
}
}
}
}
Index data:
{
"name": [
"John",
"Doe"
]
}
Search Query:
{
"script_fields": {
"my_field": {
"script": {
"lang": "painless",
"source": "params['_source']['name'][0] == params.params1",
"params": {
"params1": "John"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_field": [
true <-- note this
]
}
}
]
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
You can use the script shown in my other answer if you just want to compare the value of the first element of the array against some other value. But based on your comments, it looks like your use case is quite different.
If you want to search the first element of the array, you need to convert your data into nested form. With arrays of objects, at search time you cannot refer to "the first element" or "the last element".
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "nested"
}
}
}
}
Index Data:
{
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
{
"booking_id": 1,
"name": [
{
"first": "Adam Simith",
"second": "John Doe"
}
]
}
{
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name.first": "John Doe"
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9400072,
"_source": {
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.9400072,
"_source": {
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
}
]

Filter elastic search data when fields contain ~

I have a bunch of documents like the one below. I want to filter the data where projectKey starts with ~.
I read some articles saying that ~ is an operator in Elasticsearch queries, so you cannot really filter on it directly.
Can someone help me form the search query for the /branch/_search API?
{
"_index": "branch",
"_type": "_doc",
"_id": "GAz-inQBJWWbwa_v-l9e",
"_version": 1,
"_score": null,
"_source": {
"branchID": "refs/heads/feature/12345",
"displayID": "feature/12345",
"date": "2020-09-14T05:03:20.137Z",
"projectKey": "~user",
"repoKey": "deploy",
"isDefaultBranch": false,
"eventStatus": "CREATED",
"user": "user"
},
"fields": {
"date": [
"2020-09-14T05:03:20.137Z"
]
},
"highlight": {
"projectKey": [
"~#kibana-highlighted-field#user#/kibana-highlighted-field#"
],
"projectKey.keyword": [
"#kibana-highlighted-field#~user#/kibana-highlighted-field#"
],
"user": [
"#kibana-highlighted-field#user#/kibana-highlighted-field#"
]
},
"sort": [
1600059800137
]
}
UPDATE
I used Prerana's answer below and added a prefix clause to my query.
Something is still wrong when I combine prefix and range: I get the error below. What am I missing?
GET /branch/_search
{
"query": {
"prefix": {
"projectKey": "~"
},
"range": {
"date": {
"gte": "2020-09-14",
"lte": "2020-09-14"
}
}
}
}
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 6,
"col": 5
}
],
"type": "parsing_exception",
"reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 6,
"col": 5
},
"status": 400
}
If I understood your issue correctly, I suggest creating a custom analyzer to make the special character ~ searchable.
I did a test locally, replacing ~ with __SPECIAL__:
I created an index with a custom char_filter and added a sub-field to the projectKey field. The name of the new multi-field is special_characters.
Here is the mapping:
PUT wildcard-index
{
"settings": {
"analysis": {
"char_filter": {
"special-characters-replacement": {
"type": "mapping",
"mappings": [
"~ => __SPECIAL__"
]
}
},
"analyzer": {
"special-characters-analyzer": {
"tokenizer": "standard",
"char_filter": [
"special-characters-replacement"
]
}
}
}
},
"mappings": {
"properties": {
"projectKey": {
"type": "text",
"fields": {
"special_characters": {
"type": "text",
"analyzer": "special-characters-analyzer"
}
}
}
}
}
}
Then I ingested the following contents in the index:
"projectKey": "content1 ~"
"projectKey": "This ~ is a content"
"projectKey": "~ cars on the road"
"projectKey": "o ~ngram"
Then, the query was:
GET wildcard-index/_search
{
"query": {
"match": {
"projectKey.special_characters": "~"
}
}
}
The response was:
"hits" : [
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "h1hKmHQBowpsxTkFD9IR",
"_score" : 0.43250346,
"_source" : {
"projectKey" : "content1 ~"
}
},
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "iFhKmHQBowpsxTkFFNL5",
"_score" : 0.3034693,
"_source" : {
"projectKey" : "This ~ is a content"
}
},
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "-lhKmHQBowpsxTkFG9Kg",
"_score" : 0.3034693,
"_source" : {
"projectKey" : "~ cars on the road"
}
}
]
Please let me know if you have any issues; I will be glad to help.
Note: this method works only when there is a blank space after the ~. You can see from the response that the 4th document was not returned.
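That caveat can be illustrated with a toy model of the analyzer chain: the char filter runs first, then the tokenizer. The whitespace split below is an assumption standing in for the standard tokenizer (which splits on more boundaries than spaces), but it shows why a ~ glued to other characters never yields a matching token:

```python
# Toy analyzer: char_filter mapping first, crude tokenizer second.
def analyze(text):
    filtered = text.replace("~", "__SPECIAL__")  # the char_filter step
    return filtered.lower().split()              # stand-in tokenizer

query_tokens = analyze("~")  # the query string is analyzed the same way

def matches(doc_text):
    return any(tok in analyze(doc_text) for tok in query_tokens)

print(matches("content1 ~"))  # True  - "~" stands alone, token emitted
print(matches("o ~ngram"))    # False - "~ngram" fuses into one token
```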
While @hansley's answer would work, it requires a custom analyzer, and, as you mentioned, you want only the docs that start with ~, whereas his result includes all docs containing ~. So here is my answer, which requires much less configuration and works as required.
The index mapping is the default: just index the docs below and ES will create a default mapping with a .keyword sub-field for every text field.
Index sample docs
{
"title" : "content1 ~"
}
{
"title" : "~ staring with"
}
{
"title" : "in between ~ with"
}
The search query should fetch only the 2nd doc from the sample docs:
{
"query": {
"prefix" : { "title.keyword" : "~" }
}
}
And the search result:
"hits": [
{
"_index": "pre",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"title": "~ staring with"
}
}
]
Please refer to the prefix query documentation for more info.
Update 1:
Index Mapping:
{
"mappings": {
"properties": {
"date": {
"type": "date"
}
}
}
}
Index Data:
{
"date": "2015-02-01",
"title" : "in between ~ with"
}
{
"date": "2015-01-01",
"title": "content1 ~"
}
{
"date": "2015-02-01",
"title" : "~ staring with"
}
{
"date": "2015-02-01",
"title" : "~ in between with"
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"prefix": {
"title.keyword": "~"
}
},
{
"range": {
"date": {
"lte": "2015-02-05",
"gte": "2015-01-11"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "stof_63924930",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"date": "2015-02-01",
"title": "~ staring with"
}
},
{
"_index": "stof_63924930",
"_type": "_doc",
"_id": "4",
"_score": 2.0,
"_source": {
"date": "2015-02-01",
"title": "~ in between with"
}
}
]

Elastic Search- Fetch Distinct Tags

I have document of following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
Now, I want to fetch the unique tag values across the tags fields of all documents, restricted to a prefix such as g*, so that these unique tags can be shown by a tag suggester (the Stack Overflow tag input is an example).
For example, whenever the user types 'g':
"guava", "gulmohar", "grammar", "grapes" and "green" should be returned as a result,
i.e. the query should return the distinct tags with prefix g*.
I searched everywhere, browsed the whole documentation and the ES forum, but didn't find any clue, much to my dismay.
I tried aggregations, but a terms aggregation returns the distinct counts for all words/tokens in the tags field. It does not return only the unique tags starting with 'g'.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
The result of the above lists every tag with its count (guava, apple, mango, ...), not just the g-prefixed ones.
Can someone please suggest the correct way to fetch all the distinct tags matching a given prefix?
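As an aside, two pragmatic workarounds exist. Server-side, the terms aggregation accepts an include parameter with a regular expression (e.g. "include": "g.*") so only matching buckets are returned. Client-side, the bucket list can simply be filtered by prefix; a minimal Python sketch with made-up bucket data in the standard terms-aggregation response shape:

```python
# Keep only the terms-agg buckets whose key starts with the typed
# prefix. Bucket contents below are invented for illustration.
buckets = [
    {"key": "guava", "doc_count": 3},
    {"key": "apple", "doc_count": 3},
    {"key": "grammar", "doc_count": 2},
    {"key": "mango", "doc_count": 1},
    {"key": "green", "doc_count": 1},
]

def suggest(prefix, buckets):
    return [b["key"] for b in buckets if b["key"].startswith(prefix)]

print(suggest("g", buckets))  # ['guava', 'grammar', 'green']
```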
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of prefix query and highlighting as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
What you're trying to do amounts to autocomplete, of course, and there are perhaps better ways of going about that than what I posted above (though they are a bit more involved). Here are a couple of blog posts we did about ways to set up autocomplete:
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
As per @Sloan Ahrens' advice, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Inserted these documents:
{
_id: "1",
tags: {"input": ["guava", "apple", "mango", "banana", "gulmohar"]},
fruits: {color: 'bar', type: 'alice'}
}
{
_id: "2",
tags: {"input": ["orange", "guava", "mango shakes", "apple pie", "grammar"]},
fruits: {color: 'foo', type: 'bob'}
}
{
_id: "3",
tags: {"input": ["apple", "grapes", "water", "gulmohar", "water-melon", "green"]},
fruits: {color: 'foo', type: 'alice'}
}
I didn't need to modify my original index much; I just added input before the tags array.
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
This gave me the desired output.
I accepted @Sloan Ahrens' answer, as his suggestions worked like a charm and pointed me in the right direction.
