Elasticsearch, term suggester returns - elasticsearch

I can't get the term suggester to work.
Here is my mapping:
'name_not_analyzed': {
'type': 'string',
"index": "not_analyzed"
},
'suggest': {
'type': 'completion',
'analyzer': "simple",
'search_analyzer': 'simple',
'payloads': 'yes'
}
And here are my requests.
** The term suggester doesn't work:
GET /reviewmeta_index/_suggest
{
"my" : {
"text" : "dd",
"term" : {
"field" : "name_not_analyzed"
}
}
}
** The completion suggester works:
GET /reviewmeta_index/_suggest
{
"product_suggest":{
"text":"dd",
"completion": {
"field" : "suggest"
}
}
}
Documentation on how to set things up for the term suggester to work is sparse.

The completion suggester is meant for autocomplete, so a query like
{
"name_suggest":{
"text":"d",
"completion": {
"field" : "suggest"
}
}
}
will give you something like
"options": [
{
"text": "donald",
"score": 8
},
{
"text": "david",
"score": 7
}
]
while term suggester is for spell checking and finding similar terms, so you need to query like
{
"my-suggestion": {
"text": "davi",
"term": {
"field": "name_not_analyzed",
"size" : 10
}
}
}
which will give you something like this
"options": [
{
"text": "dave",
"score": 0.8333333,
"freq": 11
},
{
"text": "david",
"score": 0.6666666,
"freq": 6
}
]
I use the term suggester for a "Did you mean" feature when the user gets zero results. The term suggester has more options you can tweak.
EDIT 1: Added the min_word_length option
Your text is only 2 characters long and the default value of min_word_length is 4 (the default value of max_edits is 2), so you are not getting any results.
You need to add the min_word_length option to your query:
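A "Did you mean" step like the one described above could be sketched in Python as follows. This is only an illustration: the function name did_you_mean and the response shape (the suggestion name mapping to a list of entries, each with an "options" list) are assumptions based on the fragments shown above, not code from a real application.

```python
from typing import Optional


def did_you_mean(suggest_response: dict, name: str) -> Optional[str]:
    """Return the text of the highest-scoring option for `name`, or None.

    Assumes the response maps the suggestion name to a list of entries
    (one per input token), each carrying an "options" list.
    """
    best = None
    for entry in suggest_response.get(name, []):
        for option in entry.get("options", []):
            if best is None or option["score"] > best["score"]:
                best = option
    return best["text"] if best else None


# Hypothetical response assembled from the options shown above.
sample = {
    "my-suggestion": [
        {
            "text": "davi",
            "offset": 0,
            "length": 4,
            "options": [
                {"text": "dave", "score": 0.8333333, "freq": 11},
                {"text": "david", "score": 0.6666666, "freq": 6},
            ],
        }
    ]
}

print(did_you_mean(sample, "my-suggestion"))
```

Returning None when there are no options lets the caller decide whether to show a "Did you mean" hint at all.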
GET /reviewmeta_index/_suggest
{
"my" : {
"text" : "dd",
"term" : {
"field" : "name_not_analyzed",
"min_word_length" : 2
}
}
}
The above query will give you suggestions like "do" and "did" but won't give you "DO" or "Did", because the field is mapped with index: not_analyzed and is therefore case-sensitive.
Note: You cannot increase max_edits beyond 2, which is the default.
The algorithm ES uses to calculate edit distance can be selected with the string_distance option.
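To make the max_edits limit concrete, here is a small Python sketch of an edit-distance filter. It uses the textbook optimal string alignment distance (a restricted Damerau-Levenshtein variant), which is an assumption chosen for illustration; it does not reproduce Elasticsearch's internal implementation, and the helper names osa_distance and within_max_edits are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def within_max_edits(text: str, candidates: list, max_edits: int = 2) -> list:
    """Keep only candidates at most `max_edits` edits away from `text`."""
    return [c for c in candidates if osa_distance(text, c) <= max_edits]


print(within_max_edits("dd", ["do", "did", "donald"]))
```

A real term suggester applies further filters on top of the distance (for example on term length and frequency), so this only illustrates the distance cut-off.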

Related

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyg stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This code worked locally (where I run ES in a container) but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found an explanation of what happens, and adding the search type argument does fix the issue.
Since the workaround is best not used in production, I'm looking for other ways to get the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}

Get the number of appearances of a particular term in an elasticsearch field

I have an elasticsearch index (posts) with following mappings:
{
"id": "integer",
"title": "text",
"description": "text"
}
I simply want to find the number of occurrences of a particular term inside the description field of a single document (I have the document id and the term to find).
E.g. I have a post like this: {id: 123, title:"some title", description: "my city is LA, this post description has two occurrences of word city "}.
I have the document id (post id) for this post and just want to find how many times the word "city" appears in the description of this particular post (the result should be 2 in this case).
I can't seem to find a way to do this search. I don't want the occurrences across ALL the documents, just for a single document and inside one of its fields. Please suggest a query for this. Thanks
Elasticsearch Version: 7.5
You can use a terms aggregation on your description field, but you need to make sure fielddata is set to true on it.
PUT kamboh/
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"title": {
"type": "text"
},
"description": {
"type": "text",
"fields": {
"simple_analyzer": {
"type": "text",
"fielddata": true,
"analyzer": "simple"
},
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Ingesting a sample doc:
PUT kamboh/_doc/1
{
"id": 123,
"title": "some title",
"description": "my city is LA, this post description has two occurrences of word city "
}
Aggregating:
GET kamboh/_search
{
"size": 0,
"aggregations": {
"terms_agg": {
"terms": {
"field": "description.simple_analyzer",
"size": 20
}
}
}
}
Yielding:
"aggregations" : {
"terms_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "city",
"doc_count" : 1
},
{
"key" : "description",
"doc_count" : 1
},
...
]
}
}
Now, as you can see, the simple analyzer split the string into words and lowercased them, but the duplicate city is not reflected in the result: doc_count is the number of documents containing a term, not the number of times the term occurs inside a document, so I could not come up with an analyzer that'd keep the duplicates. With that being said,
It's advisable to do these word counts before you index!
You would split your string by whitespace and index them as an array of words instead of a long string.
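Counting before indexing, as suggested above, could look roughly like this in Python. It is a minimal sketch: the helper names and the description_word_counts field are hypothetical, the split is on plain whitespace (so "LA," keeps its comma), and how the counts are then mapped in the index is left open.

```python
from collections import Counter


def count_words(text: str) -> dict:
    """Split on whitespace and count how often each word occurs."""
    return dict(Counter(text.split()))


def with_word_counts(doc: dict, field: str = "description") -> dict:
    """Return a copy of `doc` with a hypothetical `<field>_word_counts` entry."""
    enriched = dict(doc)
    enriched[f"{field}_word_counts"] = count_words(doc.get(field, ""))
    return enriched


post = {
    "id": 123,
    "title": "some title",
    "description": "my city is LA, this post description has two occurrences of word city ",
}

enriched = with_word_counts(post)
print(enriched["description_word_counts"]["city"])
```

With the counts stored alongside the document, looking up one term for one post becomes a plain document fetch instead of an aggregation.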
This is also possible at search time, although it is very expensive, does not scale well, and requires script.painless.regex.enabled: true in your elasticsearch.yml:
GET kamboh/_search
{
"size": 0,
"aggregations": {
"terms_script": {
"scripted_metric": {
"params": {
"word_of_interest": ""
},
"init_script": "state.map = [:];",
"map_script": """
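// Skip documents that have no value in the keyword sub-field.
if (!doc.containsKey('description.keyword') || doc['description.keyword'].size() == 0) return;
def split_by_whitespace = / /.split(doc['description.keyword'].value);
for (def word : split_by_whitespace) {
// If a word of interest is given, skip every other word instead of aborting the script.
if (params['word_of_interest'] != "" && params['word_of_interest'] != word) {
continue;
}
if (state.map.containsKey(word)) {
state.map[word] += 1;
continue;
}
state.map[word] = 1;
}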
""",
"combine_script": "return state.map;",
"reduce_script": "return states;"
}
}
}
}
yielding
...
"aggregations" : {
"terms_script" : {
"value" : [
{
"occurrences" : 1,
"post" : 1,
"city" : 2, <------
"LA," : 1,
"of" : 1,
"this" : 1,
"description" : 1,
"is" : 1,
"has" : 1,
"my" : 1,
"two" : 1,
"word" : 1
}
]
}
}
...

Custom highlights in elastic search

I am a newbie to Elasticsearch. I have a task where I have to highlight certain queries with specific tags.
I am using a query similar to the one in the Elasticsearch intervals query documentation. The problem now is that I have to highlight "my favourite food" with one HTML tag, say "favorite", and cold porridge / hot water with a different HTML tag, say "state".
How can I do that?
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"all_of" : {
"ordered" : true,
"intervals" : [
{
"match" : {
"query" : "my favourite food",
"max_gaps" : 0,
"ordered" : true
}
},
{
"any_of" : {
"intervals" : [
{ "match" : { "query" : "hot water" } },
{ "match" : { "query" : "cold porridge" } }
]
}
}
]
},
"boost" : 2.0,
"_name" : "favourite_food"
}
}
}
}
You can use the highlighting feature in Elasticsearch as follows:
GET /index_name/_search
{
"query": {},
"highlight": {
"fields": {
"content": {
"type": "unified",
"number_of_fragments": 0,
"pre_tags": [
"<first_filter>",
"<second_filter>",
"<third_filter>"
],
"post_tags": [
"</first_filter>",
"</second_filter>",
"</third_filter>"
]
}
}
}
}
The order in which the tags are applied depends on the order in which the filters are applied. Also note that setting number_of_fragments to 0 returns the entire content with the hits tagged.
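The highlight section above can also be generated rather than typed by hand. The following Python sketch assumes the post_tags are meant to be the closing counterparts of the pre_tags; the function name build_highlight and the tag names "favorite" and "state" (taken from the question) are used for illustration only.

```python
def build_highlight(field: str, tag_names: list) -> dict:
    """Build a highlight section with matching opening and closing tags."""
    for name in tag_names:
        # Tag names with whitespace would not form valid HTML-style tags.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid tag name: {name!r}")
    return {
        "fields": {
            field: {
                "type": "unified",
                "number_of_fragments": 0,  # return the whole field content
                "pre_tags": [f"<{name}>" for name in tag_names],
                "post_tags": [f"</{name}>" for name in tag_names],
            }
        }
    }


highlight = build_highlight("content", ["favorite", "state"])
print(highlight["fields"]["content"]["post_tags"])
```

Generating both lists from one list of names keeps every opening tag paired with its closing tag.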

ElasticSearch 5.x context suggester with multiple contexts

I want to use the context suggester from Elasticsearch, but my suggestion results need to match 2 context values.
Expanding the example from the docs, I want to do something like:
POST place/_search?pretty
{
"suggest": {
"place_suggestion" : {
"prefix" : "tim",
"completion" : {
"field" : "suggest",
"size": 10,
"contexts": {
"place_type": [ "cafe", "restaurants" ],
"rating": ["good"]
}
}
}
}
}
I would like to get results that have the context 'cafe' or 'restaurant' for place_type AND the context 'good' for rating.
When I try something like this, Elasticsearch performs an OR operation on the contexts, giving me all suggestions with the context 'cafe', 'restaurant' OR 'good'.
Can I somehow specify which boolean operator Elasticsearch should use for combining multiple contexts?
It looks like this functionality isn't supported from Elasticsearch 5.x onwards:
https://github.com/elastic/elasticsearch/issues/21291#issuecomment-375690371
Your best bet is to create a composite context, which seems to be how Elasticsearch 2.x achieved multiple contexts in a query:
https://github.com/elastic/elasticsearch/pull/26407#issuecomment-326771608
To do this, I guess you'll need a new field in your mapping. Let's call it cat-rating:
PUT place
{
"mappings": {
"properties": {
"suggest": {
"type": "completion",
"contexts": [
{
"name": "place_type-rating",
"type": "category",
"path": "cat-rating"
}
]
}
}
}
}
When you index new documents you'll need to concatenate the fields place_type and rating together, separated by -, for the cat-rating field.
Once that's done your query will need to look something like this:
POST place/_search?pretty
{
"suggest": {
"place_suggestion": {
"prefix": "tim",
"completion": {
"field": "suggest",
"size": 10,
"contexts": {
"place_type-rating": [
{
"context": "cafe-good"
},
{
"context": "restaurant-good"
}
]
}
}
}
}
}
That'll return suggestions for good cafes OR good restaurants.
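Building the composite values by hand gets tedious once there are several place types and ratings. A minimal Python sketch of the concatenation, assuming the "-" separator and the place_type-rating context name from above (the helper names composite_contexts and suggest_query are hypothetical):

```python
from itertools import product


def composite_contexts(place_types: list, ratings: list, sep: str = "-") -> list:
    """Combine every place type with every rating, e.g. "cafe-good"."""
    return [f"{p}{sep}{r}" for p, r in product(place_types, ratings)]


def suggest_query(prefix: str, place_types: list, ratings: list, size: int = 10) -> dict:
    """Build the suggest request body using the composite context."""
    contexts = [{"context": c} for c in composite_contexts(place_types, ratings)]
    return {
        "suggest": {
            "place_suggestion": {
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "contexts": {"place_type-rating": contexts},
                },
            }
        }
    }


body = suggest_query("tim", ["cafe", "restaurant"], ["good"])
print(composite_contexts(["cafe", "restaurant"], ["good"]))
```

The same composite_contexts helper can be used at index time to fill the cat-rating field, so that indexed values and query values are built identically.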

Elastic Search multilingual field

I have read through a few articles and pieces of advice, but unfortunately I haven't found a solution that works for me.
The problem is that I have a field in the index that can hold content in any possible language, and I don't know which language it is. I need to search and sort on it. It is not localisation, just values in different languages.
The first language (apart from a few European ones) I tried it on was Japanese. To begin with, I set only one analyzer for this field and tried to search only for Japanese words/phrases. I took an example from here. Here is what I used for this:
'analysis': {
"filter": {
...
"ja_pos_filter": {
"type": "kuromoji_part_of_speech",
"stoptags": [
"\\u52a9\\u8a5e-\\u683c\\u52a9\\u8a5e-\\u4e00\\u822c",
"\\u52a9\\u8a5e-\\u7d42\\u52a9\\u8a5e"]
},
...
},
"analyzer": {
...
"ja_analyzer": {
"type": "custom",
"filter": ["kuromoji_baseform", "ja_pos_filter", "icu_normalizer", "icu_folding", "cjk_width"],
"tokenizer": "kuromoji_tokenizer"
},
...
},
"tokenizer": {
"kuromoji": {
"type": "kuromoji_tokenizer",
"mode": "search"
}
}
}
Mapper:
'name': {
'type': 'string',
'index': 'analyzed',
'analyzer': 'ja_analyzer',
}
And here are a few attempts to get results from it:
{
'filter': {
'query': {
'bool': {
'must': [
{
# 'wildcard': {'name': u'*ネバーランド福島*'}
# 'match': {'name': u'ネバーランド福島'
# },
"query_string": {
"fields": ['name'],
"query": u'ネバーランド福島',
"default_operator": 'AND'
}
},
],
'boost': 1.0
}
}
}
}
None of them works.
If I just take the standard analyser and query with query_string, or break the phrase myself (on whitespace, which I don't have here) and use a wildcard *<>* for each part, it again finds nothing. The analyser says that ネバーランド and 福島 are separate words/parts:
curl -XPOST 'http://localhost:9200/test/_analyze?analyzer=ja_analyzer&pretty' -d 'ネバーランド福島'
{
"tokens" : [ {
"token" : "ネハラント",
"start_offset" : 0,
"end_offset" : 6,
"type" : "word",
"position" : 1
}, {
"token" : "福島",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 2
} ]
}
With the standard analyser, if I search for ネバーランド I get what I want. But if I use the customised analyser and try the same, or just one symbol, I still get nothing.
The behaviour I'm looking for is: the query string is broken into words/parts, and all of those words/parts must be present in the resulting name field.
Thank you in advance
