how to exclude search words in synonyms filter in elasticsearch

how to exclude search words in synonyms filter in elasticsearch - elasticsearch

While I'm adding table and tables as synonym filter in elastic search, I need to filter out the results for table fan. How to achieve this in elastic search
Could we build a taxonomy of inclusion and exclusion lists filters in settings rather than at run time queries in elastic search

GET <indexName>/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"<fieldName>": {
"query": "table fan", // <======= Below operator will applied b/w table(&synonyms) And fan(&synonyms)
"operator": "AND"
}
}
}
]
}
}
}
You can use above query to exclude all the documents having both 'table', 'fan' and their corresponding synonyms.
OR:
If you want to play with multiple logical operators. e.g Given me all the documents which doesn't contain either "table fan" Or "ac" you can use simple_query_string
GET <indexName>/_search
{
"query": {
"bool": {
"must_not": [
{
"simple_query_string": {
"query": "(table + fan) | ac", // <=== '+'='and', '|'='or', '-'='not'
"fields": [
"<fieldName>" // <==== use multiple field names, wildcard also supported
]
}
}
]
}
}
}

Adding a working example with index data, mapping, search query and search result
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"table, tables"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "synonym_analyzer",
"search_analyzer": "standard"
}
}
}
}
Analyze API
POST/_analyze
{
"analyzer" : "synonym_analyzer",
"text" : "table fan"
}
The following tokens are generated:
{
"tokens": [
{
"token": "table",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "tables",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 0
},
{
"token": "fan",
"start_offset": 6,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Index Data:
{ "title": "table and fan" }
{ "title": "tables and fan" }
{ "title": "table fan" }
{ "title": "tables fan" }
{ "title": "table chair" }
Search Query:
{
"query": {
"bool": {
"must": {
"match": {
"title": "table"
}
},
"filter": {
"bool": {
"must_not": [
{
"match_phrase": {
"title": "table fan"
}
},
{
"match_phrase": {
"title": "table and fan"
}
}
]
}
}
}
}
}
You can also use match query in place of match_phrase query
{
"query": {
"bool": {
"must": {
"match": {
"title": "table"
}
},
"filter": {
"bool": {
"must_not": [
{
"match": {
"title": {
"query": "table fan",
"operator": "AND"
}
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "synonym",
"_type": "_doc",
"_id": "2",
"_score": 0.06783115,
"_source": {
"title": "table chair"
}
}
]
Update 1:
Could we build a taxonomy of inclusion and exclusion lists filters in
settings rather than at run time queries in elastic search
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.Refer this ES documentation on mapping to understand what mapping is used to define.
Please refer to this documentation on Dynamic template that allow you to define custom mappings that can be applied to dynamically added fields

Related

No match on document if the search string is longer than the search field

I have a title I am looking for
The title is, and is stored in a document as
"Police diaries : stefan zweig"
When I search "Police"
I get the result.
But when I search Policeman
I do not get the result.
Here is the query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"title",
omitted because irrelevance...
],
"query": "Policeman",
"fuzziness": "1.5",
"prefix_length": "2"
}
}
],
"must": {
omitted because irrelevance...
}
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
and here is the mapping
{
"books": {
"mappings": {
"book": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"sort": {
"type": "text",
"analyzer": "to order in another language, (creates a string with symbols)",
"fielddata": true
}
}
}
}
}
}
}
}
It should be noted that I have documents with a title "some title"
which get hits if I search for "someone title".
I cant figure out why the police book is not showing up.

So you have 2 parts of your question.
You want to search the title containing police when searching for policeman.
want to know why some title documents match the someone title document and according to that you expect the first one to match as well.
Let me first explain you why second query matches and the why the first one doesn't and then would tell you, how to make the first one to work.
Your document containing some title creates below tokens and you can verify this with analyzer API.
POST /_analyze
{
"text": "some title",
"analyzer" : "standard" --> default analyzer for text field
}
Generated tokens
{
"tokens": [
{
"token": "some",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "title",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Now when you search for someone title using the match query which is analyzed and uses the same analyzer which is used on index time on field.
So it creates 2 tokens someone and title and match query matches the title tokens, which is the reason it comes in your search result, you can also use Explain API to verify and see the internals how it matches in detail.
How to bring police title when searching for policeman
You need to make use of synonyms token filter as shown in the below example.
Index Def
{
"settings": {
"analysis": {
"analyzer": {
"synonyms": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms" : ["policeman => police"] --> note this
}
}
}
},
"mappings": {
"properties": {
"": {
"type": "text",
"analyzer": "synonyms"
}
}
}
}
Index sample doc
{
"dialog" : "police"
}
Search query having term policeman
{
"query": {
"match" : {
"dialog" : {
"query" : "policeman"
}
}
}
}
And search result
"hits": [
{
"_index": "so_syn",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dialog": "police" --> note source has `police` only.
}
}
]

Position as result, instead of highlighting

I try to get positions instead of highlighted text as the result of elasticsearch query.
Create the index:
PUT /test/
{
"mappings": {
"article": {
"properties": {
"text": {
"type": "text",
"analyzer": "english"
},
"author": {
"type": "text"
}
}
}
}
}
Put a document:
PUT /test/article/1
{
"author": "Just Me",
"text": "This is just a simple test to demonstrate the audience the purpose of the question!"
}
Search the document:
GET /test/article/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"text": {
"query": "simple test",
"_name": "must"
}
}
}
],
"should": [
{
"match_phrase": {
"text": {
"query": "need help",
"_name": "first",
"slop": 2
}
}
},
{
"match_phrase": {
"text": {
"query": "purpose question",
"_name": "second",
"slop": 3
}
}
},
{
"match_phrase": {
"text": {
"query": "don't know anything",
"_name": "third"
}
}
}
],
"minimum_should_match": 1
}
},
"highlight": {
"fields": {
"text": {}
}
}
}
When i run this search, i get the result like so:
This is just a simple test to <em>demonstrate</em> the audience the purpose of the <em>question</em>!
I'm not interested in getting the results surrounded with em tags, but i want to get all the positions of the results like so:
"hits": [
{ "start_offset": 30, "end_offset": 40 },
{ "start_offset": 74, "end_offset": 81 }
]
Hope you get my idea!

To have the offset position of a word in a text you should add to your index mapping a termvector - doc here . As written in the doc, you have to enable this param at index time:
"term_vector": "with_positions_offsets_payloads"
For the specific query, please follow the linked doc page

Elastic synonym usage in aggregations

Situation :
Elastic version used: 2.3.1
I have an elastic index configured like so
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"british,english",
"queen,monarch"
]
}
},
"analyzer": {
"my_synonyms": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonym_filter"
]
}
}
}
}
}
Which is great, when I query the document and use a query term "english" or "queen" I get all documents matching british and monarch. When I use a synonym term in filter aggregation it doesnt work. For example
In my index I have 5 documents, 3 of them have monarch, 2 of them have queen
POST /my_index/_search
{
"size": 0,
"query" : {
"match" : {
"status.synonym":{
"query": "queen",
"operator": "and"
}
}
},
"aggs" : {
"status_terms" : {
"terms" : { "field" : "status.synonym" }
},
"monarch_filter" : {
"filter" : { "term": { "status.synonym": "monarch" } }
}
},
"explain" : 0
}
The result produces:
Total hits:
5 doc count (as expected, great!)
Status terms: 5 doc count for queen (as expected, great!)
Monarch filter: 0 doc count
I have tried different synonym filter configuration:
queen,monarch
queen,monarch => queen
queen,monarch => queen,monarch
But the above hasn't changed the results. I was wanting to conclude that maybe you can use filters at query time only but then if terms aggregation is working why shouldn't filter, hence I think its my synonym filter configuration that is wrong. A more extensive synonym filter example can be found here.
QUESTION:
How to use/configure synonyms in filter aggregation?
Example to replicate the case above:
1. Create and configure index:
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"wlh,wellhead=>wellwell"
]
}
},
"analyzer": {
"my_synonyms": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonym_filter"
]
}
}
}
}
}
PUT my_index/_mapping/job
{
"properties": {
"title":{
"type": "string",
"analyzer": "my_synonyms"
}
}
}
2.Put two documents:
PUT my_index/job/1
{
"title":"wellhead smth else"
}
PUT my_index/job/2
{
"title":"wlh other stuff"
}
3.Execute a search on wlh which should return 2 documents; have a terms aggregation which should have 2 documents for wellwell and a filter which shouldn't have 0 count:
POST my_index/_search
{
"size": 0,
"query" : {
"match" : {
"title":{
"query": "wlh",
"operator": "and"
}
}
},
"aggs" : {
"wlhAggs" : {
"terms" : { "field" : "title" }
},
"wlhFilter" : {
"filter" : { "term": { "title": "wlh" } }
}
},
"explain" : 0
}
The results of this query is:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"wlhAggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "wellwell",
"doc_count": 2
},
{
"key": "else",
"doc_count": 1
},
{
"key": "other",
"doc_count": 1
},
{
"key": "smth",
"doc_count": 1
},
{
"key": "stuff",
"doc_count": 1
}
]
},
"wlhFilter": {
"doc_count": 0
}
}
}
And thats my problem, the wlhFilter should have at least 1 doc count in it.

I'm short in time, so if needed I can elaborate a bit more at a later time today/tomorrow. But the following should work:
DELETE /my_index
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"british,english",
"queen,monarch"
]
}
},
"analyzer": {
"my_synonyms": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonym_filter"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_synonyms",
"fielddata": true
}
}
}
}
}
POST my_index/test/1
{
"title" : "the british monarch"
}
GET my_index/_search
{
"query": {
"match": {
"title": "queen"
}
}
}
GET my_index/_search
{
"query": {
"match": {
"title": "queen"
}
},
"aggs": {
"queen_filter": {
"filter": {
"term": {
"title": "queen"
}
}
},
"monarch_filter": {
"filter": {
"term": {
"title": "monarch"
}
}
}
}
}
Could you share the mapping you have defined for your status.synonym field?
EDIT: V2
The reason why your filter's output is 0, is because a filter in Elasticsearch never goes through an analysis phase. It's meant for exact matches.
The token 'wlh' in your aggregation will not be translated to 'wellwell', meaning that it doesn't occur in the inverted index. This is because, during index time, your 'wlh' is translated into 'wellwell'.
In order to achieve what you want, you will have to index the data into a separate field and adjust your filter accordingly.
You could try something like:
DELETE my_index
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"my_synonym_filter": {
"type": "synonym",
"synonyms": [
"wlh,wellhead=>wellwell"
]
}
},
"analyzer": {
"my_synonyms": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonym_filter"
]
}
}
}
},
"mappings": {
"job": {
"properties": {
"title": {
"type": "string",
"fields": {
"synonym": {
"type": "string",
"analyzer": "my_synonyms"
}
}
}
}
}
}
}
PUT my_index/job/1
{
"title":"wellhead smth else"
}
PUT my_index/job/2
{
"title":"wlh other stuff"
}
POST my_index/_search
{
"size": 0,
"query": {
"match": {
"title.synonym": {
"query": "wlh",
"operator": "and"
}
}
},
"aggs": {
"wlhAggs": {
"terms": {
"field": "title.synonym"
}
},
"wlhFilter": {
"filter": {
"term": {
"title": "wlh"
}
}
}
}
}
Output:
{
"aggregations": {
"wlhAggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "wellwell",
"doc_count": 2
},
{
"key": "else",
"doc_count": 1
},
{
"key": "other",
"doc_count": 1
},
{
"key": "smth",
"doc_count": 1
},
{
"key": "stuff",
"doc_count": 1
}
]
},
"wlhFilter": {
"doc_count": 1
}
}
}
Hope this helps!!

So with the help of #Byron Voorbach below and his comments this is my solution:
I have created a separate field which I use synonym analyser on, as
opposed to having a property field (mainfield.property).
And most importantly the problem was my synonyms were contracted! I
had, for example, british,english => uk. Changing that to
british,english,uk solved my issue and the filter aggregation is
returning the right number of documents.
Hope this helps someone, or at least point to the right direction.
Edit:
Oh lord praise the documentation! I completely fixed my issue with Filters (S!) aggregation (link here). In filters configuration I specified Match type of query and it worked! Ended up with something like this:
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"status" : { "match" : { "cats.saurus" : "monarch" }},
"country" : { "match" : { "cats.saurus" : "british" }}
}
}
}
}

ElasticSearch: How to use edge_ngram and have real relevant hits to display first

I'm new with elasticsearch and I'm trying to develop a search for an ecommerce to suggested 5~10 matching products to the user.
As it should work while the user is typing, we found in the official documentation the use of edge_ngram and it KIND OF worked. But as we searched to test, the results were not the expected. As shows the example below (in our test)
Searching example
As it is shown in the image, the result for the term "Furadeira" (Power Drill) returns accessories before the power drill itself. How can I enhance the results? Even the order where the match is found in the string would help me, I guess.
So, this is the code I have until now:
//PUT example
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"portuguese_stop": {
"type": "stop",
"stopwords": "_portuguese_"
},
"portuguese_stemmer": {
"type": "stemmer",
"language": "light_portuguese"
}
},
"analyzer": {
"portuguese": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"portuguese_stop",
"portuguese_stemmer"
]
},
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
}
}
/* mapping */
//PUT /example/products/_mapping
{
"products": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
/* Search */
//GET /example/products/_search
{
"query" : {
"query_string": {
"query" : "furadeira",
"type" : "most_fields", // Tried without this aswell
"fields" : [
"name^8",
"model^10",
"manufacturer^4",
"description"
]
}
}
}
/* Product example */
// PUT example/products/38313
{
"name": "FITA VEDA FRESTA (ESPUMA 4503) 12X5 M [ H0000164055 ]",
"description": "Caracteristicas do produto:Ve…Diminui ruidos indesejaveis.",
"price":21.90,
"product_id": 38313,
"image": "http://placehold.it/200x200",
"quantity": 92,
"width": 20.200,
"height": 1.500,
"length": 21.500,
"weight": 0.082,
"model": "167083",
"manufacturer": "3M DO BRASIL"
}
Thanks in advance.

you could enhance your query to be a so-called boolean query, which contains your existing query in a must clause, but have an additional query in a should clause, that matches exactly (not using the ngrammed field). If the query matches the should clause it will be scored higher.
See the bool query documentation.

let's assume you have a field that differentiates the Main product from Accessories. I call it level_field.
now you can have two approaches to go:
1) boost up The Main product _score by adding 'should' operation:
put your main query in the must operation and in should operation use level_field to boost the _score of documents which are the Main products.
{
"query": {
"bool": {
"must": {
"match": {
"name": {
"query": "furadeira"
}
}
},
"should": [
{ "match": {
"level_field": {
"query": "level1",
"boost": 3
}
}},
{ "match": {
"level_field": {
"query": "level2",
"boost": 2
}
}}
]
}
}
}
2) in second approach you can decrease _score for documents that they are not the Main products by using boosting query:
{
"query": {
"boosting": {
"positive": {
"query_string": {
"query" : "furadeira",
"type" : "most_fields",
"fields" : [
"name^8",
"model^10",
"manufacturer^4",
"description"
]
}
}
},
"negative": {
"term": {
"level_field": {
"value": "level2"
}
}
},
"negative_boost": 0.2
}
}
}
I hope it helps

Nested filtering in elasticsearch with more than one term of the same nested type

I'm new to elasticsearch, so maybe my approach is plain wrong, but I want to make an index of recipes and allow the user to filter it down with the aggregated ingredients that are still found in the subset.
Maybe I'm using the wrong language to explain so maybe this example will clarify. I would like to search for recipes with the term salt; which results in three recipes:
with ingredients: salt, flour, water
with ingredients: salt, pepper, egg
with ingredients: water, flour, egg, salt
The aggregate on the results ingredients returns salt, flour, water, pepper, egg. When I filter with flour I only want recipe 1 and 3 to appear in the search results (and the aggregate on ingredients should only return salt, flour, water, egg and salt). When I add another filter egg I want only recipe 3 to appear (and the aggregate should only return water, flour, egg, salt).
I can't make the latter to work: one filter next to the default query does narrow down the results as desired but when adding the other term (egg) to the terms filter the results again start to include b as well, as if it were an OR filter. Adding AND however to the filter execution results in NO results ... what am I doing wrong?
My mapping:
{
"recipe": {
"properties": {
"title": {
"analyzer": "dutch",
"type": "string"
},
"ingredients": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"analyzer": "dutch",
"include_in_parent": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
My query:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"match": {
"_all": "salt"
}
}
]
}
},
"filter": {
"nested": {
"path": "ingredients",
"filter": {
"terms": {
"ingredients.name": [
"flour",
"egg"
],
"execution": "and"
}
}
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"nested": {
"path": "ingredients"
},
"aggregations": {
"count": {
"terms": {
"field": "ingredients.name.raw"
}
}
}
}
}
}

Why are you using a nested mapping here? Its main purpose is to keep relations between the sub-object attributes, but your ingredients field has just one attribute and can be modeled simply as a string field.
So, if you update your mapping like this :
POST recipes
{
"mappings": {
"recipe": {
"properties": {
"title": {
"type": "string"
},
"ingredients": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
You can still index your recipes as :
{
"title":"recipe b",
"ingredients":["salt","pepper","egg"]
}
And this query gives you the result you are waiting for :
POST recipes/recipe/_search
{
"query": {
"filtered": {
"query": {
"match": {
"_all": "salt"
}
},
"filter": {
"terms": {
"ingredients": [
"flour",
"egg"
],
"execution": "and"
}
}
}
},
"size": 50,
"aggregations": {
"ingredients": {
"terms": {
"field": "ingredients"
}
}
}
}
which is :
{
...
"hits": {
"total": 1,
"max_score": 0.22295055,
"hits": [
{
"_index": "recipes",
"_type": "recipe",
"_id": "PP195TTsSOy-5OweArNsvA",
"_score": 0.22295055,
"_source": {
"title": "recipe c",
"ingredients": [
"salt",
"flour",
"egg",
"water"
]
}
}
]
},
"aggregations": {
"ingredients": {
"buckets": [
{
"key": "egg",
"doc_count": 1
},
{
"key": "flour",
"doc_count": 1
},
{
"key": "salt",
"doc_count": 1
},
{
"key": "water",
"doc_count": 1
}
]
}
}
}
Hope this helps.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

how to exclude search words in synonyms filter in elasticsearch - elasticsearch

While I'm adding table and tables as synonym filter in elastic search, I need to filter out the results for table fan. How to achieve this in elastic search Could we build a taxonomy of inclusion and exclusion lists filters in settings rather than at run time queries in elastic search

Related

No match on document if the search string is longer than the search field

Position as result, instead of highlighting

Elastic synonym usage in aggregations

ElasticSearch: How to use edge_ngram and have real relevant hits to display first

Nested filtering in elasticsearch with more than one term of the same nested type

Categories

Resources