Query Synomys ElasticSearch - elasticsearch

I create a new index with synonym in this way :
PUT /test_index2
{
"settings": {
"index" : {
"analysis" : {
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms" : [
"mezzo,centro"
]
}
}
}
}
}
}
When i try this query :
{
"query":
{
"multi_match":
{
"query": "centro",
"fields": ["content"],
"analyzer": "synonym"
}
}
}
Kibana gives me this error :
[multi_match] analyzer [synonym] not found
I'm not very experienced with elastic could you help me?

You need to create a custom analyzer that leverages your synonym filter
{
"settings": {
"index" : {
"analysis" : {
"analyzer": { <-- add this
"synonym_analyzer": {
"type": "custom",
"filter": ["synonym"],
"tokenizer": "keyword"
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms" : [
"mezzo,centro"
]
}
}
}
}
}
And then you can use it in your query
{
"query":
{
"multi_match":
{
"query": "centro",
"fields": ["content"],
"analyzer": "synonym_analyzer" <-- change this
}
}
}

Related

Elasticsearch Multi-Term Auto Completion

I'm trying to implement the Multi-Term Auto Completion that's presented here.
Filtering down to the correct documents works, but when aggregating the completion_terms they are not filtered to those that match the current partial query, but instead include all completion_terms from any matched documents.
Here are the mappings:
{
"mappings": {
"dynamic" : "false",
"properties" : {
"completion_ngrams" : {
"type" : "text",
"analyzer" : "completion_ngram_analyzer",
"search_analyzer" : "completion_ngram_search_analyzer"
},
"completion_terms" : {
"type" : "keyword",
"normalizer" : "completion_normalizer"
}
}
}
}
Here are the settings:
{
"settings" : {
"index" : {
"analysis" : {
"filter" : {
"edge_ngram" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "10"
}
},
"normalizer" : {
"completion_normalizer" : {
"filter" : [
"lowercase",
"german_normalization"
],
"type" : "custom"
}
},
"analyzer" : {
"completion_ngram_search_analyzer" : {
"filter" : [
"lowercase"
],
"tokenizer" : "whitespace"
},
"completion_ngram_analyzer" : {
"filter" : [
"lowercase",
"edge_ngram"
],
"tokenizer" : "whitespace"
}
}
}
}
}
}
}
I'm then indexing data like this:
{
"completion_terms" : ["Hammer", "Fortis", "Tool", "2000"],
"completion_ngrams": "Hammer Fortis Tool 2000"
}
Finally, the autocomplete search looks like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"autocomplete": {
"terms": {
"field": "completion_terms",
"size": 100
}
}
}
}
This correctly returns documents matching the search string "fortis hammer too", but the aggregations include ALL completion terms that are included in any of the matched documents, e.g. for the query above:
"buckets": [
{ "key": "fortis" },
{ "key": "hammer" },
{ "key": "tool" },
{ "key": "2000" },
]
Ideally, I'd expect
"buckets": [
{ "key": "tool" }
]
I could filter out the terms that are already covered by the search query ("fortis" and "hammer" in this case) in the app, but the "2000" doesn't make any sense from a user's perspective, because it doesn't partially match any of the provided search terms.
I understand why this is happening, but I can't think of a solution. Can anyone help?
try filters agg please
{
"query": {
"bool": {
"must": [
{
"term": {
"completion_terms": "fortis"
}
},
{
"term": {
"completion_terms": "hammer"
}
},
{
"match": {
"completion_ngrams": "too"
}
}
]
}
},
"aggs": {
"findOuthammerAndfortis": {
"filters": {
"filters": {
"fortis": {
"term": {
"completion_terms": "fortis"
}
},
"hammer": {
"term": {
"completion_terms": "hammer"
}
}
}
}
}
}
}

Elasticsearch preform "OR" search on groups of filtering criteria

Overview: I have a situation where within a single index I want to preform a search to return results based on 2 different sets of criteria. Imagine a scenario where where I have an index with a data structure like what is outlined below
I want to preform some sort of query that looks at different "blocks" of criteria. Meaning I want to search by both of the following categories in a single query (if possible):
Category One:
Distance / location
Public = true
--OR--
Category Two:
Distance / location
Public = false
category = "specific category"
(although this is not my exact scenario it is an illustration of what I am facing):
{
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"completed" : {
"type" : "boolean"
},
"deleted" : {
"type" : "boolean"
},
"location" : {
"type" : "geo_point"
},
"public" : {
"type" : "boolean"
},
"uuid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Please note: I am rather new to Elastic and would appreciate any help with this. I have attempted to search for this question but was not able to find what I was looking for. Please let me know if there is any missing information here I should include.
Sure, you can combine bool queries
POST test_user2737876/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"geo_distance": {
"distance": "200km",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"term": {
"public": false
}
}
]
}
},
{
"bool": {
"filter": [
{
"geo_distance": {
"distance": "200km",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"term": {
"category.keyword": "specific category"
}
},
{
"term": {
"public": false
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}

Elasticsearch `Procter&Gamble` and `Procter & Gamble` issues

My task is:
* Make procter&gamble and procter & gamble produce the same results including score
* Make it universal, not via synonyms, as it can be any other Somehow&Somewhat
* Highlight procter&gamble or procter & gamble, not separate tokens if the phrase matches
* I want to use simple_query_stringas I allow search operators
* Make AT&T searchable as well
Here is my snippet. The problems that procter&gamble or procter & gamble searches produce different scores and this different documents as the result.
But the user expects the same result for procter&gamble or procter & gamble
DELETE /english_example
PUT /english_example
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_keywords": {
"type": "keyword_marker",
"keywords": ["example"]
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"acronymns": {
"type": "word_delimiter_graph",
"catenate_all" : true,
"preserve_original":true
},
"acronymns_": {
"type": "word_delimiter_graph",
"catenate_all" : true,
"preserve_original":true
},
"custom_stop_words_filter": {
"type": "stop",
"ignore_case": true,
"stopwords": [ "t" ]
}
},
"analyzer": {
"default": {
"tokenizer": "whitespace",
"char_filter": [
"ampersand_filter"
],
"filter": [
"english_possessive_stemmer",
"lowercase",
"acronymns",
"flatten_graph",
"english_stop",
"custom_stop_words_filter",
"english_keywords",
"english_stemmer"
]
}
},
"char_filter": {
"ampersand_filter": {
"type": "pattern_replace",
"pattern": "(?=[^&]*)( {0,}& {0,})(?=[^&]*)",
"replacement": "_and_"
},
"ampersand_filter2": {
"type": "mapping",
"mappings": [
"& => _and_"
]
}
}
}
}
}
PUT /english_example/_bulk
{ "index" : { "_id" : "1" } }
{ "description" : "wi-fi AT&T BB&T Procter & Gamble, some\nOther $500 games with Peter's", "contents" : "Much text with somewhere I meet Procter or Gamble" }
{ "index" : { "_id" : "2" } }
{ "description" : "Procter & Gamble", "contents" : "Much text with somewhere I meet Procter and Gamble" }
{ "index" : { "_id" : "3" } }
{ "description" : "Procter&Gamble", "contents" : "Much text with somewhere I meet Procter & Gamble" }
{ "index" : { "_id" : "4" } }
{ "description" : "Come Procter&Gamble", "contents" : "Much text with somewhere I meet Procter&Gamble" }
{ "index" : { "_id" : "5" } }
{ "description" : "Tome Procter & Gamble", "contents" : "Much text with somewhere I don't meet AT&T" }
# "query": "procter & gamble",
GET english_example/_search
{
"query": {
"simple_query_string": {
"query": "procter & gamble",
"default_operator": "or",
"fields": [
"description^2",
"contents^80"
]
}
},
"highlight": {
"fields": {
"description": {},
"contents": {}
}
}
}
# "query": "procter&gamble",
GET english_example/_search
{
"query": {
"simple_query_string": {
"query": "procter&gamble",
"default_operator": "or",
"fields": [
"description^2",
"contents^80"
]
}
},
"highlight": {
"fields": {
"description": {},
"contents": {}
}
}
}
# "query": "at&t",
GET english_example/_search
{
"query": {
"simple_query_string": {
"query": "at&t",
"default_operator": "or",
"fields": [
"description^2",
"contents^80"
]
}
},
"highlight": {
"fields": {
"description": {},
"contents": {}
}
}
}
In my snippet I redefine the default analyzer using word_delimiter_graph and whitespace tokenizer to search AT&T matches as well.
One option I can think of is to use a should query with a "standard analyzer" and your custom analyzer.
For "proctor & gamble" tokens generated using custom and standard analyzer will be "proctor","gamble"
For "proctor&gamble" tokens generated using custom analyzer will be "proctor","gamble","proctor&gamble" and using standard analyzer will "proctor" and "gamble"
So in should clause we can use a standard analyzer to look for "proctor" or "gamble" and a custom analyzer to look for "proctor&gamble"
GET english_example/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"description": {
"query": "Procter&Gamble",
"analyzer": "standard"
}
}
},
{
"match": {
"description": {
"query": "Procter&Gamble"
}
}
}
],
"minimum_should_match": 1
}
}
}
Second option will be to use synonymns where you define all variations in which proctor and gamble can appear to mean a single thing
I just realized that you are searching a description field and not a company field. So keyword analyzer wont work. I have updated my answer accordingly.
You can potentially try adding a custom field with lowercase and whitespace analyzer and use the same custom analyzer for search as well. When you perform search, search in both standard field and this custom field as a multimatch search. That should allow you to support both. You can boost the score for custom field so that exact matches comes in the top of the search results.
Trick is to convert user input to lower case before performing the search. You shouldn't use user input as is. Else this approach wont work.
You can use below scripts to try it out.
DELETE /test1
PUT /test1
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer" : {
"filter" : ["lowercase"],
"type" : "custom",
"tokenizer" : "whitespace"
}
}
}
},
"mappings": {
"properties": {
"description" : {
"type": "text",
"analyzer": "standard",
"fields": {
"custom" : {
"type" : "text",
"analyzer" : "lowercase_analyzer",
"search_analyzer" : "lowercase_analyzer"
}
}
}
}
}
}
PUT /test1/_bulk
{ "index" : { "_id" : "1" } }
{ "description" : "wi-fi AT&T BB&T Procter & Gamble, some\nOther $500 games with Peter's" }
{ "index" : { "_id" : "2" } }
{ "description" : "Procter & Gamble" }
{ "index" : { "_id" : "3" } }
{ "description" : "Procter&Gamble" }
GET test1/_search
{
"query": {
"multi_match": {
"query": "procter&gamble",
"fields": ["description", "description.custom"]
}
},
"highlight": {
"fields": {
"description": {},
"description.custom": {}
}
}
}
GET test1/_search
{
"query": {
"multi_match": {
"query": "procter",
"fields": ["description", "description.custom"]
}
},
"highlight": {
"fields": {
"description": {},
"description.custom": {}
}
}
}
GET test1/_search
{
"query": {
"multi_match": {
"query": "at&t",
"fields": ["description", "description.custom"]
}
},
"highlight": {
"fields": {
"description": {},
"description.custom": {}
}
}
}
GET test1/_search
{
"query": {
"multi_match": {
"query": "procter & gamble",
"fields": ["description", "description.custom"]
}
},
"highlight": {
"fields": {
"description": {},
"description.custom": {}
}
}
}
You can add highlighting and try it out.

Querying nested fields with analyzer results in error

I tried to use a synonym analyzer for my already working elastic search type. Here's the mapping of my serviceEntity:
{
"serviceentity" : {
"properties":{
"ServiceLangProps" : {
"type" : "nested",
"properties" : {
"NAME" : {"type" : "string", "search_analyzer": "synonym"},
"LONG_TEXT" : {"type" : "string", "search_analyzer": "synonym"},
"DESCRIPTION" : {"type" : "string", "search_analyzer": "synonym"},
"MATERIAL" : {"type" : "string", "search_analyzer": "synonym"},
"LANGUAGE_ID" : {"type" : "string", "include_in_all": false}
}
},
"LinkProps" : {
"type" : "nested",
"properties" : {
"TITLE" : {"type" : "string", "search_analyzer": "synonym"},
"LINK" : {"type" : "string"},
"LANGUAGE_ID" : {"type" : "string", "include_in_all": false}
}
},
"MediaProps" : {
"type" : "nested",
"properties" : {
"TITLE" : {"type" : "string", "search_analyzer": "synonym"},
"FILENAME" : {"type" : "string"},
"LANGUAGE_ID" : {"type" : "string", "include_in_all": false}
}
}
}
}
}
And these are my setting
{
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms": [
"lorep, spaceship",
"ipsum, planet"
]
}
},
"analyzer": {
"synonym": {
"filter": [
"lowercase",
"synonym"
],
"tokenizer": "whitespace"
}
}
}
}
When In try to search for anything, I get this Error:
Caused by: org.elasticsearch.index.query.QueryParsingException: [nested] nested object under path [ServiceLangProps] is not of nested type
And I don't understand why. If I don't add any analyzer to my setting, everything works fine.
I'm using the java API to communicate with the elasticsearch instance. Therefore my code looks something like this for the multi match query:
MultiMatchQueryBuilder multiMatchBuilder = QueryBuilders.multiMatchQuery(fulltextSearchString, QUERY_FIELDS).analyzer("synonym");
The query string created by the java API looks like this:
{
"query" : {
"bool" : {
"must" : {
"bool" : {
"should" : [ {
"nested" : {
"query" : {
"bool" : {
"must" : [ {
"match" : {
"ServiceLangProps.LANGUAGE_ID" : {
"query" : "DE",
"type" : "boolean"
}
}
}, {
"multi_match" : {
"query" : "lorem",
"fields" : [ "ServiceLangProps.NAME", "ServiceLangProps.DESCRIPTION", "ServiceLangProps.MATERIALKURZTEXT", "ServiceLangProps.DESCRIPTION_RICHTEXT" ],
"analyzer" : "synonym"
}
} ]
}
},
"path" : "ServiceLangProps"
}
}, {
"nested" : {
"query" : {
"bool" : {
"must" : [ {
"match" : {
"LinkProps.LANGUAGE_ID" : {
"query" : "DE",
"type" : "boolean"
}
}
}, {
"match" : {
"LinkProps.TITLE" : {
"query" : "lorem",
"type" : "boolean"
}
}
} ]
}
},
"path" : "LinkProps"
}
}, {
"nested" : {
"query" : {
"bool" : {
"must" : [ {
"match" : {
"MediaProps.LANGUAGE_ID" : {
"query" : "DE",
"type" : "boolean"
}
}
}, {
"match" : {
"MediaProps.TITLE" : {
"query" : "lorem",
"type" : "boolean"
}
}
} ]
}
},
"path" : "MediaProps"
}
} ]
}
},
"filter" : {
"bool" : { }
}
}
}
}
If I try it on the LinkProps or MediaProps, I get the same error for the respective nested object.
Edit: I'm using version 2.4.6 of elasticsearch
Would be helpful to check the query string as well and knowing what version of ES is being used.
I couldnt see the synonyms_path as well as the fact you are using nested types can cause that error.
You probably have seen this already but in case you havent
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-synonym-tokenfilter.html
I created a minimal example of what I'm trying to do.
My mapping looks like this:
{
"serviceentity" : {
"properties":{
"LinkProps" : {
"type" : "nested",
"properties" : {
"TITLE" : {"type" : "string", "search_analyzer": "synonym"},
"LINK" : {"type" : "string"},
"LANGUAGE_ID" : {"type" : "string", "include_in_all": false}
}
}
}
}
}
And my settings for the synonym analyzer in JAVA code:
XContentBuilder builder = jsonBuilder()
.startObject()
.startObject("analysis")
.startObject("filter")
.startObject("synonym") // The name of the analyzer
.field("type", "synonym") // The type (derivate)
.field("ignore_case", "true")
.array("synonyms", synonyms) // The synonym list
.endObject()
.endObject()
.startObject("analyzer")
.startObject("synonym")
.field("tokenizer", "whitespace")
.array("filter", "lowercase", "synonym")
.endObject()
.endObject()
.endObject()
.endObject();
The metadata which the ElasticSearch Head Chrome plugin spits out looks like this:
{
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms": [
"Test, foo",
"Title, bar"
]
}
},
"analyzer": {
"synonym": {
"filter": [
"lowercase",
"synonym"
],
"tokenizer": "whitespace"
}
}
}
}
When I now use a search query to look for "Test" I get the same error as mentioned in my first post. Here's the query
{
"query": {
"bool": {
"must": {
"nested": {
"path": "LinkProps",
"query": {
"multi_match": {
"query": "Test",
"fields": [
"LinkProps.TITLE",
"LinkProps.LINK"
],
"analyzer": "synonym"
}
}
}
}
}
}
}
which leads to this error
{
"error": {
"root_cause": [
{
"type": "query_parsing_exception",
"reason": "[nested] nested object under path [LinkProps] is not of nested type",
"index": "minimal",
"line": 1,
"col": 44
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "minimal",
"node": "6AhE4RCIQwywl49h0Q2-yw",
"reason": {
"type": "query_parsing_exception",
"reason": "[nested] nested object under path [LinkProps] is not of nested type",
"index": "minimal",
"line": 1,
"col": 44
}
}
]
},
"status": 400
}
When I check the analyzer with
GET http://localhost:9200/minimal/_analyze?text=foo&analyzer=synonym&pretty=true
I get the correct answer
{
"tokens": [
{
"token": "foo",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "test",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 0
}
]
}
So the analyzer seems to set up correctly. Did I messed up the mappings? I guess the problem is not because I have nested objects or is it?
I just tried this
{
"query": {
"bool": {
"must": {
"query": {
"multi_match": {
"query": "foo",
"fields": [
"LinkProps.TITLE",
"LinkProps.LINK"
],
"analyzer": "synonym"
}
}
}
}
}
}
As you can see, I removed the "nested" wrapper
"nested": {
"path": "LinkProps",
...
}
which now leads at least in some results (Not sure yet, if these will finally be the correct results). I'm trying to apply this to the original project and keep you posted if this also worked.

Elasticsearch terms aggregate duplicates

I have a field using a ngram analyzer and trying to use a terms aggregate on the field to return unique documents by the field. The returned keys in the aggregates don't match the documents fields being returned and I'm getting duplicate fields.
"analysis" : {
"filter" : {
"autocomplete_filter" : {
"type" : "edge_ngram",
"min_gram" : "1",
"max_gram" : "20"
}
},
"analyzer" : {
"autocomplete" : {
"type" : "custom",
"filter" : [ "lowercase", "autocomplete_filter" ],
"tokenizer" : "standard"
}
}
}
}
"name" : {
"type" : "string",
"analyzer" : "autocomplete",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
{
"query": {
"query_string": {
"query":"bra",
"fields":["name"],
"use_dis_max":true
}
},
"aggs": {
"group_by_name": {
"terms": { "field":"name.raw" }
}
}
}
I'm getting back the following names and keys.
Braingeyser, Brainstorm, Braingeyser, Brainstorm, Brainstorm, Brainstorm, Bramblecrush, Brainwash, Brainwash, Braingeyser
{"key":"Bog Wraith","doc_count":18}
{"key":"Birds of Paradise","doc_count":15}
{"key":"Circle of Protection: Black","doc_count":15}
{"key":"Lightning Bolt","doc_count":15}
{"key":"Grizzly Bears","doc_count":14}
{"key":"Black Knight","doc_count":13}
{"key":"Bad Moon","doc_count":12}
{"key":"Boomerang","doc_count":12}
{"key":"Wall of Bone","doc_count":12}
{"key":"Balance","doc_count":11}
How can I get elasticsearch to only return unique fields from the aggregate?
To remove duplicates being returned in your aggregate you can try:
"aggs": {
"group_by_name": {
"terms": { "field":"name.raw" },
"aggs": {
"remove_dups": {
"top_hits": {
"size": 1,
"_source": false
}
}
}
}
}

Resources