How can I implement synonyms in Elasticsearch?

I want to implement synonyms in my mapping. I have created a parent-child mapping; here it is:
{
"mapping":{
"mappings":{
"question_data":{
"properties":{
"question_id":{
"type":"integer"
},
"question":{
"type":"string"
}
}
},
"answer_data":{
"_parent":{
"type":"question_data"
},
"_routing":{
"required":true
},
"properties":{
"answer_id":{
"type":"integer"
},
"answer":{
"type":"string"
}
}
}
}
}
}
Thanks in advance.

To use synonyms in Elasticsearch, you first have to create an analyzer with a synonym token filter in the index settings to add synonym support for a particular field. The synonyms themselves can also be defined in the settings.
PUT testindex_510
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"locationsynfilter": {
"type": "synonym",
"synonyms": [
"lokhandwala,andheri west",
"versova,andheri west",
"mazgaon,byculla"
]
},
"titlesynfilter": {
"type": "synonym",
"synonyms": [
"golds , gold",
"talwalkars, talwalkar"
]
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
In the above settings I defined two analyzers for two different fields. These analyzers support synonyms, and the synonyms are defined in the filter used by each analyzer.
You can also define synonyms in a separate txt file instead of defining them inline in the settings, like the following:
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"titlesynfilter": {
"type": "synonym",
"synonyms_path": "analysis/titlesynonym.txt"
},
"locationsynfilter": {
"type": "synonym",
"synonyms_path": "analysis/locationsynonym.txt"
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
where your txt file should look like this (please refer to the documentation for more configuration options):
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
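Once the index exists, you can sanity-check the synonym expansion with the _analyze API (using testindex_510 and synonymanalyzer from the example above; on recent Elasticsearch versions the parameters go in a JSON body, on older versions they are passed as query parameters):

```
GET testindex_510/_analyze
{
"analyzer": "synonymanalyzer",
"text": "lokhandwala"
}
```

The returned tokens should contain the terms of the configured synonym "andheri west" in addition to "lokhandwala".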
Hope this helps.

Related

How to apply multiple settings to an index in Elasticsearch

I need two settings: one for stopwords and one for synonyms. How do I apply both settings to the same index?
Below is the stopwords setting that I need to apply to the index:
settings_1 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "st.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
Below is the synonym setting that I need to apply to the index:
settings_2 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_filter": {
"type": "synonym",
"synonyms_path": "sy.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_analyzer"
}
}
}
}
Will the code work like below?
es.indices.put_settings(index="gene", body=settings_1)
es.indices.put_settings(index="gene", body=settings_2)
Although you can use two separate update-settings calls as you mentioned, this is not the preferred way: 1) it involves two network calls to Elasticsearch, and 2) they can be combined into a single call, which means less overhead for Elasticsearch when propagating the updated cluster state to all the nodes. (Also note that analysis settings can only be changed on a closed index, so if the index does not exist yet, it is simpler to define everything in a single create-index request.)
You can just combine both settings and send a single request. You can first test this in Postman or the Kibana Dev Tools in JSON format.
As discussed in the comments, below are the complete settings and mappings combining the two (defining both analyzers):
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
},
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "analyzers/<your analyzer ID>",
"updateable": true
},
"my_filter": {
"type": "synonym",
"synonyms_path": "analyzers/F111111111",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
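If you are building these bodies in Python anyway, you can also combine them programmatically before making a single create-index call. Below is a minimal sketch (the merge loop is illustrative, not an Elasticsearch API; the two dicts abbreviate settings_1 and settings_2 from above, with the mappings omitted for brevity):

```python
# Combine the "analysis" sections of the two index bodies into one
# create-index body.
settings_1 = {"settings": {"index": {"analysis": {
    "analyzer": {"my_stop_analyzer": {"type": "custom", "tokenizer": "standard",
                                      "filter": "my_fil"}},
    "filter": {"my_fil": {"type": "stop", "stopwords_path": "st.txt",
                          "updateable": True}}}}}}
settings_2 = {"settings": {"index": {"analysis": {
    "analyzer": {"my_analyzer": {"type": "custom", "tokenizer": "standard",
                                 "filter": ["my_filter"]}},
    "filter": {"my_filter": {"type": "synonym", "synonyms_path": "sy.txt",
                             "updateable": True}}}}}}

combined = {"settings": {"index": {"analysis": {}}}}
for body in (settings_1, settings_2):
    for section, entries in body["settings"]["index"]["analysis"].items():
        # merge the "analyzer" and "filter" entries from both bodies side by side
        combined["settings"]["index"]["analysis"].setdefault(section, {}).update(entries)

# combined now holds both analyzers and both filters; it can be sent as the
# body of one create-index request instead of two put_settings calls
print(sorted(combined["settings"]["index"]["analysis"]["analyzer"]))
```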

Filter query words (multi-language) which I don't want Elasticsearch to use for search

I have this kind of query. When I pass a query argument like TOO Big House, I don't want Elastic to search by the word TOO, because there are a lot of names beginning with TOO. There is nothing about this in the documentation. Is it possible in Elasticsearch?
{"bool": {
"must": [
{
"match": {
"consignorOrganizationName": {
"query":"?0"
}
}
}
]
}}
Field from index:
"properties": {
"consignorOrganizationName": {
"type": "text"
}
}
Afterwards I figured out that the problem could be caused by multi-language stopwords. I tried the following and it looks like it works for me, but I'm not sure if this approach is good:
"analyzer": {
"company_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"russian_stop",
"english_stop"
]
}
},
"filter": {
"russian_stop": {
"type": "stop",
"ignore_case": true,
"stopwords": ["ТОО"]
},
"english_stop": {
"type": "stop",
"ignore_case": true,
"stopwords": ["TOO"]
}
}
If you just want to rely on text analysis, you can create a custom analyzer with a stop token filter in which you specify your custom stopword TOO (see docs).
PUT your-index
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"tokenizer": "whitespace",
"filter": [ "my_custom_stop_words_filter" ]
}
},
"filter": {
"my_custom_stop_words_filter": {
"type": "stop",
"ignore_case": true,
"stopwords": [ "TOO" ]
}
}
}
},
"mappings": {
"properties": {
"consignorOrganizationName": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
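You can verify the stopword removal with the _analyze API (using your-index and custom_analyzer as created above; on older versions the parameters are passed as query parameters instead of a body):

```
POST your-index/_analyze
{
"analyzer": "custom_analyzer",
"text": "TOO Big House"
}
```

TOO should be missing from the returned tokens, while Big and House remain (note that this analyzer has no lowercase filter, so the remaining tokens keep their case).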

Elasticsearch exclude "stop" words from highlight

I want to exclude the default stop words from being highlighted, but I'm not sure why this isn't working.
ES config
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding",
"stop"
]
}
},
"filter": {
"graph_synonyms": {
...
}
},
"normalizer": {
"normalizer_1": {
...
}
}
}
},
Fields mapping:
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
"originator": {
"type": "keyword",
"normalizer": "normalizer_1"
},
................
}
}
Highlight query:
"highlight": {
"fields": {
"*": {}
}
}
Currently I'm getting stop words such as "this", "a", and "is" highlighted within the narrative fields, and I want to prevent that.

Implementing multiple synonym_path for a single index in Elasticsearch

I am trying to achieve multiple synonym_path for a single index in elasticsearch.
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"bool": {
"should": [{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}},
{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}}]
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "String",
"analyzer": "synonym"
}
}
}
}
I tried the snippet above using JSON Sense in Chrome, but it generated a "TokenFilter [bool] must have a type associated with it" error.
Is there other way to implement it?
The filter section in the analysis section is not meant to contain the Query DSL but token filter definitions.
In your case, you need to re-create your index with the following settings:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms": {
"tokenizer": "whitespace",
"filter": [
"synonym1",
"synonym2"
]
}
},
"filter": {
"synonym1": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
},
"synonym2": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "string",
"analyzer": "synonyms"
}
}
}
}
}
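With this setup the two synonym filters run one after the other in the analyzer's filter chain. You can confirm that both files are picked up with the _analyze API (your_index is a placeholder for whatever name you re-create the index under; the expansions depend on the contents of synonyms.txt and synonyms2.txt):

```
GET your_index/_analyze
{
"analyzer": "synonyms",
"text": "a term that appears in one of your synonym files"
}
```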

In ES, how to write mappings so that wildcard queries work for both lowercase and uppercase?

Hello all, I am facing two problems in ES.
I have a city "New York" in ES. I want to write a term filter such that it returns a result only if the given string exactly matches "New York". What is happening instead is that the filter matches "New" or "York" individually (both returning "New York"), but it returns nothing for "New York" as a whole. My mapping is given below; please tell me which analyzer or tokenizer I should use inside the mapping.
Here are the settings and mapping:
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": {"type": "string", "analyzer": "synonym"}
}
}
}
}
}
The second problem is with wildcard queries. When I search lowercase, e.g. new*, ES returns nothing, but when I search uppercase, e.g. New*, it returns "New York". I want to write my city mapping such that searching with either lowercase or uppercase returns the same thing. I have seen ignore_case and set it inside the synonyms filter, but I am still not able to search with both lowercase and uppercase.
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true // See here
}
I believe you didn't provide enough details, but hoping that my attempt will generate questions from you, I will post what I believe is a step forward:
The mapping:
PUT test
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
},
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": {
"type": "string",
"analyzer": "synonym",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"raw_ignore_case": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
}
}
Test data:
POST /test/restaurant/1
{
"address": {"city":"New York"}
}
POST /test/restaurant/2
{
"address": {"city":"new york"}
}
Query for the first problem:
GET /test/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"address.city.raw": "New York"
}
}
}
}
}
Query for the second problem:
GET /test/restaurant/_search
{
"query": {
"query_string": {
"query": "address.city.raw_ignore_case:new*"
}
}
}
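The second query matches the lowercase wildcard because keyword_lowercase emits the whole field value as a single lowercased token ("new york"), which new* can match. Since this mapping predates ES 5, the _analyze API takes query parameters; you can check with:

```
GET /test/_analyze?analyzer=keyword_lowercase&text=New%20York
```

which should return the single token new york. On newer versions the same analyzer and text go in a JSON request body instead.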
