Implementing multiple synonym_path for a single index in Elasticsearch

I am trying to use multiple synonym_path files for a single index in Elasticsearch. This is what I tried:
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"bool": {
"should": [{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}},
{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}}]
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "String",
"analyzer": "synonym"
}
}
}
}
I tried the snippet above in the Sense plugin in Chrome, but it generated a "TokenFilter [bool] must have a type associated with it" error.
Is there another way to implement this?

The filter section inside the analysis section is not meant to contain Query DSL (bool/should) but token filter definitions.
In your case, you need to re-create your index with the following settings:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms": {
"tokenizer": "whitespace",
"filter": [
"synonym1",
"synonym2"
]
}
},
"filter": {
"synonym1": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
},
"synonym2": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "string",
"analyzer": "synonyms"
}
}
}
}
}
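Once the index has been re-created with these settings, you can confirm that both synonym files are being picked up by running the analyzer directly. A minimal check (my_index is a placeholder name; both synonyms.txt and synonyms2.txt must exist in the Elasticsearch config directory, and older versions accept the same parameters as query-string arguments instead of a JSON body):
POST /my_index/_analyze
{
  "analyzer": "synonyms",
  "text": "dog"
}
If the term you test with appears in either synonym file, the response should contain the expanded synonym tokens alongside the original one.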

Related

How to apply multiple settings to an index in Elasticsearch

I need two settings: one is a stopwords setting, the second is a synonym setting. How do I apply both settings to one index?
Below is the stopwords setting which I need to apply to the index:
settings_1 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "st.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
Below is the synonym setting which I need to apply to the index:
settings_2 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_filter": {
"type": "synonym",
"synonyms_path": "sy.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_analyzer"
}
}
}
}
Will the code work like below?
es.indices.put_settings(index="gene", body=settings_1)
es.indices.put_settings(index="gene", body=settings_2)
Although you can send two separate update-settings calls as you show, it is not the preferred way: 1) it involves two network calls to Elasticsearch, and 2) the settings can be combined into a single call, which means less overhead for Elasticsearch when propagating the updated cluster state to all the nodes.
You can simply combine both settings and send a single request. You can first test this in Postman or the Kibana Dev Tools console in JSON format.
As discussed in the comments, below are the complete settings and mappings combining the two (defining both analyzers):
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
},
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "analyzers/<your analyzer ID>",
"updateable": true
},
"my_filter": {
"type": "synonym",
"synonyms_path": "analyzers/F111111111",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
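If the gene index does not exist yet, the combined body above can go into a single create-index request. If it already exists, analysis settings can only be changed while the index is closed, so the flow would look roughly like this (a sketch that assumes the file paths from the question, st.txt and sy.txt, are present in the config directory):
POST /gene/_close

PUT /gene/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["my_fil"]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["my_filter"]
        }
      },
      "filter": {
        "my_fil": {
          "type": "stop",
          "stopwords_path": "st.txt",
          "updateable": true
        },
        "my_filter": {
          "type": "synonym",
          "synonyms_path": "sy.txt",
          "updateable": true
        }
      }
    }
  }
}

POST /gene/_open
Note that the mappings part cannot be sent through the _settings API; it has to be part of the create-index request (or a separate put-mapping call).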

Elasticsearch: multiple analyzers on an index

I have an index with a Name field.
I want to use both a soundex analyzer and a synonym analyzer on that field.
I want to achieve both in a single index. Is it even possible?
Please help me, experts out there.
Index 1
{
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "phonetic_sample",
"creation_date": "1603097131476",
"analysis": {
"filter": {
"my_soundex": {
"replace": "false",
"type": "phonetic",
"encoder": "soundex"
}
},
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase",
"my_soundex"
],
"tokenizer": "standard"
}
}
}
I query for Catherine and it matches Catherine, Katherine and Kathryn.
Index 2
{
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "phonetic_synonym",
"creation_date": "1603121439096",
"analysis": {
"filter": {
"synonym": {
"format": "wordnet",
"type": "synonym",
"synonyms": [
"s(100000001,1,'Bill',v,1,0).",
"s(100000001,2,'William',v,1,0).",
"s(100000001,3,'Wilhelm',v,1,0)."
]
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
}
I query for Bill and it matches Bill, William and Wilhelm.
You can use a multi-field with multiple analyzers: declare sub-fields for the name field, each with a different analyzer.
Below is the modified index mapping.
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"my_soundex": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
},
"synonym": {
"format": "wordnet",
"type": "synonym",
"synonyms": [
"s(100000001,1,'Bill',v,1,0).",
"s(100000001,2,'William',v,1,0).",
"s(100000001,3,'Wilhelm',v,1,0)."
]
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
},
"my_analyzer": {
"filter": [
"lowercase",
"my_soundex"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analzyer": "synonym",
"search_analyzer": "synonym",
"fields": {
"content": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
}
}
}
Then you can refer to name and name.content in your queries. Your search query will be like this:
{
"query": {
"multi_match": {
"query": "Bill",
"fields": [
"name",
"name.content"
],
"type": "most_fields"
}
}
}
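To see what each sub-field actually indexes, you can point the _analyze API at a field so it runs the analyzer mapped to that field (my_index below is just a placeholder for whatever you name the index):
POST /my_index/_analyze
{
  "field": "name",
  "text": "Bill"
}

POST /my_index/_analyze
{
  "field": "name.content",
  "text": "Catherine"
}
The first request should show the synonym expansion (Bill, William, Wilhelm), and the second the phonetic token under which Catherine, Katherine and Kathryn all collide.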

How to add filters and mappings to an Elasticsearch schema while creating the index?

I want to use filters like synonyms and stopwords along with mapping types in my Elasticsearch schema while indexing. Below is the JSON I am using. When I use it, I get the mappings but the filters are lost. What could be the reason? (I am using Elasticsearch 6.2.)
nlp_settings = {
"settings": {
"index" : {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["synonym", "stop_words", "lowercase",
"stop_words_user", "synonym_user"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
},
"stop_words_user": {
"type": "stop",
"stopwords": "_none_"
},
"synonym_user": {
"type": "synonym",
"synonyms": default_synonym
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"section":{"type": "text"},
"document_name": {"type": "text"},
"dir_path_info": {"type": "text"},
"nlu_raw": {
"noun_list": {"type": "nested"},
"verb_list": {"type": "nested"},
},
"nlu": {
"noun": {"type": "nested"},
"verb": {"type": "nested"}
}
}
}
}
}
When I use the mappings along with the filters, I get the following JSON from a GET on http://localhost:9233/test/_settings:
{
"test": {
"settings": {
"index": {
"creation_date": "1523962921677",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "FevdHGZjQm6ke2FgeNdnMQ",
"version": {
"created": "6020199"
},
"provided_name": "test"
}
}
}
}
However, what I actually want is:
{
"test": {
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "test",
"creation_date": "1523963029203",
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"synonym_user": {
"type": "synonym",
"synonyms": [
"a, a"
]
},
"stop_words_user": {
"type": "stop",
"stopwords": [
"please",
"help"
]
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym",
"stop_words",
"lowercase",
"stop_words_user",
"synonym_user"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "CiBBgngdR_aNHkY1m0EtXw",
"version": {
"created": "6020199"
}
}
}
}
}
I only get this output when I remove the mappings from the schema.
settings and mappings should be on the same level. So:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": [
"synonym",
"stop_words",
"lowercase",
"stop_words_user",
"synonym_user"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"stop_words": {
"type": "stop",
"stopwords_path": "stopwords.txt"
},
"stop_words_user": {
"type": "stop",
"stopwords": "_none_"
},
"synonym_user": {
"type": "synonym",
"synonyms": "default_synonym"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"section": {
"type": "text"
},
"document_name": {
"type": "text"
},
"dir_path_info": {
"type": "text"
},
"nlu_raw": {
"properties": {
"noun_list": {
"type": "nested"
},
"verb_list": {
"type": "nested"
}
}
},
"nlu": {
"properties": {
"noun": {
"type": "nested"
},
"verb": {
"type": "nested"
}
}
}
}
}
}
}
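After deleting the old index and re-creating it with this single body, both the analysis settings and the mappings should be visible again. A quick verification, assuming the index is named test as in the question and the synonym/stopword files are in place:
GET /test/_settings

GET /test/_mapping

POST /test/_analyze
{
  "analyzer": "synonym",
  "text": "please help with synonyms"
}
The first call should now include the analysis block that was previously missing.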

How can I implement synonyms in Elasticsearch?

I want to implement synonyms in my mapping. I have created a parent-child mapping. Here is my mapping:
{
"mapping":{
"mappings":{
"question_data":{
"properties":{
"question_id":{
"type":"integer"
},
"question":{
"type":"string"
}
}
},
"answer_data":{
"_parent":{
"type":"question_data"
},
"_routing":{
"required":true
},
"properties":{
"answer_id":{
"type":"integer"
},
"answer":{
"type":"string",
}
}
}
}
}
}
Thanks in advance.
To use synonyms in Elasticsearch you first have to create a synonym analyzer in the index settings to add synonym support for a particular field. The synonyms themselves can also be defined in the settings.
PUT testindex_510
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"locationsynfilter": {
"type": "synonym",
"synonyms": [
"lokhandwala,andheri west",
"versova,andheri west",
"mazgaon,byculla"
]
},
"titlesynfilter": {
"type": "synonym",
"synonyms": [
"golds , gold",
"talwalkars, talwalkar"
]
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
In the above settings I defined two analyzers for two different fields. These analyzers support synonyms, and the synonyms are defined in the filter attached to each analyzer.
You can also define the synonyms in a separate txt file instead of inline in the index definition, like the following:
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"titlesynfilter": {
"type": "synonym",
"synonyms_path": "analysis/titlesynonym.txt"
},
"locationsynfilter": {
"type": "synonym",
"synonyms_path": "analysis/locationsynonym.txt"
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
where your txt file should look like this (please refer to the documentation for more configuration options):
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
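Either way (inline or file-based), you can verify that the synonyms are applied by running the analyzer against some sample text. This assumes the index from the first example, testindex_510, and, for the file-based variant, that the txt files exist under config/analysis:
POST /testindex_510/_analyze
{
  "analyzer": "synonymanalyzer",
  "text": "lokhandwala"
}
The response should contain the original token plus its synonym tokens (andheri, west), which is what lets a search for one term match documents containing the other.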
Hope this helps

In ES, how do I write mappings so that wildcard queries work for both lowercase and uppercase?

Hello all, I am facing two problems in ES.
I have a city "New York" in ES. I want to write a term filter such that a document is returned only if the given string exactly matches "New York". What happens instead is that the filter matches "New" or "York" (returning "New York" for both), but it returns nothing for "New York" itself. My mapping is given below; please tell me which analyzer or tokenizer I should use in the mapping.
Here are the settings and mapping:
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": {"type": "string", "analyzer": "synonym"}
}
}
}
}
}
The second problem is that when I run a wildcard query in lowercase, for example "new*", ES returns nothing, but when I search in uppercase, for example "New*", it returns "New York". For this second case I want to write my city mapping so that searching in lowercase or uppercase returns the same thing. I have seen ignore_case and I have set it to false inside the synonyms filter, but I am still not able to search in both lowercase and uppercase.
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true // See here
}
I believe you didn't provide enough details, but hoping that my attempt will generate further questions from you, I will post what I believe is a step forward.
The mapping:
PUT test
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
},
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": {
"type": "string",
"analyzer": "synonym",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"raw_ignore_case": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
}
}
Test data:
POST /test/restaurant/1
{
"address": {"city":"New York"}
}
POST /test/restaurant/2
{
"address": {"city":"new york"}
}
Query for the first problem:
GET /test/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"address.city.raw": "New York"
}
}
}
}
}
Query for the second problem:
GET /test/restaurant/_search
{
"query": {
"query_string": {
"query": "address.city.raw_ignore_case:new*"
}
}
}
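To see why the second query becomes case-insensitive, you can check what the keyword_lowercase analyzer stores for the indexed value (this is the request-body form of _analyze; the exact syntax varies slightly across versions):
GET /test/_analyze
{
  "analyzer": "keyword_lowercase",
  "text": "New York"
}
The whole value is kept as a single lowercased token, new york, so a lowercase wildcard such as new* can match it, while the not_analyzed raw sub-field keeps the exact original value for the exact-match term filter.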
