I need two settings on the same index: one for stopwords and one for synonyms.
How do I apply both settings to a single index?
Below is the stopwords setting that I need to apply to the index:
settings_1 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "st.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
Below is the synonym setting that I need to apply to the index:
settings_2 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_filter": {
"type": "synonym",
"synonyms_path": "sy.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_analyzer"
}
}
}
}
Will the code work like below?
es.indices.put_settings(index="gene", body=settings_1)
es.indices.put_settings(index="gene", body=settings_2)
Although you can make two separate update-settings calls as shown above, that is not the preferred way: 1) it involves two network calls to Elasticsearch; 2) the settings can be combined into a single call, which means less overhead for Elasticsearch when propagating the updated cluster state to all the nodes.
You can simply combine both settings and send a single update-settings request. You can test this first in Postman or the Kibana Dev Tools console in JSON format.
As discussed in the comments, below are the complete settings and mappings, combining the two settings (which define the two analyzers):
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
},
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "analyzers/<your analyzer ID>",
"updateable": true
},
"my_filter": {
"type": "synonym",
"synonyms_path": "analyzers/F111111111",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
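If you are driving this from Python, the combination can also be done programmatically before a single `indices.create` call. A minimal sketch, assuming the elasticsearch-py client and the `st.txt`/`sy.txt` paths from the question; the `merge_analysis` helper is illustrative, not part of any library:

```python
from copy import deepcopy

def merge_analysis(base, extra):
    """Merge the analysis.analyzer and analysis.filter maps of two
    settings bodies into one create-index body (names must not clash)."""
    merged = deepcopy(base)
    tgt = merged["settings"]["index"]["analysis"]
    src = extra["settings"]["index"]["analysis"]
    for section in ("analyzer", "filter"):
        tgt.setdefault(section, {}).update(src.get(section, {}))
    return merged

settings_1 = {"settings": {"index": {"analysis": {
    "analyzer": {"my_stop_analyzer": {
        "type": "custom", "tokenizer": "standard", "filter": "my_fil"}},
    "filter": {"my_fil": {
        "type": "stop", "stopwords_path": "st.txt", "updateable": True}},
}}}}

settings_2 = {"settings": {"index": {"analysis": {
    "analyzer": {"my_analyzer": {
        "type": "custom", "tokenizer": "standard", "filter": ["my_filter"]}},
    "filter": {"my_filter": {
        "type": "synonym", "synonyms_path": "sy.txt", "updateable": True}},
}}}}

combined = merge_analysis(settings_1, settings_2)

# One network call instead of two (requires a running cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="gene", body=combined)
```

Since both filters are declared `"updateable": true`, later edits to `st.txt` or `sy.txt` should be picked up with `es.indices.reload_search_analyzers(index="gene")` rather than by reindexing; note that analysis settings themselves are static and cannot be changed on an open index.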
Related
I have an index with a Name field.
I want to use a soundex analyzer and a synonym analyzer on that field.
I want to achieve both in a single index. Is it even possible?
Index 1
{
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "phonetic_sample",
"creation_date": "1603097131476",
"analysis": {
"filter": {
"my_soundex": {
"replace": "false",
"type": "phonetic",
"encoder": "soundex"
}
},
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase",
"my_soundex"
],
"tokenizer": "standard"
}
}
}
}
}
}
I query for Catherine and match Catherine, Katherine, and Kathryn.
Index 2
{
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "phonetic_synonym",
"creation_date": "1603121439096",
"analysis": {
"filter": {
"synonym": {
"format": "wordnet",
"type": "synonym",
"synonyms": [
"s(100000001,1,'Bill',v,1,0).",
"s(100000001,2,'William',v,1,0).",
"s(100000001,3,'Wilhelm',v,1,0)."
]
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
}
}
}
}
I query for Bill and match Bill, William, and Wilhelm.
You can use multi-fields with multiple analyzers: declare sub-fields for the name field, each with a different analyzer.
Below is the modified index mapping.
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"my_soundex": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
},
"synonym": {
"format": "wordnet",
"type": "synonym",
"synonyms": [
"s(100000001,1,'Bill',v,1,0).",
"s(100000001,2,'William',v,1,0).",
"s(100000001,3,'Wilhelm',v,1,0)."
]
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
},
"my_analyzer": {
"filter": [
"lowercase",
"my_soundex"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "synonym",
"search_analyzer": "synonym",
"fields": {
"content": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
}
}
}
Then you can refer to name and name.content in your queries. Your search query will be like this:
{
"query": {
"multi_match": {
"query": "Bill",
"fields": [
"name",
"name.content"
],
"type": "most_fields"
}
}
}
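From the Python client, the same search can be built as a plain dict; a small sketch (the index name in the commented call is hypothetical):

```python
def phonetic_synonym_query(term):
    # multi_match over the synonym-analyzed parent field and the
    # soundex-analyzed sub-field; "most_fields" adds the per-field
    # scores together, so a name matching both analyzers ranks higher
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["name", "name.content"],
                "type": "most_fields",
            }
        }
    }

# res = es.search(index="my_index", body=phonetic_synonym_query("Bill"))
```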
I want to exclude the default stop words from being highlighted, but I'm not sure why this isn't working.
ES config
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding",
"stop"
]
}
},
"filter": {
"graph_synonyms": {
...
}
},
"normalizer": {
"normalizer_1": {
...
}
}
}
},
Fields mapping:
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
"originator": {
"type": "keyword",
"normalizer": "normalizer_1"
},
................
}
}
Highlight query:
highlight : {
fields:{
"*":{}
}
},
Currently stop words such as "this", "a", "is" are getting highlighted within the narrative fields, and I want to prevent that.
I'm building a blog-like app with Flask (based on Miguel Grinberg's Mega-Tutorial) and I'm trying to set up ES indexing that would support an autocomplete feature. I'm struggling with setting up the indexing correctly.
I started with a (working) simple indexing mechanism:
from flask import current_app
def add_to_index(index, model):
if not current_app.elasticsearch:
return
payload = {}
for field in model.__searchable__:
payload[field] = getattr(model, field)
current_app.elasticsearch.index(index=index, id=model.id, body=payload)
and after some fun with Google I found that my body could look something like this (probably with fewer analyzers than needed, but I'm copying it exactly as I found it, where the author claims it works):
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
field: {
"properties": {
"name": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
}
}
}
}
}
I figured out that I can modify the original mechanism to something like this:
for field in model.__searchable__:
temp = getattr(model, field)
fields[field] = {"properties": {
"type": "text",
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
}
},
"analyzer": "standard"
}}
payload = {
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": fields
}
but that's where I'm lost. Where do I put the actual content (temp = getattr(model, field)) in this document so that the whole thing works? I couldn't find any example or relevant part of the documentation that covers updating an index with slightly more complex mappings. Is this even correct/doable? Every guide I see covers bulk indexing, and somehow I fail to make the connection.
I think you are a little bit confused; let me try to explain. What you want is to add one document to Elasticsearch with:
current_app.elasticsearch.index(index=index, id=model.id,
body=payload)
which uses the index() method defined in the elasticsearch-py library. Check the example here:
https://elasticsearch-py.readthedocs.io/en/master/index.html#example-usage
body must be your document, a simple dict, as shown in the example from the docs.
What you are setting instead is the settings of the index, which is a different thing. To use a database analogy, you are putting the schema of a table inside a row.
To set the settings, you need to use put_settings, as defined here:
https://elasticsearch-py.readthedocs.io/en/master/api.html?highlight=settings#elasticsearch.client.ClusterClient.put_settings
I hope this helps.
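To make the split concrete: index creation (settings and mappings, done once) and document indexing (one plain dict per document) are two different calls. A sketch under that assumption, passing the client explicitly instead of going through Flask's current_app; `init_index` and `build_payload` are hypothetical helper names, not part of the tutorial:

```python
def build_payload(model):
    # The body for index() is just the document: a flat field -> value dict
    return {field: getattr(model, field) for field in model.__searchable__}

def init_index(es, index, index_body):
    # Settings (analyzers) and mappings belong to index creation,
    # done once, not sent with every document
    if not es.indices.exists(index=index):
        es.indices.create(index=index, body=index_body)

def add_to_index(es, index, model):
    # Per-document call: Elasticsearch applies the analyzers declared
    # in the mapping at index time; the document itself stays plain
    es.index(index=index, id=model.id, body=build_payload(model))
```

Usage would be `init_index(es, "posts", payload)` once at startup (with the settings/mappings body from the question), then `add_to_index(es, "posts", post)` for each document.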
I want to implement synonyms in my mapping. I have created a parent-child mapping. Here is my mapping:
{
"mapping":{
"mappings":{
"question_data":{
"properties":{
"question_id":{
"type":"integer"
},
"question":{
"type":"string"
}
}
},
"answer_data":{
"_parent":{
"type":"question_data"
},
"_routing":{
"required":true
},
"properties":{
"answer_id":{
"type":"integer"
},
"answer":{
"type":"string",
}
}
}
}
}
}
Thanks in advance.
To use synonyms in Elasticsearch, you first have to create a synonym analyzer in the index settings to add synonym support for a particular field. You can also define the synonyms themselves in the settings.
PUT testindex_510
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"locationsynfilter": {
"type": "synonym",
"synonyms": [
"lokhandwala,andheri west",
"versova,andheri west",
"mazgaon,byculla"
]
},
"titlesynfilter": {
"type": "synonym",
"synonyms": [
"golds , gold",
"talwalkars, talwalkar"
]
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
In the settings above I defined two analyzers for two different fields. These analyzers add synonym support, and the synonyms for each analyzer are defined in its filter.
You can also define the synonyms in a separate txt file instead of inline in the settings, like the following:
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"titlesynfilter": {
"type": "synonym",
"synonyms_path": "analysis/titlesynonym.txt"
},
"locationsynfilter": {
"type": "synonym",
"synonyms_path": "analysis/locationsynonym.txt"
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
Your txt file should then look like the one below. Please refer to the documentation for more configuration options.
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
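For clarity on the file format: in the comma-separated (equivalent-synonym) form above, every term on a line is made interchangeable at analysis time. A small illustrative sketch of how such lines group terms (the parser below is for illustration only, not Elasticsearch's own implementation):

```python
def parse_equivalent_synonyms(text):
    # Each non-empty line is one equivalence group of comma-separated
    # terms; surrounding whitespace around each term is ignored
    groups = []
    for line in text.splitlines():
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if terms:
            groups.append(terms)
    return groups

rules = """ipod, i-pod, i pod
foozball , foosball
universe , cosmos"""

# Once the index exists, you can verify the expansion with the
# _analyze API, e.g.:
# es.indices.analyze(index="testindex_510",
#                    body={"analyzer": "synonymanalyzer",
#                          "text": "mazgaon"})
```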
Hope this helps
I am trying to use multiple synonyms_path entries for a single index in Elasticsearch.
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"bool": {
"should": [{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}},
{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}}]
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "String",
"analyzer": "synonym"
}
}
}
}
I tried the snippet above using JSON Sense in Chrome, but it generated a "TokenFilter [bool] must have a type associated with it" error.
Is there another way to implement this?
The filter section inside the analysis section is not meant to contain the Query DSL; it holds token filter definitions.
In your case, you need to re-create your index with the following settings:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms": {
"tokenizer": "whitespace",
"filter": [
"synonym1",
"synonym2"
]
}
},
"filter": {
"synonym1": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
},
"synonym2": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "string",
"analyzer": "synonyms"
}
}
}
}
}
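The same settings can be generated for any number of synonym files. A sketch of a small helper, assuming the elasticsearch-py client; the `synonym_index_body` name is hypothetical:

```python
def synonym_index_body(paths, tokenizer="whitespace"):
    # One token filter per synonyms file; all of them chained,
    # in order, inside a single custom analyzer
    names = ["synonym%d" % (i + 1) for i in range(len(paths))]
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "synonyms": {
                            "tokenizer": tokenizer,
                            "filter": names,
                        }
                    },
                    "filter": {
                        name: {
                            "type": "synonym",
                            "synonyms_path": path,
                            "ignore_case": True,
                        }
                        for name, path in zip(names, paths)
                    },
                }
            }
        }
    }

# Requires a running cluster:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="animals",
#                   body=synonym_index_body(["synonyms.txt", "synonyms2.txt"]))
```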