Elasticsearch synonym search analyzer not updating after updating synonyms.txt?

So I have an index with a synonym filter defined in the search analyzer. When I first created the index, the synonyms were picked up on search. After that, I updated the synonyms.txt file on each node once to change a synonym mapping and restarted each node, and the change was reflected in searches throughout the index.
Now, when I change the synonyms file and restart the nodes, the synonym mapping isn't updating as I believe it should. Am I missing something? I thought that since the synonyms are in a search_analyzer, I wouldn't have to reindex each time to pick up the changes.
Here is my index definition:
PUT /synonym_index
{
"aliases": {},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"english": {
"type": "text",
"analyzer": "english",
"search_analyzer":"english_and_synonyms"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"english": {
"tokenizer": "standard",
"filter": [
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_keywords",
"english_stemmer"
]
},
"english_and_synonyms": {
"tokenizer": "standard",
"filter": [
"search_synonyms",
"english_possessive_stemmer",
"lowercase",
"english_stop",
"english_keywords",
"english_stemmer"
]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_keywords": {
"type": "keyword_marker",
"keywords": ["example"]
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
},
"search_synonyms" : {
"type" : "synonym_graph",
"synonyms_path" : "analysis/synonyms.txt"
}
}
},
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
}
}
I've tried restarting the node with
sudo service elasticsearch restart
and also with
sudo service elasticsearch stop
sudo service elasticsearch start
but neither causes my changes to take effect. Do I need to reindex every time I update the synonyms file, even though it's a search analyzer?

To pick up changes in the synonyms file, you need to close and reopen the index after editing the file. This can be done with two POST requests:
POST /synonym_index/_close
POST /synonym_index/_open
After the _open call, you should see the changes reflected in your searches.

Maybe the Reload Search Analyzers API is what you are looking for:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-reload-analyzers.html
You have to declare that your synonyms are updatable:
"search_synonyms" : {
"type" : "synonym_graph",
"synonyms_path" : "analysis/synonyms.txt",
"updatable": true
}
And in your mapping you need to declare your custom search_analyzer:
"mappings": {
"properties": {
"one_attribute": {
"type": "text",
"search_analyzer": "english_and_synonyms"
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
Do I need to reindex every time I update the synonyms file even though it's a search analyzer?
Only if your synonyms are used at index time. If they are only used at search time, you don't have to reindex every time.
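Once the filter is marked as updatable and synonyms.txt has been changed on every node, reloading is a single call against the index (this API is available from Elasticsearch 7.3 on), here shown for the synonym_index from the question:
POST /synonym_index/_reload_search_analyzers
Note that the API only reloads the search analyzers on each shard; it does not distribute the file, so synonyms.txt still has to be updated on every node first.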

Related

Custom stopword analyzer is not working properly

I have created an index with a custom analyzer for stop words. I want Elasticsearch to ignore these words at search time. Then I indexed one document.
But when I query in Kibana for the keyword "the", it shows a successful match, even though I have put "the" in the my_stop_word section of my_analyzer, so it should not match anything. I have read that if you specify an analyzer on a field in the mapping at index time, that analyzer is also used by default at query time.
Please help!
PUT /pandey
{
"settings":
{
"analysis":
{
"analyzer":
{
"my_analyzer":
{
"tokenizer": "standard",
"filter": [
"my_stemmer",
"english_stop",
"my_stop_word",
"lowercase"
]
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
},
"english_stop":{
"type": "stop",
"stopwords": "_english_"
},
"my_stop_word": {
"type": "stop",
"stopwords": ["robot", "love", "affection", "play", "the"]
}
}
}
},
"mappings": {
"properties": {
"dialog": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
PUT pandey/_doc/1
{
"dailog" : "the boy is a robot. he is in love. i play cricket"
}
GET pandey/_search
{
"query": {
"match": {
"dailog": "the"
}
}
}
A small spelling mistake can lead to this.
You defined the mapping for dialog but added a document with the field name dailog. Elasticsearch's dynamic field mapping behaviour indexes it without error (this behaviour can be disabled, as shown below).
So the query "dailog": "the" matches using the default analyzer, not my_analyzer.
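If you want to rule out that behaviour entirely, dynamic mapping can be switched off for the index. A minimal sketch, reusing the pandey mapping from above (analysis settings omitted):
"mappings": {
  "dynamic": "strict",
  "properties": {
    "dialog": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
With "dynamic": "strict", indexing the document with the misspelled dailog field is rejected with a mapping error instead of silently creating a new, default-analyzed field.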

Copy analyzed text into another field

We are trying to build a word cloud over four text fields. Each field has its own stop analyzer.
For example, TextFr with a French stop analyzer and TextDe with a German stop analyzer. The analyzed result should be copied into another field called WordCloudText, on which the aggregation takes place.
Do you have any advice on how to do this? Is this even possible?
Thanks for your help.
I don't think there is a way to copy the analyzed output of a field, only the (unanalyzed) values of a field. Probably the easiest way to achieve this is to define your own analyzer that filters all four languages. Something like this:
PUT stackoverflow
{
"settings": {
"analysis": {
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"dutch_stop": {
"type": "stop",
"stopwords": "_dutch_"
}
},
"analyzer": {
"eng_stop": {
"type": "stop",
"stopwords": "_english_"
},
"dutch_stop": {
"type": "stop",
"stopwords": "_dutch_"
},
"all_lang_stop": {
"tokenizer": "lowercase",
"filter": [
"english_stop",
"dutch_stop"
]
}
}
}
},
"mappings": {
"record": {
"properties": {
"field": {
"type": "keyword",
"fields": {
"english": {"type": "text", "analyzer": "eng_stop" },
"dutch": {"type": "text", "analyzer": "dutch_stop" },
"word_cloud": {"type": "text", "analyzer": "all_lang_stop"}
}
}
}
}
}
}
The key is the custom analyzer called all_lang_stop that combines multiple stop filters. Then you use a multi-field to have your data automatically copied into each type of stop analyzer.
Alternatively, if your text is already separated into different fields by language, you can use the copy_to directive on each individual language field to copy it into the word_cloud field. Note that copy_to copies the input value, not the output value of the analyzer, so you still need the combined analyzer. Something like this:
"mappings": {
"record": {
"properties": {
"english": {"type": "text", "analyzer": "eng_stop", "copy_to": "word_cloud"},
"dutch": {"type": "text", "analyzer": "dutch_stop", "copy_to": "word_cloud"},
"word_cloud": {"type": "text", "analyzer": "all_lang_stop"}
}
}
}
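Either way, the word cloud aggregation then runs on the word_cloud field. A rough sketch against the first (multi-field) mapping, assuming "fielddata": true has been added to the word_cloud sub-field, since a terms aggregation cannot otherwise run on a plain text field:
GET stackoverflow/_search
{
  "size": 0,
  "aggs": {
    "word_cloud_terms": {
      "terms": {
        "field": "field.word_cloud",
        "size": 50
      }
    }
  }
}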

Why is my elastic search prefix query case-sensitive despite using lowercase filters on both index and search?

The Problem
I am working on an autocompleter using ElasticSearch 6.2.3. I would like my query results (a list of pages with a Name field) to be ordered using the following priority:
Prefix match at start of "Name" (Prefix query)
Any other exact (whole word) match within "Name" (Term query)
Fuzzy match (this is currently done on a different field from Name using an ngram tokenizer ... so I assume it cannot be relevant to my problem, but I would like to apply this on the Name field as well)
My Attempted Solution
I will be using a Bool/Should query consisting of three queries (corresponding to the three priorities above), using boost to define relative importance.
The issue I am having is with the Prefix query - it appears to not be lowercasing the search query despite my search analyzer having the lowercase filter. For example, the below query returns "Harry Potter" for 'harry' but returns zero results for 'Harry':
{ "query": { "prefix": { "Name.raw" : "Harry" } } }
I have verified using the _analyze API that both my analyzers do indeed lowercase the text "Harry" to "harry". Where am I going wrong?
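For reference, the verification was along these lines (analyzer names as in the index settings below), and both calls return the single lowercased token harry:
GET myIndex/_analyze
{
  "analyzer": "pageSearchAnalyzer",
  "text": "Harry"
}

GET myIndex/_analyze
{
  "analyzer": "keywordAnalyzer",
  "text": "Harry"
}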
From the ES documentation I understand I need to analyze the Name field in two different ways to enable use of both Prefix and Term queries:
using the "keyword" tokenizer to enable the Prefix query (I have applied this on a .raw field)
using a standard analyzer to enable the Term query (I have applied this on the Name field)
I have checked duplicate questions such as this one but the answers have not helped
My mapping and settings are below
ES Index Mapping
{
"myIndex": {
"mappings": {
"pages": {
"properties": {
"Id": {},
"Name": {
"type": "text",
"fields": {
"raw": {
"type": "text",
"analyzer": "keywordAnalyzer",
"search_analyzer": "pageSearchAnalyzer"
}
},
"analyzer": "pageSearchAnalyzer"
},
"Tokens": {}, // Other fields not important for this question
}
}
}
}
}
ES Index Settings
{
"myIndex": {
"settings": {
"index": {
"analysis": {
"filter": {
"ngram": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "15"
}
},
"analyzer": {
"keywordAnalyzer": {
"filter": [
"trim",
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "keyword"
},
"pageSearchAnalyzer": {
"filter": [
"trim",
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "standard"
},
"pageIndexAnalyzer": {
"filter": [
"trim",
"lowercase",
"asciifolding",
"ngram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "l2AXoENGRqafm42OSWWTAg",
"version": {}
}
}
}
}
Prefix queries don't analyze the search term, so the text you pass in bypasses whatever would otherwise be used as the search analyzer (in your case, the configured search_analyzer: pageSearchAnalyzer). Harry is evaluated as-is against the keyword-tokenized, lowercased harry potter that keywordAnalyzer produced at index time, so the uppercase H never matches.
In your case here, you'll need to do one of a few different things:
Since you're using a lowercase filter on the field, you could just always pass lowercase terms to your prefix query (lowercasing application-side if necessary); a one-line example follows this list
Run a match query against an edge_ngram-analyzed field instead of a prefix query, as described in the ES search_analyzer docs
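For the first option, the prefix query from above simply becomes:
{ "query": { "prefix": { "Name.raw" : "harry" } } }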
Here's an example of the second option:
1) Create the index w/ ngram analyzer and (recommended) standard search analyzer
PUT my_index
{
"settings": {
"index": {
"analysis": {
"filter": {
"ngram": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "15"
}
},
"analyzer": {
"pageIndexAnalyzer": {
"filter": [
"trim",
"lowercase",
"asciifolding",
"ngram"
],
"type": "custom",
"tokenizer": "keyword"
}
}
}
}
},
"mappings": {
"pages": {
"properties": {
"name": {
"type": "text",
"fields": {
"ngram": {
"type": "text",
"analyzer": "pageIndexAnalyzer",
"search_analyzer": "standard"
}
}
}
}
}
}
}
2) Index some sample docs
POST my_index/pages/_bulk
{"index":{}}
{"name":"Harry Potter"}
{"index":{}}
{"name":"Hermione Granger"}
3) Run a match query against the ngram field
POST my_index/pages/_search
{
"query": {
"match": {
"name.ngram": {
"query": "Har",
"operator": "and"
}
}
}
}
I think it is better to use a match_phrase_prefix query without the .keyword suffix. Check the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html
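For instance, something along these lines against the Name field from the question; match_phrase_prefix analyzes its input with the field's search analyzer, so an upper-case Harry is lowercased before matching:
GET myIndex/_search
{
  "query": {
    "match_phrase_prefix": {
      "Name": "Harry Pot"
    }
  }
}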

"Letter" tokenizer and "word_delimiter" filter not working with underscores

I built an Elasticsearch index using a custom analyzer with the letter tokenizer and the lowercase and word_delimiter token filters. Then I tried searching for documents containing underscore-separated sub-words, e.g. abc_xyz, using only one of the sub-words, e.g. abc, but it didn't come back with any results. When I tried the full word, i.e. abc_xyz, it did find the document.
Then I changed the document to have dash-separated sub-words instead, e.g. abc-xyz, tried to search by sub-words again, and it worked.
To try to understand what is going on, I checked the terms generated for my documents using the _termvector service, and the result was identical for both the underscore-separated and the dash-separated sub-words, so I would expect the search results to be identical in both cases.
Any idea what I could be doing wrong?
If it helps, these are the settings I used for my index:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"cmt_value_analyzer": {
"tokenizer": "letter",
"filter": [
"lowercase",
"my_filter"
],
"type": "custom"
}
},
"filter": {
"my_filter": {
"type": "word_delimiter"
}
}
}
}
},
"mappings": {
"alertmodel": {
"properties": {
"name": {
"analyzer": "cmt_value_analyzer",
"term_vector": "with_positions_offsets_payloads",
"type": "string"
},
"productId": {
"type": "double"
},
"productName": {
"analyzer": "cmt_value_analyzer",
"term_vector": "with_positions_offsets_payloads",
"type": "string"
},
"link": {
"analyzer": "cmt_value_analyzer",
"term_vector": "with_positions_offsets_payloads",
"type": "string"
},
"updatedOn": {
"type": "date"
}
}
}
}
}
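One way to narrow this down further is to run both variants through the _analyze API directly (a sketch; my_index is a placeholder for the index created with the settings above, and older versions also accept analyzer and text as query parameters):
GET my_index/_analyze
{
  "analyzer": "cmt_value_analyzer",
  "text": "abc_xyz"
}

GET my_index/_analyze
{
  "analyzer": "cmt_value_analyzer",
  "text": "abc-xyz"
}
If both requests return the same tokens, the indexed terms really are identical and the difference must come from how the query text is analyzed.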

Searching synonyms in elasticsearch

I'm trying to create synonym search over languages indexed in ES.
For example,
Indexed document -> name: German
Synonyms: German, Deutsch, XYZ
What I want is that when I type either German, Deutsch, or XYZ, ES returns German...
Is that possible at all?
Yes, very much so. Elasticsearch handles synonyms very well. Here is an example of how I configured synonyms on my cluster:
curl -XPOST localhost:9200/**new-index** -d '{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 0,
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms/synonyms.txt"
}
},
"analyzer": {
"synonym": {
"tokenizer": "lowercase",
"filter": [
"synonym"
]
}
}
}
},
"mappings": {
"**new-type**": {
"_all": {
"enabled": false
},
"properties": {
"Title": {
"type": "multi_field",
"store": "yes",
"fields": {
"Title": {
"type": "string",
"analyzer": "synonym"
}
}
}
}
}
}
}'
The synonyms_path is resolved relative to the config folder, so Elasticsearch looks for a synonyms folder there and loads the text file from it. An example of the contents of synonyms.txt for your requirements would be:
German, Deutsch, XYZ
REMEMBER: if you have a lowercase filter at index time, the synonyms need to be in lower case. Restart the nodes if the changes don't show up.
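With that in place, searching the analyzed field for any of the synonyms should return the German document (a sketch reusing the placeholder index and field names from above):
curl -XGET 'localhost:9200/**new-index**/_search' -d '{
  "query": {
    "match": {
      "Title": "Deutsch"
    }
  }
}'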
