Elasticsearch: exclude "stop" words from highlighting

I want to exclude the default stop words from being highlighted, but I'm not sure why this isn't working.
ES config:
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding",
"stop"
]
}
},
"filter": {
"graph_synonyms": {
...
}
},
"normalizer": {
"normalizer_1": {
...
}
}
}
},
Fields mapping:
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
"originator": {
"type": "keyword",
"normalizer": "normalizer_1"
},
...
}
}
Highlight query:
highlight : {
fields:{
"*":{}
}
},
Currently, stop words such as "this", "a", and "is" are getting highlighted within the narrative fields, and I want to prevent that.
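One way to debug this (a sketch; `my_index` stands for your actual index name) is to run the `search_synonyms` analyzer against a sample string with the `_analyze` API and check which tokens survive:

```
GET my_index/_analyze
{
  "analyzer": "search_synonyms",
  "text": "this is a description"
}
```

If stop words such as `this`, `is`, and `a` do not appear in the returned tokens, the `stop` filter is working at analysis time, and the highlighting is likely coming from elsewhere, for example the `*` wildcard in the highlight block matching fields that do not use this analyzer.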

Related

How to apply multiple settings in index in elasticsearch

I need two settings: one for stop words and one for synonyms.
How can I apply both settings to the same index?
Below is the stop-words setting I need to apply to the index:
settings_1 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "st.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
Below is the synonym setting I need to apply to the index:
settings_2 = {
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_filter": {
"type": "synonym",
"synonyms_path": "sy.txt",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_analyzer"
}
}
}
}
Will the code work like below?
es.indices.put_settings(index="gene", body=settings_1)
es.indices.put_settings(index="gene", body=settings_2)
Although you can issue the two separate update-settings calls as you mentioned, this is not the preferred way: 1) it involves two network calls to Elasticsearch, and 2) the settings can be combined into a single call, which means less overhead for Elasticsearch when propagating the updated cluster state to all the nodes.
You can simply combine both settings and send a single update-settings request. You can first test this in Postman or the Kibana Dev Tools console in JSON format.
As discussed in the comments, below are the complete settings and mappings combining the two (defining both analyzers):
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_stop_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": "my_fil"
},
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"my_filter"
]
}
},
"filter": {
"my_fil": {
"type": "stop",
"stopwords_path": "analyzers/<your analyzer ID>",
"updateable": true
},
"my_filter": {
"type": "synonym",
"synonyms_path": "analyzers/F111111111",
"updateable": true
}
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "my_stop_analyzer"
}
}
}
}
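In Python, the combination can be built before ever talking to the cluster. Below is a minimal sketch (plain dicts, no client needed) that merges the analysis sections of `settings_1` and `settings_2` into a single body, which could then be used in one call such as `es.indices.create(index="gene", body=combined)`:

```python
# Sketch: build one index body containing both analyzers, so a single
# create call is enough. Paths and names are taken from the question
# (st.txt, sy.txt); adjust them for your cluster.
stop_analysis = {
    "analyzer": {"my_stop_analyzer": {"type": "custom",
                                      "tokenizer": "standard",
                                      "filter": "my_fil"}},
    "filter": {"my_fil": {"type": "stop",
                          "stopwords_path": "st.txt",
                          "updateable": True}},
}
synonym_analysis = {
    "analyzer": {"my_analyzer": {"type": "custom",
                                 "tokenizer": "standard",
                                 "filter": ["my_filter"]}},
    "filter": {"my_filter": {"type": "synonym",
                             "synonyms_path": "sy.txt",
                             "updateable": True}},
}

combined = {
    "settings": {
        "index": {
            "analysis": {
                # Union of both analyzer and filter definitions.
                "analyzer": {**stop_analysis["analyzer"],
                             **synonym_analysis["analyzer"]},
                "filter": {**stop_analysis["filter"],
                           **synonym_analysis["filter"]},
            }
        }
    },
    "mappings": {
        "properties": {
            "description": {"type": "text",
                            "analyzer": "standard",
                            "search_analyzer": "my_stop_analyzer"}
        }
    },
}
```

Note that most analysis settings cannot be changed on an open index, so sending everything in the initial create call also avoids close/reopen cycles.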

How can I implement synonyms in Elasticsearch?

I want to implement synonyms in my mapping. I have created a parent-child mapping; here it is:
{
"mapping":{
"mappings":{
"question_data":{
"properties":{
"question_id":{
"type":"integer"
},
"question":{
"type":"string"
}
}
},
"answer_data":{
"_parent":{
"type":"question_data"
},
"_routing":{
"required":true
},
"properties":{
"answer_id":{
"type":"integer"
},
"answer":{
"type":"string",
}
}
}
}
}
}
Thanks in advance.
To use synonyms in Elasticsearch, you first have to create a synonym analyzer in the index settings to add synonym support for a particular field. The synonyms themselves can also be defined in the settings.
PUT testindex_510
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"locationsynfilter": {
"type": "synonym",
"synonyms": [
"lokhandwala,andheri west",
"versova,andheri west",
"mazgaon,byculla"
]
},
"titlesynfilter": {
"type": "synonym",
"synonyms": [
"golds , gold",
"talwalkars, talwalkar"
]
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
In the settings above I defined two analyzers for two different fields. These analyzers support synonyms, and the synonym lists are defined in the filter for each analyzer.
You can also define the synonyms in a separate text file instead of inline in the settings, like this:
{
"settings": {
"analysis": {
"analyzer": {
"synonymanalyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "locationsynfilter"]
},
"synonymanalyzer1": {
"tokenizer": "standard",
"filter": ["lowercase", "titlesynfilter"]
}
},
"filter": {
"titlesynfilter": {
"type": "synonym",
"synonyms_path": "analysis/titlesynonym.txt"
},
"locationsynfilter": {
"type": "synonym",
"synonyms_path": "analysis/locationsynonym.txt"
}
}
}
},
"mappings": {
"testtype": {
"properties": {
"title": {
"type": "string",
"analyzer": "synonymanalyzer1"
},
"location": {
"type": "string",
"analyzer": "synonymanalyzer"
}
}
}
}
}
where your text file should look like this (please refer to the documentation for more configuration options):
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
Hope this helps.
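To confirm that a synonym filter behaves the way you expect, you can run the analyzer directly with the `_analyze` API (a sketch against the `testindex_510` index defined above):

```
GET testindex_510/_analyze
{
  "analyzer": "synonymanalyzer",
  "text": "lokhandwala"
}
```

The returned tokens show exactly what ends up in the index after synonym expansion, which makes it easy to spot a synonym rule that is not being applied.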

Using Elasticsearch to search special characters

How can I force Elasticsearch query_string to recognize '#' as a simple character?
Assuming I have an index, and I added a few documents with this statement:
POST test/item/_bulk
{"index":{}}
{"text": "john.doe#gmail.com"}
{"index":{}}
{"text": "john.doe#outlook.com"}
{"index":{}}
{"text": "john.doe#gmail.com, john.doe#outlook.com"}
{"index":{}}
{"text": "john.doe[at]gmail.com"}
{"index":{}}
{"text": "john.doe gmail.com"}
I want this search:
GET test/item/_search
{
"query":
{
"query_string":
{
"query": "*#gmail.com",
"analyze_wildcard": "true",
"allow_leading_wildcard": "true",
"default_operator": "AND"
}
}
}
to return only the first and third documents.
I tried three kinds of mapping.
First I tried:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"email_analyzer": {
"tokenizer": "email_tokenizer"
}
},
"tokenizer": {
"email_tokenizer": {
"type": "uax_url_email"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"analyzer": "email_analyzer"
}
}
}
}
}
Then I tried:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "whitespace"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"analyzer": "my_analyzer"
}
}
}
}
}
And I also tried this one:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "whitespace"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
None of the above worked; they all actually returned all the documents.
Is there an analyzer/tokenizer/parameter that will make Elasticsearch treat the '#' sign like any other character?
This works with your last setting, where the text field is set to not_analyzed:
GET test/item/_search
{
"query":
{
"wildcard":
{
"text": "*#gmail.com*"
}
}
}
When using a not-analyzed field, you should use a term-level query rather than a full-text query: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/term-level-queries.html
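For example, an exact match against the not-analyzed field would use a `term` query (a sketch using the same test data):

```
GET test/item/_search
{
  "query": {
    "term": {
      "text": "john.doe#gmail.com"
    }
  }
}
```

Because the field is stored verbatim, the `term` value has to match the whole document value exactly; the wildcard query is what allows matching `#gmail.com` anywhere in the string.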

ElasticSearch cannot find analyzer in field?

I create an index like this using PUT http://localhost:9200/test:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"sortable": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
}
}
This returned:
{"acknowledged":true}
Then make sure that the analyzer is there:
http://localhost:9200/test/_analyze?_analyzer=sortable&text=HeLLo
{"tokens":[{"token":"hello","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":0}]}
So I create a mapping for it via PUT http://localhost:9200/test/_mapping/company:
{
"properties": {
"name": {
"type": "string",
"analyzer": "standard",
"fields": {
"raw": {
"type": {
"analyzer": "sortable"
}
}
}
}
}
}
This returns:
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"no handler for type [{analyzer=sortable}] declared on field [raw]"}],"type":"mapper_parsing_exception","reason":"no handler for type [{analyzer=sortable}] declared on field [raw]"},"status":400}
What is wrong?
Your company mapping needs to be fixed to this:
{
"properties": {
"name": {
"type": "string",
"analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"analyzer": "sortable"
}
}
}
}
}
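With the corrected mapping, the `name.raw` sub-field can then be used for case-insensitive sorting, which is presumably the point of the keyword-tokenizer analyzer (a usage sketch):

```
GET test/company/_search
{
  "sort": [
    { "name.raw": "asc" }
  ]
}
```

The `name` field itself stays analyzed with the standard analyzer for full-text search, while `name.raw` holds the whole lowercased value for sorting.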

Implementing multiple synonym_path for single index in elastic search

I am trying to achieve multiple synonym_path for a single index in elasticsearch.
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"bool": {
"should": [{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}},
{
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}}]
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "String",
"analyzer": "synonym"
}
}
}
}
I tried the snippet above using Sense in Chrome, but it generated a `TokenFilter [bool] must have a type associated with it` error.
Is there another way to implement it?
The `filter` section inside `analysis` is not meant to contain Query DSL; it holds token filter definitions.
In your case, you need to re-create your index with the following settings:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms": {
"tokenizer": "whitespace",
"filter": [
"synonym1",
"synonym2"
]
}
},
"filter": {
"synonym1": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
},
"synonym2": {
"type": "synonym",
"synonyms_path": "synonyms2.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"animals": {
"properties": {
"name": {
"type": "string",
"analyzer": "synonyms"
}
}
}
}
}
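Once the index has been re-created, you can verify that both synonym files are loaded by analyzing a term that appears in either file (a sketch; `my_index` and the sample text are placeholders for your own index name and synonym entries):

```
GET my_index/_analyze
{
  "analyzer": "synonyms",
  "text": "a term from synonyms.txt or synonyms2.txt"
}
```

The expanded tokens in the response tell you whether each file's rules are being applied.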
