Elastic synonyms are taking over other words - elasticsearch

Given this sequence of commands:
Create the index:
PUT /test_index
{
"settings": {
"analysis": {
"analyzer": {
"GermanCompoundWordsAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"german_compound_synonym",
"german_normalization"
]
}
},
"filter": {
"german_compound_synonym": {
"type": "synonym",
"synonyms": [
"teppichläufer, auslegware läufer"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"analyzer": "GermanCompoundWordsAnalyzer"
}
}
}
}
}
Adding a few documents:
POST test_index/_doc/
{
"sku" : "kimchy",
"name" : "teppichläufer alfa"
}
POST test_index/_doc/
{
"sku" : "kimchy",
"name" : "teppichläufer beta"
}
Searching, I would expect one document, but two are returned :(
GET /test_index/_search
{
"query": {
"match": {
"name": {
"query": "teppichläufer beta",
"operator": "and"
}
}
}
}
I get both documents, because the synonym rule teppichläufer, auslegware läufer places läufer at position 1, where it 'substitutes' for beta. If I remove "analyzer": "GermanCompoundWordsAnalyzer", I get just one document, as expected.
How do I use these synonyms without running into this issue?
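One way to see what is happening is to run the analyzer directly and look at token positions (a diagnostic sketch using the index and analyzer defined above):

```json
GET /test_index/_analyze
{
  "analyzer": "GermanCompoundWordsAnalyzer",
  "text": "teppichläufer beta"
}
```

With a plain synonym filter the multi-word expansion is flattened, so läufer should show up at the same position as beta; that overlap is what makes the and-operator match succeed for both documents.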

POST /test_index/_search
{
"query": {
"bool" : {
"should": [
{
"query_string": {
"default_field": "name",
"query": "teppichläufer beta",
"default_operator": "AND"
}
}
]
}
}
}

After a little more searching I found it in the documentation. This was an RTFM problem, sorry guys.
I tried with:
https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-synonym-graph-tokenfilter.html
The funny part is that it makes the NDCG of the results worse :)
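For completeness, a minimal sketch of the graph-aware variant: the filter type changes to synonym_graph, and since graph token streams are only valid at search time, the graph analyzer goes into search_analyzer rather than the index-time analyzer (index, filter, and field names as above):

```json
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "german_compound_synonym": {
          "type": "synonym_graph",
          "synonyms": ["teppichläufer, auslegware läufer"]
        }
      },
      "analyzer": {
        "GermanCompoundWordsAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "german_compound_synonym", "german_normalization"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "GermanCompoundWordsAnalyzer"
        }
      }
    }
  }
}
```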

Related

Elasticsearch : using fuzzy search to find abbreviations

I have indexed textual articles that mention company names, like apple and lemonade, and I am trying to search for these companies using their abbreviations, like APPL and LMND. Fuzzy search is giving other results: for example, searching with LMND returns land, which is mentioned in the text, but it never returns lemonade, whichever parameters I tried.
First question
Is fuzzy search a suitable solution for this kind of search?
Second question
What would be good parameter value ranges to support my problem?
UPDATE
I have tried synonym filter
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonyms_filter": {
"type": "synonym",
"synonyms": [
"apple,APPL",
"lemonade,LMND"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"transcript_data": {
"properties": {
"words": {
"type": "nested",
"properties": {
"word": {
"type": "text",
"search_analyzer":"synonym_analyzer"
}
}
}
}
}
}
}
}
and for SEARCH I used
{
"_source": false,
"query": {
"nested": {
"path": "transcript_data.words",
"query": {
"match": {
"transcript_data.words.word": "lmnd"
}
}
}
}
}
but it's not working
I believe that the best option for you is the use of synonyms; they do exactly what you need.
I'll leave an example and a link to an article explaining some details.
PUT teste
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonyms_filter": {
"type": "synonym",
"synonyms": [
"apple,APPL",
"lemonade,LMND"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"transcript_data": {
"properties": {
"words": {
"type": "nested",
"properties": {
"word": {
"type": "text",
"analyzer":"synonym_analyzer"
}
}
}
}
}
}
}
}
POST teste/_bulk
{"index":{}}
{"transcript_data": {"words":{"word":"apple"}}}
GET teste/_search
{
"query": {
"nested": {
"path": "transcript_data.words",
"query": {
"match": {
"transcript_data.words.word": "appl"
}
}
}
}
}
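To verify the synonym expansion without running a search, you can feed the abbreviation through the analyzer (a diagnostic sketch; because lowercase runs before synonyms_filter, the rule terms like APPL are lowercased as well):

```json
GET teste/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "appl"
}
```

If the mapping took effect, the output should contain both appl and apple at the same position.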

ElasticSearch Search-as-you-type field type field with partial search

I recently updated my ngram implementation settings to use the Search-as-you-type field type.
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html
This worked great, but I noticed that partial searching does not work.
If I search for the number 00060434 I get the desired result, but I would also like to be able to search for 60434; then it should return document 3.
Is there a way to do this with the Search-as-you-type field type, or can I only do this with ngrams?
PUT searchasyoutype_example
{
"settings": {
"analysis": {
"analyzer": {
"englishAnalyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"ascii_folding"
]
}
},
"filter": {
"ascii_folding": {
"type": "asciifolding",
"preserve_original": true
}
}
}
},
"mappings": {
"properties": {
"number": {
"type": "search_as_you_type",
"analyzer": "englishAnalyzer"
},
"fullName": {
"type": "search_as_you_type",
"analyzer": "englishAnalyzer"
}
}
}
}
PUT searchasyoutype_example/_doc/1
{
"number" : "00069794",
"fullName": "Employee 1"
}
PUT searchasyoutype_example/_doc/2
{
"number" : "00059840",
"fullName": "Employee 2"
}
PUT searchasyoutype_example/_doc/3
{
"number" : "00060434",
"fullName": "Employee 3"
}
GET searchasyoutype_example/_search
{
"query": {
"multi_match": {
"query": "00060434",
"type": "bool_prefix",
"fields": [
"number",
"number._index_prefix",
"fullName",
"fullName._index_prefix"
]
}
}
}
I think you need to query on number, number._2gram and number._3gram, like below:
GET searchasyoutype_example/_search
{
"query": {
"multi_match": {
"query": "00060434",
"type": "bool_prefix",
"fields": [
"number",
"number._2gram",
"number._3gram"
]
}
}
}
search_as_you_type creates the three sub-fields. You can read more about how it works in this article:
https://ashish.one/blogs/search-as-you-type/
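Note that the _2gram and _3gram sub-fields are shingles of whole tokens, so they still match from the beginning of the number token. If true infix matching inside a single token is needed (60434 inside 00060434), one fallback, sketched here with the field names from the question, is a wildcard query (generally slower than ngram indexing):

```json
GET searchasyoutype_example/_search
{
  "query": {
    "wildcard": {
      "number": {
        "value": "*60434*"
      }
    }
  }
}
```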

How does phrase searching and phrase search with ~N interact with quote_field_suffix in a simple query string query?

For example, given:
PUT index
{
"settings": {
"analysis": {
"analyzer": {
"english_exact": {
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"body": {
"type": "text",
"analyzer": "english",
"fields": {
"exact": {
"type": "text",
"analyzer": "english_exact"
}
}
}
}
}
}
PUT index/_doc/1
{
"body": "Ski resorts"
}
PUT index/_doc/2
{
"body": "Ski house resorts"
}
What happens with the following queries?
{
"query": {
"simple_query_string": {
"fields": [ "body" ],
"quote_field_suffix": ".exact",
"query": "\"ski resort\""
}
}
}
{
"query": {
"simple_query_string": {
"fields": [ "body" ],
"quote_field_suffix": ".exact",
"query": "\"ski resort\"~2"
}
}
}
Will the ".exact" extend to the entire phrase, so in this case the first query would get no results?
How could you do a phrase search that is not exact when using quote "quote_field_suffix": ".exact"?
Will the ".exact" extend to the entire phrase, so in this case the first query would get no results?
Yes, your understanding is correct.
The documentation says: "Suffix appended to quoted text in the query string."
So it will search for an exact match of ski resort. Since there is none, it will return an empty result.
How could you do a phrase search that is not exact when using quote "quote_field_suffix": ".exact"?
{
"query": {
"simple_query_string": {
"fields": [ "body" ],
"quote_field_suffix": ".exact",
"query": "ski resort~2"
}
}
}
It is not exact, because it also brings back ski resorts.
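A caveat on the query above: without the quotes, ~2 is parsed as fuzziness on resort rather than phrase slop. If the goal is a non-exact (stemmed, sloppy) phrase match, one option is to skip quote_field_suffix and run the phrase against the stemmed body field directly (a sketch using the mapping from the question):

```json
GET index/_search
{
  "query": {
    "match_phrase": {
      "body": {
        "query": "ski resort",
        "slop": 2
      }
    }
  }
}
```

Here the english analyzer stems resorts to the same term as resort, and the slop allows the intervening house, so "Ski house resorts" can match.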

How do prioritize matches in the beginning of strings in Elasticsearch?

I have an Elasticsearch instance full of documents containing movie and series titles.
When I run this:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"Name^2",
"SeriesName^1.5",
"Description"
],
"fuzziness": "AUTO",
"prefix_length": 2,
"query": "game"
}
}
]
}
}
}
... I get titles like "The big game", "Hunger games", "War game", etc.
However, I would like to get titles starting with "game" BEFORE titles just containing "game".
When a user searches for "game", they expect titles like "Game of Thrones" and "Game change", before "The imitation game".
How can I make this more precise? Thank you!
Try something like this:
{ "query": {
"prefix" : { "Name" : "game" }
}
}
Please refer to the documentation for the same: Elasticsearch Documentation
To do this, your field/property has to be tokenized with the keyword tokenizer; see the mapping below. You can also add a lowercase filter in the mapping for your field/property.
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_startswith": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
},
"mappings": {
"test_index": {
"properties": {
"Name": {
"search_analyzer": "analyzer_startswith",
"index_analyzer": "analyzer_startswith",
"type": "string"
}
}
}
}
}
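Another option is to keep the original multi_match for recall and add a boosted prefix clause in a bool/should, so titles starting with the term rank first. This is a sketch: Name.keyword is an assumed keyword sub-field, and case_insensitive on prefix requires a recent Elasticsearch version (otherwise use a lowercased keyword analyzer as in the previous answer):

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": ["Name^2", "SeriesName^1.5", "Description"],
            "fuzziness": "AUTO",
            "prefix_length": 2,
            "query": "game"
          }
        }
      ],
      "should": [
        {
          "prefix": {
            "Name.keyword": {
              "value": "game",
              "boost": 5,
              "case_insensitive": true
            }
          }
        }
      ]
    }
  }
}
```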

Elasticsearch index search for currency $ and £ signs

In some of my documents I have $ or £ symbols. I want to search for £ and retrieve documents containing that symbol. I've gone through the documentation but I'm getting some cognitive dissonance.
# Delete the `my_index` index
DELETE /my_index
# Create a custom analyzer
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": [
"&=> and ",
"$=> dollar "
]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": [
"html_strip",
"&_to_and"
],
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
}
}
This returns "the", "quick", "and", "brown", "fox" just as the documentation states:
# Test out the new analyzer
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%26%20brown%20fox
This returns "the", "quick", "dollar", "brown", "fox"
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%24%20brown%20fox
Adding some records:
PUT /my_index/test/1
{
"title": "The quick & fast fox"
}
PUT /my_index/test/2
{
"title": "The daft fox owes me $100"
}
I would have thought that if I searched for "dollar", I would get a result? Instead I get no results:
GET /my_index/test/_search
{ "query": {
"simple_query_string": {
"query": "dollar"
}
}
}
Or even using '$' with an analyzer:
GET /my_index/test/_search
{ "query": {
"query_string": {
"query": "dollar10",
"analyzer": "my_analyzer"
}
}
}
Your problem is that you specify a custom analyzer but you never use it. You can verify that with term vectors. So follow these steps:
When creating the index, set the custom analyzer for the `title` field:
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": [
"&=> and ",
"$=> dollar "
]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": [
"html_strip",
"&_to_and"
],
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"test" : {
"properties" : {
"title" : {
"type":"string",
"analyzer":"my_analyzer"
}
}
}
}
}
Insert data:
PUT my_index/test/1
{
"title": "The daft fox owes me $100"
}
Check for term vectors:
GET /my_index/test/1/_termvectors?fields=title
Response:
{
"_index":"my_index",
"_type":"test",
"_id":"1",
"_version":1,
"found":true,
"took":3,
"term_vectors":{
"title":{
"field_statistics":{
"sum_doc_freq":6,
"doc_count":1,
"sum_ttf":6
},
"terms":{
"daft":{
"term_freq":1,
"tokens":[
{
"position":1,
"start_offset":4,
"end_offset":8
}
]
},
"dollar100":{ <-- You can see it here
"term_freq":1,
"tokens":[
{
"position":5,
"start_offset":21,
"end_offset":25
}
]
},
"fox":{
"term_freq":1,
"tokens":[
{
"position":2,
"start_offset":9,
"end_offset":12
}
]
},
"me":{
"term_freq":1,
"tokens":[
{
"position":4,
"start_offset":18,
"end_offset":20
}
]
},
"owes":{
"term_freq":1,
"tokens":[
{
"position":3,
"start_offset":13,
"end_offset":17
}
]
},
"the":{
"term_freq":1,
"tokens":[
{
"position":0,
"start_offset":0,
"end_offset":3
}
]
}
}
}
}
}
Now search:
GET /my_index/test/_search
{
"query": {
"match": {
"title": "dollar100"
}
}
}
That will find the match. But searching with query string as:
GET /my_index/test/_search
{ "query": {
"simple_query_string": {
"query": "dollar100"
}
}
}
won't find anything, because it searches the special _all field, which, as far as I can see, aggregates the fields without the custom analyzer applied:
GET /my_index/test/_search
{
"query": {
"match": {
"_all": "dollar100"
}
}
}
does not find a result. But:
GET /my_index/test/_search
{
"query": {
"match": {
"_all": "$100"
}
}
}
finds it. I am not sure, but the reason may be that _all uses the default analyzer, not the custom analyzer. To set a custom analyzer as the default, check:
Changing the default analyzer in ElasticSearch or LogStash
http://elasticsearch-users.115913.n3.nabble.com/How-we-can-change-Elasticsearch-default-analyzer-td4040411.html
http://grokbase.com/t/gg/elasticsearch/148kwsxzee/overriding-built-in-analyzer-and-set-it-as-default
http://elasticsearch-users.115913.n3.nabble.com/How-to-set-the-default-analyzer-td3935275.html
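Independent of the _all issue, you can confirm that the char_filter is doing its job by running the analyzer directly (a diagnostic sketch using my_analyzer from above):

```json
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The daft fox owes me $100"
}
```

The token list should include dollar100, matching the term vectors shown above.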
