When I run this query:
"multi_match": {
"query": "paper copier ",
"fields": [ "allStringFields" ],
"type": "cross_fields",
"operator": "and",
"analyzer": "synonym"
}
I get 1342 results.
But when I run this query (notice the word order):
"multi_match": {
"query": " copier paper ",
"fields": [ "allStringFields" ],
"type": "cross_fields",
"operator": "and",
"analyzer": "synonym"
}
I get zero results.
I am using a synonym analyzer, and it is the cause of this behavior.
Is there a solution to this?
Adding a working example with index data, mapping, search query, and search result. In the example below, I have taken the two synonyms table and tables.
Please go through your index mapping once again. In the example below, the search keyword is table chair, and it is searched in both the title and content fields. The query below returns the documents that contain both table AND chair. For a detailed explanation, refer to the ES documentation on the multi_match query and the synonym token filter.
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"table, tables"
]
}
},
"analyzer": {
"synonym_analyzer": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text"
}
}
}
}
Index Data:
{ "title": "table chair" }
{ "title": "tables chair" }
{ "title": "table fan" }
{ "title": "light fan", "content": "chair" }
Search Query:
{
"query": {
"multi_match": {
"query": "table chair",
"operator": "and",
"type":"cross_fields",
"fields": [
"title","content"
],
"analyzer": "synonym_analyzer"
}
}
}
Search Result:
"hits": [
{
"_index": "synonym",
"_type": "_doc",
"_id": "1",
"_score": 1.7227666,
"_source": {
"title": "table chair"
}
},
{
"_index": "synonym",
"_type": "_doc",
"_id": "2",
"_score": 1.3862942,
"_source": {
"title": "tables chair"
}
}
]
Searching for table chair or chair table gives the same search result as shown above.
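To see why word order does not matter, you can inspect the tokens the analyzer emits with the _analyze API (a quick check, assuming the index is named synonym as in the results above):
GET /synonym/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "chair table"
}
Both table and tables are emitted at the same position, so with the and operator either surface form in a document satisfies the table term, regardless of where it appears in the query string.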
Let's say I have the following documents:
doc1: "blue water"
doc2: "extra blue water"
doc3: "blue waters"
I'm looking for a way to handle the following scenarios:
If a user searches for "blue water", I want them to receive doc1 and doc3 (meaning it will ignore doc2, and there will also be an analyzer able to stem tokens as in doc3).
If I'm using query_string, for example, I receive doc2 as well as doc1 and doc3.
You can use a stemmer along with the percolate query.
Adding a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"tags": {
"type": "text",
"analyzer": "my_analyzer"
},
"query": {
"type": "percolator"
}
}
}
}
Index Data:
{
"query": {
"match_phrase": {
"tags": {
"query": "blue waters",
"analyzer": "my_analyzer"
}
}
}
}
{
"query": {
"match_phrase": {
"tags": {
"query": "extra blue water",
"analyzer": "my_analyzer"
}
}
}
}
{
"query": {
"match_phrase": {
"tags": {
"query": "blue water",
"analyzer": "my_analyzer"
}
}
}
}
Search Query:
{
"query": {
"percolate": {
"field": "query",
"document": {
"tags": "blue water"
}
}
}
}
Search Result:
"hits": [
{
"_index": "67671916",
"_type": "_doc",
"_id": "3",
"_score": 0.26152915,
"_source": {
"query": {
"match_phrase": {
"tags": {
"query": "blue waters",
"analyzer": "my_analyzer"
}
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
},
{
"_index": "67671916",
"_type": "_doc",
"_id": "1",
"_score": 0.26152915,
"_source": {
"query": {
"match_phrase": {
"tags": {
"query": "blue water",
"analyzer": "my_analyzer"
}
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
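If you need to test several documents in a single request, the percolate query also accepts a documents array instead of document; the _percolator_document_slot field in the response then tells you which of the supplied documents matched each stored query. A minimal sketch:
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        { "tags": "blue water" },
        { "tags": "extra blue water" }
      ]
    }
  }
}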
You could use a prefix search in this case. If you look for blue water, a prefix search will return doc1 and doc3.
For a prefix search:
{
  "query": {
    "prefix": {
      "doc": "blue water"
    }
  }
}
Here, doc is the field being searched and blue water is the search prefix.
I'm working on a Spanish search engine. (I don't speak Spanish.) Based on my research, the goal is more or less this:
1. Filter stopwords like "dos", "de", "la"...
2. Stem the words for both search and index, e.g. if you search for "primera", then "primero" and "primer" should also show up.
My attempt:
es_analyzer={
"settings": {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_stemmer": {
"type": "stemmer",
"language": "spanish"
}
},
"analyzer": {
"default_search": {
"type": "spanish"
},
"rebuilt_spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_stemmer"
]
}
}
}
}
}
The problem:
When I use "type": "spanish" in default_search, my query "primera" gets stemmed to "primer", which is correct. But even though I specified "spanish_stemmer" in the filter, the documents in the index aren't stemmed, so when I search for "primera" it only shows exact matches for "primer". Any suggestions on fixing this?
Potential fixes, but I haven't figured out the syntax:
Using the built-in "spanish" analyzer in the filter. What's the syntax?
Adding the Spanish stemmer and stopwords in "default_search". But I don't know how to use compound settings there.
Adding a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"settings": {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_stemmer": {
"type": "stemmer",
"language": "spanish"
}
},
"analyzer": {
"default_search": {
"type":"spanish",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_stemmer"
]
}
}
}
},
"mappings":{
"properties":{
"title":{
"type":"text",
"analyzer":"default_search"
}
}
}
}
Index Data:
{
"title": "primer"
}
{
"title": "primera"
}
{
"title": "primero"
}
Search Query:
{
"query":{
"match":{
"title":"primer"
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "3",
"_score": 0.13353139,
"_source": {
"title": "primer"
}
},
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "1",
"_score": 0.13353139,
"_source": {
"title": "primera"
}
},
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "2",
"_score": 0.13353139,
"_source": {
"title": "primero"
}
}
]
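To confirm that stemming is applied on both sides, you can check the emitted tokens with the _analyze API (using the index name from the results above):
GET /stof_64420517/_analyze
{
  "analyzer": "default_search",
  "text": "primera"
}
This should return the single token primer, which is why all three documents match.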
I'm looking to implement an autocomplete-like feature in my app with Elasticsearch.
Let's say my input is "ronan f". I want Elasticsearch to return all elements where "ronan" or "f" is contained in the last name or first name. I expect Elasticsearch to sort the results by score, so the element closest to what I searched for should be on top.
I tried multiple requests, but none of them returns the expected results.
For example:
{
"query": {
"bool": {
"must_not": [
{
"match": {
"email": "*#guest.booking.com"
}
}
],
"should": [
{
"match": {
"lastname": "ronan"
}
},
{
"match": {
"firstname": "ronan"
}
},
{
"match": {
"lastname": "f"
}
},
{
"match": {
"firstname": "f"
}
}
],
"minimum_should_match" : 1
}
},
"sort": [
"_score"
],
"from": 0,
"size": 30
}
With this request the ranking seems a bit odd; for example:
"_index": "clients",
"_type": "client",
"_id": "4369",
"_score": 20.680058,
"_source": {
"firstname": "F",
"lastname": "F"
}
is ranked above:
"_index": "clients",
"_type": "client",
"_id": "212360",
_score": 9.230003,
"_source": {
"firstname": "Ronan",
"lastname": "Fily"
}
To me, the second result should rank higher than the first.
Can someone show me how I can achieve the result I want?
For info, I can't use the Completion Suggester functionality of Elasticsearch because I can't access the configuration of the database (so no custom indexes).
OK, since you can reindex your data, here is a "starts with" analyzer. It is case-insensitive and works on text fields (I think first and last names can contain multiple words).
Delete and recreate the index using the following mappings.
Define your analyzer (PUT my_index):
{
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "type": "edgeNGram",
          "min_gram": "1",
          "max_gram": "20",
          "side": "front"
        }
      },
      "analyzer": {
        "partial_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngrams",
            "standard",
            "asciifolding"
          ]
        },
        "full_name": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
Then post the _mappings using this for your fields:
"lastname": {
"type": "text",
"analyzer": "partial_name",
"search_analyzer": "full_name"
},
"firstname": {
"type": "text",
"analyzer": "partial_name",
"search_analyzer": "full_name"
}
If this is not clear and the Elasticsearch documentation does not help, don't hesitate to ask.
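As an illustration, once the data is reindexed with these mappings, a minimal search might look like this (index and field names as in the question):
GET clients/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "firstname": "ronan f" } },
        { "match": { "lastname": "ronan f" } }
      ],
      "minimum_should_match": 1
    }
  }
}
Because the fields are indexed with the partial_name edge-n-gram analyzer but searched with full_name, the prefix matching happens at index time, and longer matches like ronan should contribute more to the score than single-letter fragments.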
I am currently implementing Elasticsearch in my application. Please assume that "Hello World" is the data we need to search. Our requirement is that we should get the result by entering "h", "Hello World", or "Hello Worlds" as the keyword.
This is our current query.
{
"query": {
"wildcard" : {
"message" : {
"title" : "h*"
}
}
}
}
By using this we are getting the right result for the keyword "h". But we also need to get results in the case of small spelling mistakes.
You need to use the english analyzer, which stems tokens to their root form. More info can be found in the official documentation.
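For example, you can see the stemming with the _analyze API; the english analyzer reduces Worlds to its root form:
GET /_analyze
{
  "analyzer": "english",
  "text": "Hello Worlds"
}
This emits the tokens hello and world, so a search for "Hello Worlds" can still match a document containing "Hello World".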
I implemented it by taking your example data, query, and expected results, using the edge n-gram analyzer and a match query.
Index Mapping:
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "english"
}
}
}
}
Index Document:
{
"title" : "Hello World"
}
Search query for h and its result
{
"query": {
"match": {
"title": "h"
}
}
}
"hits": [
{
"_index": "so-60524477-partial-key",
"_type": "_doc",
"_id": "1",
"_score": 0.42763555,
"_source": {
"title": "Hello World"
}
}
]
Search query for Hello Worlds; the same document comes back in the result:
{
"query": {
"match": {
"title": "Hello worlds"
}
}
}
Result
"hits": [
{
"_index": "so-60524477-partial-key",
"_type": "_doc",
"_id": "1",
"_score": 0.8552711,
"_source": {
"title": "Hello World"
}
}
]
Edge n-grams and n-grams perform better than wildcards. For a wildcard, all documents have to be scanned to see which ones match the pattern; n-grams instead break the text into small tokens at index time.
For example, Quick Foxes will be stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ], depending on the min_gram and max_gram sizes.
Fuzziness can be used to find similar terms.
Mapping
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"text":{
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Query
GET my_index/_search
{
"query": {
"match": {
"text": {
"query": "hello worlds",
"fuzziness": 1
}
}
}
}
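To verify what the edge-n-gram tokenizer actually stores, you can run a sample text through the _analyze API (a quick check against the mapping above):
GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Quick Foxes"
}
With min_gram 1, this emits Q, Qu, Qui, Quic, Quick, F, Fo, Fox, Foxe, Foxes. Note that my_analyzer has no lowercase filter, so the grams keep their original case.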
Here's a simplification of what I have:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase"
]
},
"autocomplete_search": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
PUT my_index/_doc/1
{
"title": "Quick Foxes"
}
PUT my_index/_doc/2
{
"title": "Quick Fuxes"
}
PUT my_index/_doc/3
{
"title": "Foxes Quick"
}
PUT my_index/_doc/4
{
"title": "Foxes Slow"
}
I am trying to search for Quick Fo to test the autocomplete:
GET my_index/_search
{
"query": {
"match": {
"title": {
"query": "Quick Fo",
"operator": "and"
}
}
}
}
The problem is that this query also returns Foxes Quick, where I expected only Quick Foxes:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5753642,
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"title": "Quick Foxes"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"title": "Foxes Quick" <<<----- WHY???
}
}
]
}
}
What can I tweak to get a classic "autocomplete" behavior, where "Quick Fo" won't return "Foxes Quick" but only "Quick Foxes"?
---- ADDITIONAL INFO -----------------------
This worked for me:
PUT my_index1
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"text": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
}
}
}
}
PUT my_index1/_doc/1
{
"text": "Quick Brown Fox"
}
PUT my_index1/_doc/2
{
"text": "Quick Frown Fox"
}
PUT my_index1/_doc/3
{
"text": "Quick Fragile Fox"
}
GET my_index1/_search
{
"query": {
"match": {
"text": {
"query": "quick br",
"operator": "and"
}
}
}
}
The issue is due to your search analyzer autocomplete_search, in which you are using the lowercase tokenizer. Your search term Quick Fo is therefore divided into two terms, quick and fo (note the lowercase), which are matched against the tokens generated by the autocomplete analyzer on your indexed docs.
Now, the title Foxes Quick also goes through the autocomplete analyzer and produces both the quick and fo tokens, hence it matches the search-term tokens.
You can simply use the _analyze API to check the tokens generated for your documents, as well as for your search term, to understand this better.
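For example (assuming the my_index mapping from the question):
GET my_index/_analyze
{
  "analyzer": "autocomplete_search",
  "text": "Quick Fo"
}

GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "Foxes Quick"
}
The first call returns quick and fo; the second returns the edge n-grams of foxes and quick, which include both fo and quick, so every search term finds a match.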
Please refer to the official ES doc https://www.elastic.co/guide/en/elasticsearch/guide/master/_index_time_search_as_you_type.html on how to implement autocomplete. They also use a different search-time analyzer, but there are certain limitations to it and it can't solve all use cases (especially if you have docs like yours), hence I implemented it using a different design based on the business requirements.
I hope this makes it clear why the second doc is returned in your case.
EDIT: Also, in your case, IMO a match phrase prefix query would be more useful.
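A minimal sketch of that approach, against a field indexed with the standard analyzer rather than the edge-n-gram one:
GET my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "Quick Fo"
    }
  }
}
match_phrase_prefix requires the terms to appear in order, with the last one treated as a prefix, so Quick Fo matches Quick Foxes but not Foxes Quick.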