How does type ahead in ElasticSearch work on multiple words and partial text match - elasticsearch

I would like to explain with an example.
Documents in my Elasticsearch dataset have a field 'product_name'.
One document has product_name = 'Anmol Twinz Biscuit'.
When the user types (a) 'Anmol Twin', (b) 'Twin Anmol', (c) 'Twinz Anmol' or (d) 'Anmol Twinz', I want this specific record returned as a search result.
However, this works only if I specify complete words in the search query. Partial matches do not work, so (a) and (b) do not return the desired result.
Mapping defined (obtained by _mapping query)
{
"sbis_product_idx": {
"mappings": {
"items": {
"properties": {
"category_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"product_company": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"product_id": {
"type": "long"
},
"product_name": {
"type": "text"
},
"product_price": {
"type": "float"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 50
}
}
}
}
}
}
Query being used:
{
"_source": "product_name",
"query": {
"multi_match" : {
"type": "best_fields",
"query": "Twin Anmol",
"fields": [ "product_name", "product_company" ],
"operator": "and"
}
}
}
The document in ES
{
"_index": "sbis_product_idx",
"_type": "misc",
"_id": "107996",
"_version": 1,
"_score": 0,
"_source": {
"suggest": {
"input": [
"Anmol",
"Twinz",
"Biscuit"
]
},
"category_name": "Other Product",
"product_company": "Anmol",
"product_price": 30,
"product_name": "Anmol Twinz Biscuit",
"product_id": 107996
}
}
Result
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
Mistake in query / mapping?

I created the index with your mapping, indexed the document from your example, and changed only the operator in your query from and to or. With that change, all four query combinations return the document.
My query:
{
"_source": "product_name",
"query": {
"multi_match" : {
"type": "best_fields",
"query": "Anmol Twinz",
"fields": [ "product_name", "product_company" ],
"operator": "or"
}
}
}
With the and operator, every term in your search query must match an indexed token. Some of your terms, such as Twin, are not complete tokens in the index (the indexed token is twinz), so those queries return nothing. With the or operator, a document matches if any one of the terms is present.
Note: if you want to match partial tokens like Twin or Twi, you need n-gram tokens, as explained in the official ES doc https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html, which is a completely different design.
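To make the n-gram idea concrete, here is a minimal pure-Python sketch of what edge n-gram indexing does to the product name. It does not talk to Elasticsearch; the function names and the min_gram/max_gram values are assumptions chosen for illustration, and lowercasing plus whitespace splitting stand in for a real analyzer.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Lowercase, split on whitespace, and expand every word into edge n-grams."""
    terms = set()
    for word in text.lower().split():
        terms.update(edge_ngrams(word))
    return terms


def matches_all(query, indexed_terms):
    """Mimic operator "and": every query word must be an indexed term."""
    return all(word in indexed_terms for word in query.lower().split())


terms = index_terms("Anmol Twinz Biscuit")
for query in ["Anmol Twin", "Twin Anmol", "Twinz Anmol", "Anmol Twinz"]:
    print(query, "->", matches_all(query, terms))
```

Because twin is one of the edge n-grams generated from twinz, all four queries match even with "and" semantics. Without the n-gram expansion only the whole words are indexed, which is the behaviour seen in the question.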

Related

add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
"size": 10,
"query": {
"multi_match": {
"query": "'"+search_text+"'",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
"size": 10,
"query": {
"multi_match": {
"query": {
"fuzzy": {
"value": "'"+search_text+"'",
"fuzziness": "AUTO"
}
},
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.
There is no need to add a fuzzy query. To add fuzziness to a multi_match query, add the fuzziness parameter directly to the multi_match body.
Because you are using bool_prefix as the multi_match type, Elasticsearch creates a match_bool_prefix query on each field; it analyzes the input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
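The term/prefix split described above can be sketched in a few lines of Python. This only builds the roughly equivalent bool query as a dictionary; the helper name is hypothetical, and lowercasing plus whitespace splitting is an assumed stand-in for the field's real analyzer.

```python
import json


def bool_prefix_equivalent(field, text):
    """Build the bool query that a match_bool_prefix on `field` roughly corresponds to."""
    # Assumption: lowercasing and whitespace splitting stand in for the real analyzer.
    terms = text.lower().split()
    if not terms:
        return {"bool": {"should": []}}
    # Every term except the last becomes a term clause.
    clauses = [{"term": {field: term}} for term in terms[:-1]]
    # The last term becomes a prefix clause.
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


query = bool_prefix_equivalent("company_name", "sequensing how shingles")
print(json.dumps(query, indent=2))
```

The fuzziness parameter applies to the term clauses but has no effect on the final prefix clause, so the position of a misspelled word in the query matters.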
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"serviceTitle": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"services": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"company_name":"sequencing how shingles are actually used"
}
Search Query:
{
"size": 10,
"query": {
"multi_match": {
"query": "sequensing how shingles",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
],
"fuzziness":"auto"
}
}
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
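If you want to query only sequensing and still get the above document, you need to change the multi_match type from bool_prefix to another type that suits your use case. With bool_prefix, the last (here the only) term becomes a prefix query, and fuzziness is not applied to that prefix query.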

No match on document if the search string is longer than the search field

I am looking for a title that is stored in a document as
"Police diaries : stefan zweig"
When I search for "Police", I get the result.
But when I search for "Policeman", I do not.
Here is the query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"title",
omitted because irrelevance...
],
"query": "Policeman",
"fuzziness": "1.5",
"prefix_length": "2"
}
}
],
"must": {
omitted because irrelevance...
}
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
and here is the mapping
{
"books": {
"mappings": {
"book": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"sort": {
"type": "text",
"analyzer": "to order in another language, (creates a string with symbols)",
"fielddata": true
}
}
}
}
}
}
}
}
It should be noted that I have documents with the title "some title", which get hits if I search for "someone title".
I can't figure out why the police book is not showing up.
So your question has two parts:
1. You want to find the title containing police when searching for policeman.
2. You want to know why the some title documents match the query someone title, since you expect the first case to match as well.
Let me first explain why the second query matches and the first one doesn't, and then show how to make the first one work.
Your document containing some title produces the tokens below; you can verify this with the Analyze API. The standard analyzer is the default for text fields.
POST /_analyze
{
"text": "some title",
"analyzer" : "standard"
}
Generated tokens
{
"tokens": [
{
"token": "some",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "title",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
When you search for someone title with a match query, the query text is analyzed with the same analyzer that was applied to the field at index time.
This produces two tokens, someone and title. The match query matches on the title token, which is why the document appears in your search results. The query policeman, by contrast, produces the single token policeman, which is three edits away from police; fuzziness allows at most two edits, so nothing matches. You can also use the Explain API to see in detail how a document matches.
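The difference between the two cases can be checked with a small pure-Python sketch. It is a simplification under stated assumptions: the tokenizer is a rough stand-in for the standard analyzer, plain Levenshtein distance is used (Elasticsearch also counts a transposition as a single edit), prefix_length is ignored, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a stand-in for the standard analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Classic Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def fuzzy_or_match(query, document, max_edits=2):
    """True if any query token is within max_edits of any document token."""
    doc_tokens = tokenize(document)
    return any(
        edit_distance(q, d) <= max_edits
        for q in tokenize(query)
        for d in doc_tokens
    )


print(edit_distance("policeman", "police"))
print(fuzzy_or_match("someone title", "some title"))
print(fuzzy_or_match("Policeman", "Police diaries : stefan zweig"))
```

The query someone title succeeds only because title matches exactly; someone is as far from some as policeman is from police.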
How to return the police title when searching for policeman
You need to use a synonym token filter, as shown in the example below.
Index Def
{
"settings": {
"analysis": {
"analyzer": {
"synonyms": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms" : ["policeman => police"]
}
}
}
},
"mappings": {
"properties": {
"dialog": {
"type": "text",
"analyzer": "synonyms"
}
}
}
}
Index sample doc
{
"dialog" : "police"
}
Search query having term policeman
{
"query": {
"match" : {
"dialog" : {
"query" : "policeman"
}
}
}
}
And search result
"hits": [
{
"_index": "so_syn",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dialog": "police" --> note source has `police` only.
}
}
]
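As a rough illustration of why this works, the following Python sketch mimics the effect of the rule policeman => police from the index definition above. It is not how Elasticsearch implements the filter; the analyze helper and the dictionary representation of the rule are assumptions made for the example.

```python
def analyze(text, synonyms):
    """Lowercase, split on whitespace, then rewrite tokens through the synonym map."""
    return [synonyms.get(token, token) for token in text.lower().split()]


# Mirrors the rule "policeman => police" from the index definition above.
SYNONYMS = {"policeman": "police"}

# The same analysis runs at index time and at search time.
indexed_tokens = set(analyze("police", SYNONYMS))
query_tokens = analyze("policeman", SYNONYMS)

print(query_tokens)
print(any(token in indexed_tokens for token in query_tokens))
```

Since the query token is rewritten to police before the lookup, it lines up with the indexed token even though the stored source never contains policeman.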

Some Elastic fields DSL query searchable and some not

I'm using Elasticsearch 6.8.1 with dynamic mapping. I have one document in the index now, and am testing searches on various fields. I POST to http://localhost:9200/documents/_search and send a DSL query
{
"query":
{"bool":{"must":{"term":{"name": "item2"}}} }
}
and I get the document I expect:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "documents",
"_type": "document",
"_id": "nRMOs5DZg",
"_score": 0.2876821,
"_source": {
"freeform": "DEF",
"name": "item2",
"url": "s3://mybucket/key",
"visible": true
}
}
]
}
}
Now, I want to make sure that I can search on the "freeform" field by changing the query to
{
"query":
{"bool":{"must":{"term":{"freeform": "DEF"}}} }
}
This results in no hits and I can't understand why.
[EDIT]
Here is the dynamic mapping
{
"documents": {
"aliases": {},
"mappings": {
"document": {
"properties": {
"freeform": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"visible": {
"type": "boolean"
}
}
}
},
"settings": {
"index": {
"creation_date": "1564776393764",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "2er2TF-ySEKgk6gd32K6Ig",
"version": {
"created": "6080199"
},
"provided_name": "documents"
}
}
}
}
It's hard to answer without seeing your mapping, but my guess would be this:
The dynamic mapping tries to guess the data type to assign to your fields; the default for string fields is the "text" data type, which means their value is analyzed and stored as a list of normalized terms, which is useful for free-text search. The string "item2" happens to survive this analysis unchanged, but "DEF" would be analyzed to "def".
Since you're using a term query, the queried term doesn't go through the same analysis process, so you have to query using the analyzed term in order to match the document.
Try searching for "def" instead of "DEF" to test this hypothesis. Also, take a look at the automatically-generated mapping for your index and you'll see which data type each field was mapped to.
If this is indeed the case, you can do one of several things:
If you want exact-string matching: change the mapping from text to keyword (you can control dynamic mapping using Dynamic Templates); alternatively, search against the keyword sub-field that dynamic mapping creates automatically, i.e. freeform.keyword instead of freeform.
If you want "free-text" matching: use a match query instead of a term query so both the input and the document value undergo the same analysis (but make sure you understand how analysis and match queries work).
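The difference between the two query types can be sketched in plain Python. This is a simplified model, not Elasticsearch code: standard_analyze is an assumed stand-in that only lowercases and splits on whitespace, and the two query functions are hypothetical helpers.

```python
def standard_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_query(indexed_tokens, value):
    """A term query compares the raw value against indexed tokens without analysis."""
    return value in indexed_tokens


def match_query(indexed_tokens, value):
    """A match query analyzes the input first, then looks up the resulting tokens."""
    return any(token in indexed_tokens for token in standard_analyze(value))


# Tokens stored for the two text fields of the example document.
freeform_tokens = standard_analyze("DEF")
name_tokens = standard_analyze("item2")

print(term_query(name_tokens, "item2"))
print(term_query(freeform_tokens, "DEF"))
print(term_query(freeform_tokens, "def"))
print(match_query(freeform_tokens, "DEF"))
```

The value item2 is already lowercase, so the term query finds it; DEF is stored as def, so only the lowercase term or an analyzed match query finds it.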

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running on Elasticsearch 5.5
I have a document with the following mapping
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get the following errors:
[error screenshot not available; source: discourse.org]
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts.
1. Why is there a context error? The documentation seems to say that this is ok:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
2. Why can't I use string values for the precision values? At the bottom of the page, I see that the precision values can take either distances or numeric values.

elasticsearch: How to rank first appearing words or phrases higher

For example, if I have the following documents:
1. Casa Road
2. Jalan Casa
Say my query term is "cas". On searching, both documents have the same score. I want the one where casa appears earlier (i.e. document 1 here) to rank first in my query output.
I am using an edgeNGram Analyzer. Also I am using aggregations so I cannot use the normal sorting that happens after querying.
You can use the Bool Query to boost the items that start with the search query:
{
"bool" : {
"must" : {
"match" : { "name" : "cas" }
},
"should": {
"prefix" : { "name" : "cas" }
}
}
}
I'm assuming the values you gave are in the name field, and that the field is not analyzed. If it is analyzed, maybe look at this answer for more ideas.
The way it works is:
Both documents will match the query in the must clause, and will receive the same score for that. A document won't be included if it doesn't match the must query.
Only the document with the term starting with cas will match the query in the should clause, causing it to receive a higher score. A document won't be excluded if it doesn't match the should query.
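The must/should logic described above can be modelled with a toy scorer in Python. This is only a sketch of the idea, not Elasticsearch scoring: the scores of 1.0 per clause are invented, matching is made case-insensitive as an assumption, and the must clause is approximated as "some word starts with the query", which is roughly what an edge n-gram match achieves.

```python
def score(name, query):
    """Return None when the must clause fails, else a toy additive score."""
    value = name.lower()
    query = query.lower()
    # must: some word begins with the query (what an edge n-gram match achieves).
    if not any(word.startswith(query) for word in value.split()):
        return None
    total = 1.0
    # should: the whole field value begins with the query; optional, only adds score.
    if value.startswith(query):
        total += 1.0
    return total


def rank(names, query):
    """Drop documents that fail the must clause and sort the rest by score, highest first."""
    scored = [(score(name, query), name) for name in names]
    hits = [(s, name) for s, name in scored if s is not None]
    return [name for s, name in sorted(hits, key=lambda pair: -pair[0])]


print(rank(["Jalan Casa", "Casa Road", "Main Street"], "cas"))
```

Both Casa documents pass the must clause, only Casa Road earns the should bonus, and Main Street is excluded entirely.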
This might be a bit more involved, but it should work.
Basically, you need the position of the term within the text itself and, also, the number of terms from the text. The actual scoring is computed using scripts, so you need to enable dynamic scripting in elasticsearch.yml config file:
script.engine.groovy.inline.search: on
This is what you need:
a mapping that uses term_vector set to with_positions, an edgeNGram analyzer, and a sub-field of type token_count:
PUT /test
{
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions",
"index_analyzer": "edgengram_analyzer",
"search_analyzer": "keyword",
"fields": {
"word_count": {
"type": "token_count",
"store": "yes",
"analyzer": "standard"
}
}
}
}
}
},
"settings": {
"analysis": {
"filter": {
"name_ngrams": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "30"
}
},
"analyzer": {
"edgengram_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"name_ngrams"
],
"tokenizer": "standard"
}
}
}
}
}
test documents:
POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}
the query itself:
GET /test/test/_search
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"term": {
"text": {
"value": "cas"
}
}
},
"script_score": {
"script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
},
"boost_mode": "sum"
}
}
]
}
}
}
and the results:
"hits": {
"total": 2,
"max_score": 1.3715843,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1.3715843,
"_source": {
"text": "Casa Road"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.8715843,
"_source": {
"text": "Jalan Casa"
}
}
]
}
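The positional part of the score, (wordCount - position) / wordCount, can be reproduced outside Elasticsearch with a short Python sketch. The function name is hypothetical, whitespace splitting stands in for the standard analyzer, and returning 0.0 when no word matches is an assumption (the Groovy script above returns nothing in that case).

```python
def position_score(text, prefix):
    """Return (word_count - position) / word_count for the first word starting with prefix."""
    words = text.lower().split()
    for position, word in enumerate(words):
        if word.startswith(prefix.lower()):
            return (len(words) - position) / len(words)
    # Assumption: no matching word contributes nothing to the score.
    return 0.0


for doc in ["Casa Road", "Jalan Casa"]:
    print(doc, position_score(doc, "cas"))
```

This gives 1.0 for Casa Road and 0.5 for Jalan Casa, which matches the 0.5 gap between the two _score values in the results above, since boost_mode sum adds the script value to the same base query score.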
