Add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
"size": 10,
"query": {
"multi_match": {
"query": "'"+search_text+"'",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
"size": 10,
"query": {
"multi_match": {
"query": {
"fuzzy": {
"value": "'"+search_text+"'",
"fuzziness": "AUTO"
}
},
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.

There is no need to add a fuzzy query. To add fuzziness to a multi_match query, add the fuzziness parameter directly inside multi_match, alongside query, type, and fields.
Since you are using bool_prefix as the type, multi_match creates a match_bool_prefix query on each field, which analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"serviceTitle": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"services": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"company_name":"sequencing how shingles are actually used"
}
Search Query:
{
"size": 10,
"query": {
"multi_match": {
"query": "sequensing how shingles",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
],
"fuzziness":"auto"
}
}
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
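If you want to search for only sequensing and still get the above document, you need to change the type of the multi_match query from bool_prefix to another type according to your use case. With bool_prefix, the last (here, the only) term is run as a prefix query, and fuzziness has no effect on that prefix query.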
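As a minimal sketch of the point above, the Python below builds the same request body with fuzziness placed next to query, and roughly shows which terms end up in term queries and which one ends up in the prefix query. The helper names build_fuzzy_suggest_query and split_bool_prefix_terms are hypothetical, and the whitespace split only approximates what the field's analyzer actually does.

```python
import json

FIELDS = ["company_name", "company_name._2gram", "company_name._3gram"]


def build_fuzzy_suggest_query(search_text, size=10):
    """Build a bool_prefix multi_match body with fuzziness as a sibling of "query"."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": search_text,  # plain string, not a nested fuzzy object
                "type": "bool_prefix",
                "fields": FIELDS,
                "fuzziness": "AUTO",
            }
        },
    }


def split_bool_prefix_terms(search_text):
    """Whitespace approximation (an assumption) of bool_prefix term handling:
    every term but the last goes to a term query, the last to a prefix query."""
    terms = search_text.split()
    if not terms:
        return [], None
    return terms[:-1], terms[-1]


body = build_fuzzy_suggest_query("sequensing how shingles")
print(json.dumps(body, indent=2))
print(split_bool_prefix_terms("sequensing how shingles"))
print(split_bool_prefix_terms("sequensing"))
```

With a single-word input the only term is the prefix term, which is why a lone sequensing does not benefit from fuzziness under bool_prefix.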

Related

Elastic search how to query the results for the keyword exists in the given fields

I have an email Elasticsearch index created with the following mappings for the email sender and recipients:
"mappings": {
...
"recipients": {
"type": "keyword"
},
"sender": {
"type": "keyword"
},
...
I am given a list of email addresses, and I want to find the emails where any of those addresses is either the sender or a recipient. For example, I tried the following query:
{
"query": {
"multi_match" : {
"query": "abc#apple.com defg#samsung.com",
"operator": "OR",
"fields": [ "recipients", "sender" ],
"type": "cross_fields"
}
}
}
to find the emails where (abc#apple.com is the sender or a recipient) OR (defg#samsung.com is the sender or a recipient). But it doesn't return any results, even though matching emails do exist.
Does anyone know how to query for emails where any of the given addresses is the sender or a recipient?
Thanks
It's good that you have found a solution, but it is important to understand why multi_match didn't work, why query_string worked, and why you should avoid query_string if possible.
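As mentioned in the official Elasticsearch documentation, query_string uses a strict syntax and returns an error for any invalid syntax, so it is not recommended for search boxes.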
Also, your multi_match query didn't work because you passed both addresses in a single query string, abc#apple.com defg#samsung.com. That string is analyzed according to each field's analyzer, and because keyword fields are not tokenized, Elasticsearch looks for the whole string abc#apple.com defg#samsung.com in your fields, not for abc#apple.com or defg#samsung.com individually.
If you want to use multi_match, the right query would be:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "abc#apple.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
},
{
"multi_match": {
"query": "defg#samsung.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
}
]
}
}
}
which returns the documents below:
"hits": [
{
"_index": "71367024",
"_id": "1",
"_score": 0.6931471,
"_source": {
"recipients": "abc#apple.com",
"sender": "foo#bar.com"
}
},
{
"_index": "71367024",
"_id": "2",
"_score": 0.6931471,
"_source": {
"recipients": "defg#samsung.com",
"sender": "baz#bar.com"
}
}
]
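Since the list of addresses is given at runtime, the bool/should body above can be generated rather than written by hand. This is a minimal sketch: the helper name build_any_email_query is hypothetical, and minimum_should_match is set explicitly only to make the "at least one address" intent visible.

```python
import json


def build_any_email_query(emails):
    """Build one multi_match clause per address and OR them with bool/should."""
    should = [
        {
            "multi_match": {
                "query": email,  # one address per clause, so keyword fields match it whole
                "operator": "OR",
                "fields": ["recipients", "sender"],
                "type": "cross_fields",
            }
        }
        for email in emails
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = build_any_email_query(["abc#apple.com", "defg#samsung.com"])
print(json.dumps(body, indent=2))
```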
I think I may have found the answer. The following query works:
{
"query": {
"query_string" : {
"query": "abc#apple.com OR defg#samsung.com",
"fields": [ "recipients", "sender" ]
}
}
}

How to find word 'food2u' by search 'food' in Elasticsearch?

I am a rookie who just started learning Elasticsearch, and I want to find words like 'food2u' by searching for the keyword 'food'. But I only get results like 'Food Repo', 'Give Food', etc. The field's mapping type is 'text', and this is my query:
GET api/_search
{"query": {
"match": {
"Name": {
"query": "food"
}
}
},
"_source":{
"includes":["Name"]
}
}
You are getting results like 'Food Repo' and 'Give Food' because a text field uses the standard analyzer when no analyzer is specified. Food Repo gets tokenized into food and repo. Similarly, Give Food gets tokenized into give and food.
But food2u gets tokenized into the single token food2u. Since there is no matching token ("food"), you will not get the food2u document.
You need to use the edge_ngram tokenizer to do a partial text match.
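To make the tokenization argument concrete, here is a rough Python simulation. Both functions are simplified stand-ins written for this illustration, not Elasticsearch code: standard_like_tokens only lowercases and splits on non-alphanumeric ASCII characters, and edge_ngrams assumes min_gram 4, max_gram 10, and token_chars of letter and digit.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def edge_ngrams(text, min_gram=4, max_gram=10):
    """Rough stand-in for an edge_ngram tokenizer that keeps letters and digits."""
    grams = []
    for word in re.split(r"[^0-9A-Za-z]+", text):
        # Emit prefixes of each word, from min_gram up to max_gram characters.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:size])
    return grams


print(standard_like_tokens("Food Repo"))  # contains "food", so a match on "food" hits
print(standard_like_tokens("food2u"))     # a single token, so "food" does not match
print(edge_ngrams("food2u"))              # prefixes include "food", so "food" matches
```

The edge_ngrams sketch deliberately does not lowercase, because an analyzer that defines only a tokenizer applies no lowercase filter.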
Adding a working example with index data, mapping, search query and search result
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 4,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
},
"max_ngram_diff": 10
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Index Data:
{
"name":"food2u"
}
Search Query:
{
"query": {
"match": {
"name": "food"
}
}
}
Search Result:
"hits": [
{
"_index": "67552800",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "food2u"
}
}
]
If you don't want to change the mapping, you can even use a wildcard query to return the matching documents
{
"query": {
"wildcard": {
"Name": {
"value": "food*"
}
}
}
}
Or you can use a query_string query with a wildcard:
{
"query": {
"query_string": {
"query": "food*",
"fields": [
"Name"
]
}
}
}

Elasticsearch query match + term boolean

I have documents in elasticsearch index with a "type" field, like this:
[
{
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags":["tag1","tag2"],
"type":"service"
},
{
"id": 2,
"companyDescription": "a bunch of text more",
"companyTitle": "title",
"companyTags":["tag1","tag2"],
"type":"company"
},...
]
I want to run a match query across all docs in my index, like this:
body = {
"query": {
"match": {
"_all":"sequencing"
}
}
}
but add a filter to only return results where the "type" field equals "service".
As far as I understand your question, you want to search for the query string sequencing across all fields. For that, you can use the multi_match query, which builds on the match query to allow multi-field queries.
If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. This extracts all fields in the mapping that are eligible for term queries and filters out the metadata fields. All extracted fields are then combined to build a query.
Search Query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "bunch of text"
}
}
],
"filter": {
"term": {
"type": "service"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "64867032",
"_type": "_doc",
"_id": "1",
"_score": 0.8630463,
"_source": {
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags": [
"tag1",
"tag2"
],
"type": "service"
}
}
]
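Because the question builds its request as a Python dict (body = {...}), the same bool query can be produced by a small helper. This is a sketch only; the function name build_filtered_search is hypothetical.

```python
import json


def build_filtered_search(text, doc_type):
    """Full-text search across default fields, restricted by an exact "type" value."""
    return {
        "query": {
            "bool": {
                # must: scored full-text match; no "fields" given, so defaults apply
                "must": [{"multi_match": {"query": text}}],
                # filter: exact match on "type", does not contribute to the score
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


body = build_filtered_search("bunch of text", "service")
print(json.dumps(body, indent=2))
```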

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help on this.
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query that searches for two wildcard strings on two separate fields, but only returns documents that match both?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
Use the query below, which is a query_string query with default_operator set to AND.
Search query:
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
With your sample docs indexed, it returns only the expected document:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
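A likely reason the original query returned both documents is that simple_query_string combines the terms of "Type ON*" with OR by default, so the term Type alone is enough to match. The toy matcher below illustrates the difference between OR and AND. It is a simplified simulation written for this explanation (whitespace tokens, trailing-star prefixes only), not how Elasticsearch evaluates queries internally.

```python
def term_matches(term, tokens):
    """A term matches if it equals a token, or is a prefix of one when it ends in *."""
    term = term.lower()
    if term.endswith("*"):
        return any(tok.startswith(term[:-1]) for tok in tokens)
    return term in tokens


def matches(query, text, operator):
    """Combine per-term results with AND (all must match) or OR (any may match)."""
    tokens = text.lower().split()
    results = [term_matches(term, tokens) for term in query.split()]
    return all(results) if operator == "AND" else any(results)


for doc in ("Type ONE", "Type TWO"):
    print(doc, "OR:", matches("Type ON*", doc, "OR"), "AND:", matches("Type ON*", doc, "AND"))
```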

elasticsearch: How to rank first appearing words or phrases higher

For example, if I have the following documents:
1. Casa Road
2. Jalan Casa
Say my query term is "cas"... on searching, both documents have the same score. I want the one with casa appearing earlier (i.e. document 1 here) to rank first in my query output.
I am using an edgeNGram Analyzer. Also I am using aggregations so I cannot use the normal sorting that happens after querying.
You can use the Bool Query to boost the items that start with the search query:
{
"bool" : {
"must" : {
"match" : { "name" : "cas" }
},
"should": {
"prefix" : { "name" : "cas" }
}
}
}
I'm assuming the values you gave are in the name field, and that the field is not analyzed. If it is analyzed, maybe look at this answer for more ideas.
The way it works is:
Both documents will match the query in the must clause, and will receive the same score for that. A document won't be included if it doesn't match the must query.
Only the document with the term starting with cas will match the query in the should clause, causing it to receive a higher score. A document won't be excluded if it doesn't match the should query.
This might be a bit more involved, but it should work.
Basically, you need the position of the term within the text itself and also the number of terms in the text. The actual scoring is computed using scripts, so you need to enable dynamic scripting in the elasticsearch.yml config file (this applies to older Elasticsearch versions that still support Groovy scripting):
script.engine.groovy.inline.search: on
This is what you need:
a mapping that sets term_vector to with_positions, uses an edgeNGram analyzer, and has a sub-field of type token_count:
PUT /test
{
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions",
"index_analyzer": "edgengram_analyzer",
"search_analyzer": "keyword",
"fields": {
"word_count": {
"type": "token_count",
"store": "yes",
"analyzer": "standard"
}
}
}
}
}
},
"settings": {
"analysis": {
"filter": {
"name_ngrams": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "30"
}
},
"analyzer": {
"edgengram_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"name_ngrams"
],
"tokenizer": "standard"
}
}
}
}
}
test documents:
POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}
the query itself:
GET /test/test/_search
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"term": {
"text": {
"value": "cas"
}
}
},
"script_score": {
"script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
},
"boost_mode": "sum"
}
}
]
}
}
}
and the results:
"hits": {
"total": 2,
"max_score": 1.3715843,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1.3715843,
"_source": {
"text": "Casa Road"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.8715843,
"_source": {
"text": "Jalan Casa"
}
}
]
}
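The arithmetic behind the script can be checked outside Elasticsearch. The sketch below reimplements only the formula (wordCount - position) / wordCount in Python; the function name position_score is hypothetical, the whitespace split stands in for the analyzer, and returning 0.0 when nothing matches is an assumption of this sketch rather than the script's behavior.

```python
def position_score(text, prefix):
    """Score the first word starting with `prefix`: earlier positions score higher."""
    words = text.lower().split()
    for position, word in enumerate(words):
        if word.startswith(prefix.lower()):
            return (len(words) - position) / len(words)
    return 0.0  # assumption: no bonus when no word starts with the prefix


for doc in ("Casa Road", "Jalan Casa"):
    print(doc, position_score(doc, "cas"))
```

The two values differ by 0.5, which is the same gap as between the two _score values in the results above, consistent with boost_mode set to sum.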

Resources