Get shingle result from Elasticsearch

I'm already familiar with the shingle analyzer and I am able to create one as follows:
"index": {
"number_of_shards": 10,
"number_of_replicas": 1
},
"analysis": {
"analyzer": {
"shingle_analyzer": {
"filter": [
"standard",
"lowercase"
"filter_shingle"
]
}
},
"filter": {
"filter_shingle": {
"type": "shingle",
"max_shingle_size": 2,
"min_shingle_size": 2,
"output_unigrams": false
}
}
}
}
and then I use the defined analyzer in the mapping for a field in my document named content. The problem is that the content field is a very long text, and I want to use it as data for an autocomplete suggester, so I just need the one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest, or analyze) API result as shingles too. Since the shingle analyzer makes Elasticsearch index the text as shingles, is there a way to access those shingles?
For instance, the query I pass is:
GET the_index/_search
{
"_source": ["content"],
"explain": true,
"query" : {
"match" : { "content.shngled_field": "news" }
}
}
the result is:
{
"took" : 395,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 7.8647532,
"hits" : [
{
"_shard" : "[v3_kavan_telegram_201911][0]",
"_node" : "L6vHYla-TN6CHo2I6g4M_A",
"_index" : "v3_kavan_telegram_201911",
"_type" : "_doc",
"_id" : "g1music/70733",
"_score" : 7.8647532,
"_source" : {
"content" : "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more."
....
}
As you can see, the result contains the whole content field, which is a very long text. The result I expect is
"content" : "news and information on"
which is the matched shingle itself.

After you've created an index and ingested a document:
PUT sh
{
"mappings": {
"properties": {
"content": {
"type": "text",
"fields": {
"shingled": {
"type": "text",
"analyzer": "shingle_analyzer"
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"shingle_analyzer": {
"type": "standard",
"filter": [
"standard",
"lowercase",
"filter_shingle"
]
}
},
"filter": {
"filter_shingle": {
"type": "shingle",
"max_shingle_size": 2,
"min_shingle_size": 2,
"output_unigrams": false
}
}
}
}
}
POST sh/_doc/1
{
"content": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?"
}
You can either call the _analyze API with the corresponding analyzer to see how a given text would be tokenized:
GET sh/_analyze
{
"text": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?",
"analyzer": "shingle_analyzer"
}
Or check out the term vectors information:
GET sh/_termvectors/1
{
"fields" : ["content.shingled"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
Will you be highlighting too?
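If highlighting is an option, here's a minimal sketch against the sh index defined above (the query text is just an example): the plain highlighter re-analyzes content.shingled and returns only the matching fragments rather than the whole content field, which is close to the "matched shingle" output you're after.
GET sh/_search
{
  "_source": false,
  "query": {
    "match": { "content.shingled": "shingle analyzer" }
  },
  "highlight": {
    "fields": {
      "content.shingled": {
        "type": "plain"
      }
    }
  }
}
Each hit then carries a highlight section with short <em>-tagged snippets instead of the full text.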

Related

Elasticsearch mapping for UK postcodes, able to deal with spacing and capitalization

I am looking for a mapping/analyzer setup in Elasticsearch 7 for UK postcodes. We do not require any fuzzy operator, but it should be able to deal with variance in capital letters and spacing.
Some examples:
Query string: "SN13 9ED" should return:
sn139ed
SN13 9ED
Sn13 9ed
but should not return:
SN13 1EP
SN131EP
The keyword analyzer is used by default, and this seems to be sensitive to spacing issues but not to capital letters. It will also return a match for SN13 1EP unless we specify the query as SN13 AND 9ED, which we do not want.
Additionally, with the keyword analyzer, a query of SN13 9ED returns SN13 1EP with a higher relevance than SN13 9ED, even though the latter should be the exact match. Why do two matches in the same string get a lower relevance than just one match?
Mapping for postal code
"post_code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
Query
"query" => array:1 [▼
"query_string" => array:1 [▼
"query" => "KT2 7AJ"
]
]
Based on my comments, I believe you were already able to filter out SN13 1EP when your search string was SN13 9ED.
I hope you are aware of what analysis is, how analyzers work on text fields, and how, by default, the standard analyzer is applied to tokens before they are eventually stored in the inverted index. Note that this only applies to text fields.
Looking at your mapping, if you had searched on post_code rather than post_code.keyword, I believe the capitalization issue would have been resolved, because for text fields Elasticsearch uses the standard analyzer by default, which means your tokens end up saved in the index in lowercase form; at query time the same analyzer is applied to the search string before it is looked up in the inverted index.
Note that, by default, the analyzer configured in the mapping is applied on that field at index time as well as at search time.
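For example, a minimal sketch of what that would look like (the index name is a placeholder; the field and query values are taken from your question), querying the analyzed post_code text field rather than post_code.keyword:
POST your_postcode_index/_search
{
  "query": {
    "match": {
      "post_code": {
        "query": "SN13 9ED",
        "operator": "and"
      }
    }
  }
}
With the standard analyzer both the indexed values and the query string are lowercased, so Sn13 9ed and SN13 9ED line up, and the and operator keeps SN13 1EP out because 1ep never matches 9ed. It does not, however, help with sn139ed, which is why the pattern-capture filter below is needed.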
For scenarios where you have sn131ep, I've made use of the Pattern Capture Token Filter, specifying a regex that breaks the token into two parts of lengths 4 and 3, which are then saved in the inverted index; in this case that would be sn13 and 1ep. I'm also lowercasing them before they are stored in the inverted index.
Note that I'm assuming your postcodes have a fixed size, i.e. 7 characters. You can add more patterns if that is not the case.
Please see below for more details:
Mapping:
PUT my_postcode_index
{
"settings" : {
"analysis" : {
"filter" : {
"mypattern" : {
"type" : "pattern_capture",
"preserve_original" : true,
"patterns" : [
"(\\w{4}+)|(\\w{3}+)", <--- Note this and feel free to add more patterns
"\\s" <--- Filter based on whitespace
]
}
},
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "pattern",
"filter" : [ "mypattern", "lowercase" ] <--- Note the lowercase here
}
}
}
},
"mappings": {
"properties": {
"postcode":{
"type": "text",
"analyzer": "my_analyzer", <--- Note this
"fields":{
"keyword":{
"type": "keyword"
}
}
}
}
}
}
Sample Documents:
POST my_postcode_index/_doc/1
{
"postcode": "SN131EP"
}
POST my_postcode_index/_doc/2
{
"postcode": "sn13 1EP"
}
POST my_postcode_index/_doc/3
{
"postcode": "sn131ep"
}
Note that these documents are semantically the same.
Request Query:
POST my_postcode_index/_search
{
"query": {
"query_string": {
"default_field": "postcode",
"query": "SN13 1EP",
"default_operator": "AND"
}
}
}
Response:
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.6246513,
"hits" : [
{
"_index" : "my_postcode_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6246513,
"_source" : {
"postcode" : "SN131EP"
}
},
{
"_index" : "my_postcode_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.6246513,
"_source" : {
"postcode" : "sn131ep"
}
},
{
"_index" : "my_postcode_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.5200585,
"_source" : {
"postcode" : "sn13 1EP"
}
}
]
}
}
Notice that all three documents are returned even with queries like sn131ep and sn13 1ep.
Additional Note:
You can make use of the Analyze API to figure out what tokens are created for a particular text:
POST my_postcode_index/_analyze
{
"analyzer": "my_analyzer",
"text": "sn139ed"
}
And you can see below what tokens are stored in the inverted index.
{
"tokens" : [
{
"token" : "sn139ed",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
},
{
"token" : "sn13",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
},
{
"token" : "9ed",
"start_offset" : 0,
"end_offset" : 7,
"type" : "word",
"position" : 0
}
]
}
Also:
You may also want to read about the Ngram Tokenizer. I'd advise you to play around with both solutions and see which best suits your inputs.
Please test it and let me know if you have any queries.
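If you want to try the Ngram Tokenizer route, here is a rough sketch for comparison (the index and analyzer names are made up, and the min_gram/max_gram values are just a starting point):
PUT my_postcode_ngram_index
{
  "settings": {
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "postcode_ngram_analyzer": {
          "tokenizer": "postcode_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "postcode_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 7,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "postcode": {
        "type": "text",
        "analyzer": "postcode_ngram_analyzer"
      }
    }
  }
}
Since min_gram and max_gram differ by more than 1, index.max_ngram_diff has to be raised accordingly; the trade-off versus the pattern-capture approach is a larger index and fuzzier partial matching.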
In addition to Opster's answer, the following can also be used to tackle the issue from the opposite angle. Opster's answer suggests splitting the value by a known postcode pattern, which is great.
If we do not know the pattern, the following can be used:
{
"analysis": {
"filter": {
"whitespace_remove": {
"pattern": " ",
"type": "pattern_replace",
"replacement": ""
}
},
"analyzer": {
"no_space_analyzer": {
"filter": [
"lowercase",
"whitespace_remove"
],
"tokenizer": "keyword"
}
}
}
}
{
"post_code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "no_space_analyzer"
}
}
This allows us to search with any kind of spacing, and with any case due to the lowercase filter.
sn13 1ep, s n 1 3 1 e p, sn131ep will all match against SN13 1EP
I think the main drawback to this option, however, is that we will no longer get any results for sn13, as we are not producing any partial tokens. sn13* would bring back results, however.
Is it possible to mix both of these methods together so we can have the best of both worlds?
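One possible way to combine them (just a sketch; it assumes you are happy to reindex, and it simply reuses the mypattern/my_analyzer pair from Opster's answer and the whitespace_remove/no_space_analyzer pair from above under a new, made-up index name):
PUT my_postcode_combined_index
{
  "settings": {
    "analysis": {
      "filter": {
        "mypattern": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "(\\w{4}+)|(\\w{3}+)",
            "\\s"
          ]
        },
        "whitespace_remove": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "pattern",
          "filter": [ "mypattern", "lowercase" ]
        },
        "no_space_analyzer": {
          "tokenizer": "keyword",
          "filter": [ "lowercase", "whitespace_remove" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "post_code": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "no_space": {
            "type": "text",
            "analyzer": "no_space_analyzer"
          },
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
A multi_match across both sub-fields then keeps the partial matching of the pattern-capture field (a bare sn13 still returns results) while the no_space sub-field takes care of arbitrary spacing:
POST my_postcode_combined_index/_search
{
  "query": {
    "multi_match": {
      "query": "SN13 9ED",
      "fields": [ "post_code", "post_code.no_space" ],
      "type": "most_fields",
      "operator": "and"
    }
  }
}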

Building an effective Elasticsearch query for cross_fields with fuzziness

I know that Elasticsearch does not support fuzziness with the cross_fields type in a multi_match query. I have a very difficult time with the Elasticsearch API and so I'm finding it challenging to build an analogous query that searches across multiple document fields with fuzzy string matching.
I have an index called papers with various fields such as Title, Author.FirstName, Author.LastName, PublicationDate, Journal etc... I want to be able to query with a string like "John Doe paper title 2015 journal name". cross_fields is the perfect multi_match type but it doesn't support fuzziness which is critical for my application.
Can anyone suggest a reasonable way to approach this? I've spent hours going through solutions on SO and the Elasticsearch forums with little success.
You can make use of the copy_to feature for this scenario. Basically, you copy the values from the different fields into one new field (my_search_field in the details below), and on that field you can perform a fuzzy query via the fuzziness parameter using a simple match query.
Below is what a sample mapping, document and query would look like:
Mapping:
PUT my_fuzzy_index
{
"mappings": {
"properties": {
"my_search_field":{ <---- Note this field
"type": "text"
},
"Title":{
"type": "text",
"copy_to": "my_search_field" <---- Note this
},
"Author":{
"type": "nested",
"properties": {
"FirstName":{
"type":"text",
"copy_to": "my_search_field" <---- Note this
},
"LastName":{
"type":"text",
"copy_to": "my_search_field" <---- Note this
}
}
},
"PublicationDate":{
"type": "date",
"copy_to": "my_search_field" <---- Note this
},
"Journal":{
"type":"text",
"copy_to": "my_search_field" <---- Note this
}
}
}
}
Sample Document:
POST my_fuzzy_index/_doc/1
{
"Title": "Fountainhead",
"Author":[
{
"FirstName": "Ayn",
"LastName": "Rand"
}
],
"PublicationDate": "2015",
"Journal": "journal"
}
Query Request:
POST my_fuzzy_index/_search
{
"query": {
"match": {
"my_search_field": { <---- Note this field
"query": "Aynnn Ranaad Fountainhead 2015 journal",
"fuzziness": 3 <---- Fuzzy parameter
}
}
}
}
Response:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.1027813,
"hits" : [
{
"_index" : "my_fuzzy_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.1027813,
"_source" : {
"Title" : "Fountainhead",
"Author" : [
{
"FirstName" : "Ayn",
"LastName" : "Rand"
}
],
"PublicationDate" : "2015",
"Journal" : "journal"
}
}
]
}
}
So instead of trying to apply a fuzzy query to multiple fields, you can go for this approach instead. That way your query is simplified.
Let me know if this helps!

Query to partially match every word in a search term in Elasticsearch

I have an array of tags containing words.
tags: ['australianbrownsnake', 'venomoussnake', ...]
How do I match this against these search terms:
'brown snake', 'australian snake', 'venomous', 'venomous brown snake'
I am not even sure if this is possible since I am new to Elasticsearch.
Help would be appreciated. Thank you.
Edit: I have created an ngram analyzer and added a field called ngram like so.
properties": {
"tags": {
"type": "text",
"fields": {
"ngram": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
I tried the following query, but no luck:
"query": {
"multi_match": {
"query": "snake",
"fields": [
"tags.ngram"
],
"type": "most_fields"
}
}
my tag mapping is as follows:
"tags" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"ngram" : {
"type" : "text",
"analyzer" : "my_analyzer"
}
}
},
my settings are:
{
"image" : {
"settings" : {
"index" : {
"max_ngram_diff" : "10",
"number_of_shards" : "1",
"provided_name" : "image",
"creation_date" : "1572590562106",
"analysis" : {
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "my_tokenizer"
}
},
"tokenizer" : {
"my_tokenizer" : {
"token_chars" : [
"letter",
"digit"
],
"min_gram" : "3",
"type" : "ngram",
"max_gram" : "10"
}
}
},
"number_of_replicas" : "1",
"uuid" : "pO9F7W43QxuZmI9vmXfKyw",
"version" : {
"created" : "7040299"
}
}
}
}
}
Update:
This config should work fine.
I believe it was my mistake: I was searching on the wrong index.
You need to index your tags in the way you want to search them. For queries like 'brown snake', 'australian snake' to match your tags you would need to break them into smaller tokens.
By default, Elasticsearch indexes strings by passing them through its standard analyzer. You can always create a custom analyzer to store your field however you want, for example one that tokenizes strings into nGrams. You can specify a size of 3-10, which will store your 'australianbrownsnake' tag as something like: ['aus', 'aust', ..., 'tra', 'tral', ...]
You can then modify your search query to match on your tags.ngram field and you should get the desired results.
The tags.ngram field can be created like so:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
using ngram tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
EDIT1: Elasticsearch uses the analyzer of the field being matched on to analyze the query keywords. You might not need the user query to be tokenized into nGrams, since there should already be a matching nGram stored in the tags field. You could specify a standard search_analyzer in your mapping.
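For example, a sketch of that mapping (shown here at index-creation time under a hypothetical new index name, reusing the my_analyzer/my_tokenizer settings from your question):
PUT image_v2
{
  "settings": {
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tags": {
        "type": "text",
        "fields": {
          "ngram": {
            "type": "text",
            "analyzer": "my_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}
This way the stored 'australianbrownsnake' is still broken into nGrams at index time, while a query such as 'brown snake' is only split into whole words at search time and matched against those nGrams.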

Elasticsearch - extract info between tags in the Highlights field

We have a field in our ElasticSearch index called Terms Matched and we populate that field at query time with the values that are tagged in the Highlights field of a given result. The Highlights field is derived from our field called Free Text, which contains unstructured data. The query is not a match phrase query - it looks for the words in the query to be within a certain distance of each other via a span-multi query.
So right now, an example could look like this:
Query: John Smith
Result:
Free Text: "Once upon a time, John Alexander Smith went to the market..."
Highlights: "Once upon a time, <em>John</em> Alexander <em>Smith</em> went to the market..."
Terms Matched: John Smith
Currently, the Terms Matched field is just a concatenation of the tags from Highlights. What we want to do is have the Terms Matched field return the tags, AND anything between the tags, if there is more than one tag - so in the above example the Terms Matched field would show "John Alexander Smith."
How could we accomplish this in ElasticSearch?
So I think this is working as you would expect.
This is a mapping with the shingle token filter configured. Shingles will produce combinations of searchable tokens (2 to 4 tokens per shingle).
PUT /highlights
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"my_shingle"
]
}
},
"filter": {
"my_shingle": {
"type": "shingle",
"max_shingle_size": 4,
"min_shingle_size": 2
}
}
}
},
"mappings": {
"properties": {
"content": {
"type": "text",
"search_analyzer": "standard",
"analyzer": "my_custom_analyzer"
}
}
}
}
Dummy document
PUT /highlights/_doc/1
{
"content": "Once upon a time, John Alexander Smith went to the market..."
}
And basic search query
GET /highlights/_search
{
"query": {
"match": {
"content": "John Smith"
}
},
"highlight": {
"fields": {
"content": {
"type": "plain"
}
}
}
}
This is the response, with correctly (hopefully) highlighted text:
{
"took" : 46,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.8111373,
"hits" : [
{
"_index" : "highlights",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.8111373,
"_source" : {
"content" : "Once upon a time, John Alexander Smith went to the market..."
},
"highlight" : {
"content" : [
"Once upon a time, <em>John Alexander Smith</em> went to the market..."
]
}
}
]
}
}
Yet again, you might need to tweak this quite a lot, but this should put you on the right track.

Elasticsearch exact matches on analyzed fields

Is there a way to have ElasticSearch identify exact matches on analyzed fields? Ideally, I would like to lowercase, tokenize, stem and perhaps even phoneticize my docs, then have queries pull "exact" matches out.
What I mean is that if I index "Hamburger Buns" and "Hamburgers", they will be analyzed as ["hamburger","bun"] and ["hamburger"]. If I search for "Hamburger", it will only return the "hamburger" doc, as that's the "exact" match.
I've tried using the keyword tokenizer, but that won't stem the individual tokens. Do I need to do something to ensure that the number of tokens is equal or so?
I'm familiar with multi-fields and using the "not_analyzed" type, but this is more restrictive than I'm looking for. I'd like exact matching, post-analysis.
Use the shingle token filter together with stemming and whatever else you need. Add a sub-field of type token_count that will count the number of tokens in the field.
At search time, you need to add an additional filter that matches the number of tokens in the index with the number of tokens in the text you are searching for. So, when you perform the actual search, you need an extra step that counts the tokens in the search string. This is necessary because shingles create multiple permutations of tokens, and you need to make sure the count matches the size of your search text.
An attempt for this, just to give you an idea:
{
"settings": {
"analysis": {
"filter": {
"filter_shingle": {
"type": "shingle",
"max_shingle_size": 10,
"min_shingle_size": 2,
"output_unigrams": true
},
"filter_stemmer": {
"type": "porter_stem",
"language": "_english_"
}
},
"analyzer": {
"ShingleAnalyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"snowball",
"filter_stemmer",
"filter_shingle"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"analyzer": "ShingleAnalyzer",
"fields": {
"word_count": {
"type": "token_count",
"store": "yes",
"analyzer": "ShingleAnalyzer"
}
}
}
}
}
}
}
And the query:
{
"query": {
"filtered": {
"query": {
"match_phrase": {
"text": {
"query": "HaMbUrGeRs BUN"
}
}
},
"filter": {
"term": {
"text.word_count": "2"
}
}
}
}
}
The shingle filter is important here because it can create combinations of tokens. More than that, these are combinations that keep the order of the tokens. IMO, the most difficult requirement to fulfill here is to change the tokens (stemming, lowercasing etc.) and also to assemble the original text back. Unless you define your own "concatenation" filter, I don't think there is any way other than using the shingle filter.
But with shingles there is another issue: it creates combinations that are not needed. For a text like "Hamburgers buns in Los Angeles" you end up with a long list of shingles:
"angeles",
"buns",
"buns in",
"buns in los",
"buns in los angeles",
"hamburgers",
"hamburgers buns",
"hamburgers buns in",
"hamburgers buns in los",
"hamburgers buns in los angeles",
"in",
"in los",
"in los angeles",
"los",
"los angeles"
If you are interested only in documents that match exactly, meaning the document above matches only when you search for "hamburgers buns in los angeles" (and doesn't match something like "any hamburgers buns in los angeles"), then you need a way to filter that long list of shingles. The way I see it is to use word_count.
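To get that token count at search time, one option (a sketch, assuming the ShingleAnalyzer defined above and a placeholder index name) is to run the search string through the _analyze API first and count the tokens in the response; because the token_count sub-field uses the same analyzer, that count is the value to put in the text.word_count filter:
GET your_index/_analyze
{
  "analyzer": "ShingleAnalyzer",
  "text": "HaMbUrGeRs BUN"
}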
You can use multi-fields for that purpose and have a not_analyzed sub-field within your analyzed field (let's call it item in this example). Your mapping would have to look like this:
{
"yourtype": {
"properties": {
"item": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
With this kind of mapping, you can check how each of the values Hamburger and Hamburger Buns is "viewed" by the analyzer with respect to your multi-fields item and item.raw.
For Hamburger:
curl -XGET 'localhost:9200/yourtypes/_analyze?field=item&pretty' -d 'Hamburger'
{
"tokens" : [ {
"token" : "hamburger",
"start_offset" : 0,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
} ]
}
curl -XGET 'localhost:9200/yourtypes/_analyze?field=item.raw&pretty' -d 'Hamburger'
{
"tokens" : [ {
"token" : "Hamburger",
"start_offset" : 0,
"end_offset" : 10,
"type" : "word",
"position" : 1
} ]
}
For Hamburger Buns:
curl -XGET 'localhost:9200/yourtypes/_analyze?field=item&pretty' -d 'Hamburger Buns'
{
"tokens" : [ {
"token" : "hamburger",
"start_offset" : 0,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "buns",
"start_offset" : 11,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
} ]
}
curl -XGET 'localhost:9200/yourtypes/_analyze?field=item.raw&pretty' -d 'Hamburger Buns'
{
"tokens" : [ {
"token" : "Hamburger Buns",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 1
} ]
}
As you can see, the not_analyzed field is going to be indexed untouched exactly as it was input.
Now, let's index two sample documents to illustrate this:
curl -XPOST localhost:9200/yourtypes/_bulk -d '
{"index": {"_type": "yourtype", "_id": 1}}
{"item": "Hamburger"}
{"index": {"_type": "yourtype", "_id": 2}}
{"item": "Hamburger Buns"}
'
And finally, to answer your question, if you want to have an exact match on Hamburger, you can search within your sub-field item.raw like this (note that the case has to match, too):
curl -XPOST localhost:9200/yourtypes/yourtype/_search -d '{
"query": {
"term": {
"item.raw": "Hamburger"
}
}
}'
And you'll get:
{
...
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "yourtypes",
"_type" : "yourtype",
"_id" : "1",
"_score" : 0.30685282,
"_source":{"item": "Hamburger"}
} ]
}
}
UPDATE (see comments/discussion below and question re-edit)
Taking your example from the comments and trying to have HaMbUrGeR BuNs match Hamburger Buns, you can simply achieve it with a match query like this:
curl -XPOST localhost:9200/yourtypes/yourtype/_search?pretty -d '{
"query": {
"match": {
"item": {
"query": "HaMbUrGeR BuNs",
"operator": "and"
}
}
}
}'
Which based on the same two indexed documents above will yield
{
...
"hits" : {
"total" : 1,
"max_score" : 0.2712221,
"hits" : [ {
"_index" : "yourtypes",
"_type" : "yourtype",
"_id" : "2",
"_score" : 0.2712221,
"_source":{"item": "Hamburger Buns"}
} ]
}
}
You can keep the analyzer as you intended (lowercase, tokenize, stem, ...), and use query_string as the main query with match_phrase as a boosting query. Something like this:
{
"bool" : {
"should" : [
{
"query_string" : {
"default_field" : "your_field",
"default_operator" : "OR",
"phrase_slop" : 1,
"query" : "Hamburger"
}
},
{
"match_phrase": {
"your_field": {
"query": "Hamburger"
}
}
}
]
}
}
It will match both documents, and the exact match (match_phrase) will be on top, since it matches both should clauses (and gets a higher score).
default_operator is set to OR; it helps the query "Hamburger Buns" (hamburger OR buns) match the document "Hamburger" as well.
phrase_slop is set to 1 to match terms with a distance of 1 only, e.g. a search for Hamburger Buns will not match the document Hamburger Big Buns. You can adjust this depending on your requirements.
You can refer to Closer is better and Query string for more details.
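For completeness, here is the same idea as a runnable request (a sketch using the item field and index name from the earlier answers in this thread; your_field above stands for whatever analyzed field you search on):
POST yourtypes/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "item",
            "default_operator": "OR",
            "phrase_slop": 1,
            "query": "Hamburger"
          }
        },
        {
          "match_phrase": {
            "item": {
              "query": "Hamburger"
            }
          }
        }
      ]
    }
  }
}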
