Does the structure of a match query affect the server - elasticsearch

I'm writing some code to generate queries and I wondered if there was any one way of generating the queries that was kinder to the server.
So this query:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"Text": {
"query": "Scooby Shaggy corridor",
"fuzziness": 1,
"operator": "AND"
}
}
}
]
}
}
}
is logically equivalent to this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"Text": {
"query": "Scooby",
"fuzziness": 1
}
}
},
{
"match": {
"Text": {
"query": "Shaggy",
"fuzziness": 1
}
}
},
{
"match": {
"Text": {
"query": "corridor",
"fuzziness": 1
}
}
}
]
}
}
}
but is either one easier for the server to process?
Or does it make no difference?
I realise this is a trivial example but could it make a difference with more complex queries?
If someone who knows a bit about how ElasticSearch behaves under the hood could make an observation I'd be grateful.
Thanks,
Adam.

Elasticsearch will itself rewrite your multi-term match query into the logically equivalent boolean query; see here for more details:
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text.
So you should keep the multi-term match query and let Elasticsearch do the job. It's more maintainable, and you can control the rewriting with the rewrite parameter (see here; on the match query itself, the rewrite method for the fuzzy term queries is set with fuzzy_rewrite).
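Since the question is about generating these queries from code, here is a minimal Python sketch that builds both shapes as plain dicts (the helper names are hypothetical, and nothing here talks to a cluster). It shows the practical difference: in the single-match form the field's own analyzer splits the text on the server, while the per-term form depends on client-side splitting, approximated here with str.split(), which can disagree with the analyzer.

```python
import json

def combined_match_query(field, text, fuzziness=1, size=10):
    """One match clause: Elasticsearch analyzes `text` and builds the
    per-term boolean query itself; operator "and" makes every term required."""
    match = {field: {"query": text, "fuzziness": fuzziness, "operator": "and"}}
    return {"from": 0, "size": size, "query": {"bool": {"must": [{"match": match}]}}}

def per_term_match_query(field, text, fuzziness=1, size=10):
    """Client-side expansion into one match clause per term.
    str.split() is only a stand-in for the field's analyzer, so it can
    disagree with Elasticsearch on punctuation, casing, stemming, etc."""
    clauses = [
        {"match": {field: {"query": term, "fuzziness": fuzziness}}}
        for term in text.split()
    ]
    return {"from": 0, "size": size, "query": {"bool": {"must": clauses}}}

if __name__ == "__main__":
    print(json.dumps(combined_match_query("Text", "Scooby Shaggy corridor"), indent=2))
    print(json.dumps(per_term_match_query("Text", "Scooby Shaggy corridor"), indent=2))
```

Keeping the generator on the first form means one clause per field regardless of input length, and tokenization stays consistent with how the field was indexed.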

Related

Fuzzy match words in any order in Elasticsearch

What I need to achieve is to match documents based on single field (product name, which consists of basically all possible filter values). I know it is not the most reliable solution, but I only have this one field to work with.
I need to be able to send a search query and have its words matched, in any order, against the name field (the name should contain all words from the search query). At this point a simple match_phrase_prefix works pretty well, but it lacks fuzziness, and we also need to allow users to make some typos and still get relevant results.
My question is, is there any way to have match_phrase_prefix-like query, but with fuzziness?
I tried some nested bool queries with match, but I don't get anything near match_phrase_prefix this way.
Examples of what I tried:
Pretty good results, but no fuzziness:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"name.standard": {
"query": "brand thing model",
"slop": 10
}
}
}
]
}
}
}
Fuzziness, but very limited matches:
{
"query": {
"bool": {
"must": [
{
"match": {
"name.standard": {
"query": "thing",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
},
{
"match": {
"name.standard": {
"query": "brand",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
]
}
}
}
Using should in the query above, I get more results, but they are far less relevant than the ones from the first query.
The above can be achieved with a simple match query. With "operator": "and", all three tokens must be present, in any order; set "fuzziness" to whatever value suits you:
{
"query": {
"match": {
"name.standard": {
"query": "brand thing model",
"operator": "and",
"fuzziness": "AUTO"
}
}
}
}
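If the query is generated from code, a minimal Python sketch of the answer's query follows (the helper name is hypothetical). As an optional illustration that is not part of the answer, it can also add the asker's match_phrase_prefix as a should clause: because the bool already has a must clause, the should clause is not required to match and only raises the score of in-order prefix matches.

```python
def fuzzy_any_order_query(field, text, fuzziness="AUTO", phrase_boost=None):
    """All terms required (any order), each matched fuzzily.
    If phrase_boost is given, a match_phrase_prefix clause is added under
    `should` so in-order prefix matches score higher (illustrative only)."""
    bool_query = {
        "must": [
            {"match": {field: {"query": text, "operator": "and", "fuzziness": fuzziness}}}
        ]
    }
    if phrase_boost is not None:
        bool_query["should"] = [
            {"match_phrase_prefix": {field: {"query": text, "slop": 10, "boost": phrase_boost}}}
        ]
    return {"query": {"bool": bool_query}}

# Usage: fuzzy_any_order_query("name.standard", "brand thing model", phrase_boost=2.0)
```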

Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present in any of the fields (e.g. 'human' in title and 'anatomy' in description, etc.). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for multi_match queries of type cross_fields, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool" : {
"must": [
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2
}
}
}
]
}
}
},
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2
}
}
}
]
}
}
}
]
}
}
}
The idea behind this code is the following: find the results where
any of the fields contains human (within an edit distance of 2, e.g.: humane, humon, humanic, etc.)
and
any of the fields contains anatomy (within an edit distance of 2, e.g.: anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between each of the words in the two queries <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
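For reference, the same per-term logic can be generated more compactly with one multi_match per term: multi_match's default best_fields type accepts fuzziness (it is cross_fields that does not), and a single-term best_fields query matches a document when any listed field matches. This is a minimal Python sketch with a hypothetical helper name; it expresses the same matching rules as the query above, so by itself it is not expected to change which documents come back. str.split() stands in for the standard analyzer.

```python
def fuzzy_all_terms_any_field(text, fields, fuzziness=2):
    """Every term must match (fuzzily) in at least one of `fields`.
    One multi_match per term; best_fields (the default type) supports
    fuzziness. str.split() approximates the real analyzer."""
    must = [
        {"multi_match": {"query": term, "fields": list(fields), "fuzziness": fuzziness}}
        for term in text.split()
    ]
    return {"query": {"bool": {"must": must}}}

# Usage: fuzzy_all_terms_any_field("human anatomy", ["title", "subject", "description"])
```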

How can I improve elasticsearch site search experience?

I have at least a thousand documents, with more coming, containing title/page content. Think Google-like site search. I'm experiencing a lot of noise when enabling fuzziness. At the same time, fuzziness helps with user errors, etc.
I already have the indexing down, consuming all changes to pages real time.
I'm doing a bunch of conditional steps for matching and boosting them based on significance.
Initially we did a PoC that converted the page contents into fixed-length vectors and used BERT for querying, which did not bring much improvement, so right now it uses pure ES queries.
I know there is a lot that goes into searchability in Elasticsearch. Do you happen to have any resources on improving the site search experience with Elasticsearch? I'm missing major basics/foundations of searchability that I would like to improve on.
I have gotten a recommendation from another post for https://www.manning.com/books/relevant-search which I plan to learn from when it is delivered.
One thing I'm considering, proposed by my team lead, is to build dynamic queries based on what the user searches for, e.g. if the user searches for a name (a service checks whether it is a name), use a query without fuzziness. This is the current query (message is the user's search input, substituted in by our code):
{
"query": {
"bool": {
"must_not":{
"match": {
"channel": "techhub"
}
},
"should": [
{
"match_phrase": {
"title": {
"query": message,
"slop": 1,
"boost": 10.0
}
}
},
{
"match": {
"title": {
"query": message,
"fuzziness": 1,
"minimum_should_match": "1<30%",
"boost": 5.0
}
}
},
{
"match": {
"title.edge_ngrams": {
"query": message,
"fuzziness": 1,
"minimum_should_match": "1<30%",
"boost": 3.0
}
}
},
{
"match_phrase": {
"plain_blob": {
"query": message,
"slop": 1,
"boost": 10.0
}
}
},
{
"match": {
"plain_blob": {
"query": message,
"fuzziness": 1,
"minimum_should_match": "1<30%",
"boost": 1.5
}
}
},
{
"match": {
"plain_blob.edge_ngrams": {
"query": message,
"fuzziness": 1,
"minimum_should_match": "1<30%",
"boost": 1.0
}
}
}
],
"minimum_should_match": 1
}
},
"size": 10,
"from": 0
}
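The team lead's idea maps naturally onto a query builder. A minimal Python sketch, assuming message is the raw user input: it reproduces the query above and, when fuzzy=False (e.g. because a name-detection service, hypothetical here, flagged the input as a name), simply omits "fuzziness" from the match clauses. The helper names are hypothetical.

```python
def site_search_query(message, fuzzy=True, size=10, offset=0):
    """Builds the query above with `message` filled in.
    fuzzy=False drops "fuzziness" from every match clause."""
    def fuzzy_match(field, boost):
        clause = {"query": message, "minimum_should_match": "1<30%", "boost": boost}
        if fuzzy:
            clause["fuzziness"] = 1
        return {"match": {field: clause}}

    def phrase(field, boost):
        return {"match_phrase": {field: {"query": message, "slop": 1, "boost": boost}}}

    return {
        "query": {
            "bool": {
                "must_not": {"match": {"channel": "techhub"}},
                "should": [
                    phrase("title", 10.0),
                    fuzzy_match("title", 5.0),
                    fuzzy_match("title.edge_ngrams", 3.0),
                    phrase("plain_blob", 10.0),
                    fuzzy_match("plain_blob", 1.5),
                    fuzzy_match("plain_blob.edge_ngrams", 1.0),
                ],
                "minimum_should_match": 1,
            }
        },
        "size": size,
        "from": offset,
    }
```

Keeping the clause list in one function means the fuzzy and non-fuzzy variants cannot drift apart in boosts or fields.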

elasticsearch fuzzy search space sensitive

I have an Elasticsearch query like this:
{
"query": {
"bool": {
"must": [{
"match": {
"text": {
"query": "yayla kent sitesi",
"fuzziness": "2"
}
}
},
{
"match": {
"type": {
"query": "2"
}
}
}
]
}
}
}
and there are records like
"text":"yaylakent sitesi"
but the fuzzy search doesn't find them; instead it returns many unrelated documents. Can someone help me with a query that tolerates one or a few spaces in the field?
"query": "yayla kent sitesi"
should not combine to,
"query": "yaylakent sitesi"

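Two observations, offered as a hedged sketch rather than a definitive fix. First, the match query's default operator is or, so with "fuzziness": "2" a document matching any single fuzzy term qualifies, which is a likely source of the unrelated hits. Second, one client-side workaround (a hypothetical approach, not an Elasticsearch feature) is to also search for variants where one pair of adjacent words is glued together, each variant requiring all of its terms:

```python
def joined_variants(text):
    """The original text plus every variant with one pair of adjacent
    words glued together, e.g. "yayla kent sitesi" -> "yaylakent sitesi"."""
    words = text.split()
    variants = [" ".join(words)]
    for i in range(len(words) - 1):
        glued = words[:i] + [words[i] + words[i + 1]] + words[i + 2:]
        variants.append(" ".join(glued))
    return variants

def space_tolerant_query(field, text, type_value):
    """At least one variant must fully match (operator "and"), plus the type filter."""
    should = [
        {"match": {field: {"query": v, "operator": "and"}}}
        for v in joined_variants(text)
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {"bool": {"should": should, "minimum_should_match": 1}},
                    {"match": {"type": {"query": type_value}}},
                ]
            }
        }
    }
```

For "yayla kent sitesi" the variants are "yayla kent sitesi", "yaylakent sitesi" and "yayla kentsitesi". The number of clauses grows linearly with the word count, so this is only practical for short inputs.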
confusion about elasticsearch documentation about containing json for bool query

In the elasticsearch doc for a bool query at this link:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-bool-query.html
It doesn't show the containing structure. If I just use bool the way they have it, it's totally wrong; I need to surround it with some silly combination of query/filter/filtered query. I'm not sure what the correct way to form a JSON query in Elasticsearch is. The docs seem to be completely contradictory in many places about what goes where and how. Are there any Elasticsearch experts out there who know how to properly form a query?
First of all, there is a "bool" query, and a "bool" filter, and they go in different places and do slightly different things. As a general rule, if you can use a filter do it (many of them can be cached, and are a little faster even if not). If you need a "match" then you need a query.
The example on the page you referenced could actually be used either way:
As a query:
POST /test_index/_search
{
"query": {
"bool": {
"must": {
"term": {
"user": "kimchy"
}
},
"must_not": {
"range": {
"age": {
"from": 10,
"to": 20
}
}
},
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
}
],
"minimum_should_match": 1,
"boost": 1
}
}
}
Or as a filter (in a filtered query):
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": {
"term": {
"user": "kimchy"
}
},
"must_not": {
"range": {
"age": {
"from": 10,
"to": 20
}
}
},
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
}
],
"minimum_should_match": 1
}
}
}
}
}
Also I totally get the frustration with the ES documents. I've been working with them for a couple of years now, and they don't seem to be getting any better. Maybe the people in charge of documentation just don't care all that much. The conspiracy theory view would be that bad documentation helps the company sell professional services.
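A note for readers on newer versions: the filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. The same non-scoring behaviour is now expressed with the filter clause of a bool query. A minimal Python sketch that builds the second example in that form (the helper name is hypothetical; the range is written with gte/lte, which matches the inclusive default of from/to):

```python
def bool_in_filter_context(must=(), must_not=(), should=(), minimum_should_match=None):
    """Replacement for the removed filtered query: wrap the non-scoring
    bool inside the `filter` clause of an outer bool query."""
    inner = {}
    for key, clauses in (("must", must), ("must_not", must_not), ("should", should)):
        if clauses:
            inner[key] = list(clauses)
    if minimum_should_match is not None:
        inner["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": {"filter": [{"bool": inner}]}}}

q = bool_in_filter_context(
    must=[{"term": {"user": "kimchy"}}],
    must_not=[{"range": {"age": {"gte": 10, "lte": 20}}}],
    should=[{"term": {"tag": "wow"}}, {"term": {"tag": "elasticsearch"}}],
    minimum_should_match=1,
)
```

Clauses in filter context do not contribute to the score, but the inner minimum_should_match still requires at least one of the tag terms to match.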
