Performance of match vs term query in elasticsearch? - elasticsearch

I've been using a lot of match queries in my project. Now I have just come across the term query in Elasticsearch. It seems the term query is faster when the exact keyword you are querying for is known.
Now I have two questions:
Should I refactor my code (there's a lot of it) and use term instead of match?
How much better is the performance of term compared to match?
Using term in my query:
main_query["query"]["bool"]["must"].append({"term":{object[..]:object[...]}})
Using match in my query:
main_query["query"]["bool"]["must"].append({"match":{object[..]:object[...]}})

Elastic discourages using term queries on text fields for obvious reasons (analysis!), but if you know you are querying a keyword field (not analyzed!), definitely go for term/terms queries instead of match. The match query does a lot more work besides analyzing the input, and it will end up executing a term query anyway once it notices that the queried field is a keyword field.

As far as I know, the match query is meant for fields mapped as "text" that use an analyzer. The indexed text is broken into tokens, and when you run the query your input goes through an analyzer as well, so each resulting token is matched against the indexed ones.
Term does an exact match: it does not go through any analyzer and looks up the exact term in the inverted index.
Because it skips the analyzers, I believe term is faster.
I use term queries to search for keywords like categories and tags, things for which an analyzer doesn't make sense.
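To make the distinction in these answers concrete, here is a minimal sketch that picks term for keyword fields and match for everything else. The helper build_clause, the field names, and the field_type argument are hypothetical; in practice the type would come from your own index mapping.

```python
def build_clause(field, value, field_type):
    """Return a term clause for keyword fields and a match clause otherwise.

    field_type is assumed to be taken from your index mapping
    ("keyword", "text", ...); this helper is illustrative only.
    """
    if field_type == "keyword":
        # Exact lookup in the inverted index; the input is not analyzed.
        return {"term": {field: value}}
    # Analyzed full-text lookup, intended for text fields.
    return {"match": {field: value}}


main_query = {"query": {"bool": {"must": []}}}
main_query["query"]["bool"]["must"].append(
    build_clause("category", "books", "keyword")
)
main_query["query"]["bool"]["must"].append(
    build_clause("title", "quick brown fox", "text")
)
```

With a helper like this the refactoring stays in one place, and only clauses that target keyword fields change from match to term.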

Related

Elasticsearch: How to use "minimum_should_match" in match_phrase query?

I'm new to Elasticsearch. I'm trying to search keywords in Elasticsearch with the match_phrase.
And I don't want to match all terms, so I add the minimum_should_match in search queries, but it seems like it is impossible (ES doesn't support it).
What I need:
The slop of each term is 0 or bigger.
Matching terms must appear in their specified order.
It doesn't have to match every one of them, and I can specify a parameter such as minimum_should_match.
Does anybody have experience with this?
Thanks in advance.

How to get matching doc for term A and term B separately in AWS elastic search with single Query

Currently I'm able to query AWS Elasticsearch to get the matching docs for a single term. For that I'm using the query below.
Now I have a requirement to query for multiple terms and get their matching docs.
Is there any way to query multiple terms in a single query so that we get the matching docs separately for each term? That would save us a lot of time.
Looks like you need terms query or multi_match if you are querying text fields.
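As a rough sketch of the terms option, assuming a keyword field (the field name "tag" and the values are made up for illustration):

```python
def terms_clause(field, values):
    # A terms query matches documents whose field holds ANY of the
    # given exact values; the values are not analyzed.
    return {"terms": {field: list(values)}}


query = {"query": terms_clause("tag", ["A", "B"])}
```

Note that a single terms query returns one combined list of hits, so you would still have to inspect each hit's field value to tell which term it matched.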

Elasticsearch: exclude queries with only one term

I was wondering if it is possible in Elasticsearch to exclude queries where the query is a single term? I am trying to use "minimum_should_match" as 2, which works well when the query has 2 or more terms. However, if the number of terms in the query is 1, ES will still return results. It seems that ES is using the logic of "well you asked for a minimum of matching two terms, yet there is only one term to match; we'll lower the minimum to 1". Is there a way to turn this functionality off, or otherwise do what I am looking for?
For those wondering why this can't be done at the API level, I am using a query analyzer that excludes stop words. So a query like "a ipad" would end up being 1 term, while the API would see 2. The API could do stopword filtering but that seems to be a waste of resources.
Before doing a query you can first analyze the input by your custom analyzer.
You can use the Analyze API for this (be sure to set the analyzer property to be equal to your custom analyzer name).
The result would be a list of analyzed tokens. If your analyzer removes stopwords, it would return only ipad for a ipad.
So if the Analyze API returns only one token, you don't actually need to query Elasticsearch at all, because you don't want any results when the number of tokens is less than 2 (if I understood you correctly).
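The check described in this answer could look roughly like the sketch below. It works on the JSON body the Analyze API returns, which has a top-level "tokens" list; the function name and the sample response are hypothetical, and fetching the response with your client is left out.

```python
def has_enough_tokens(analyze_response, minimum=2):
    """Decide whether a search is worth running.

    analyze_response is assumed to be the parsed JSON returned by the
    Analyze API for your custom analyzer, i.e. {"tokens": [...]}.
    """
    tokens = analyze_response.get("tokens", [])
    return len(tokens) >= minimum


# Hand-written stand-in for what an analyzer with stopword removal
# might return for the input "a ipad".
sample_response = {"tokens": [{"token": "ipad", "position": 1}]}

run_search = has_enough_tokens(sample_response)
```

Here run_search is False, so the search request would be skipped for this input.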

Elasticsearch questions: search, performance and caching

I'm new to Elasticsearch. I have been reading the API docs and some things are not clear to me.
1) It is said that filters are cached. What does that mean? If I send a query with a filter on it, what gets cached? The results of that query? If I send a different query with the same filter, will the cache help me somehow?
I know the question is kind of vague, but so is Elasticsearch's documentation on this.
2) Is there a real performance difference between a query matching a term X against the "_all" field and one matching it against a specific field? As far as I understand, both queries will be compared against all documents that contain X in one of their fields, and the only difference is how many fields in those documents will be matched against X. Is that correct?
1) For your first question take a look at this link.
To quote from the post
"Filters don’t score documents – they simply include or exclude. If a document matches a filter, it is represented with a one in the BitSet; otherwise a zero. This means that Elasticsearch can store an entire segment’s filter state (“who matches this particular filter?”) in a single, compact BitSet.
The first time Elasticsearch executes a filter, it parses Lucene segment data structures to determine what matches your filter. Instead of throwing away this information, it caches it inside a BitSet.
The next time the same filter is executed, Elasticsearch can reference the compact BitSet instead of the Lucene segments. This has huge performance benefits."
2) "The idea of the _all field is that it includes the text of one or more other fields within the document indexed. It can come very handy especially for search requests, where we want to execute a search query against the content of a document, without knowing which fields to search on. This comes at the expense of CPU cycles and index size." (link)
So if you know which fields you are going to query, search on those specific fields.
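For the first point, a minimal sketch of two different queries that share the same filter. It assumes a version where filters are written as the filter clause of a bool query (older releases used a separate filtered query); the helper and field names are made up for illustration.

```python
def filtered_query(text_field, text, filter_field, filter_value):
    """Build a bool query with a scored match part and an unscored filter part."""
    return {
        "query": {
            "bool": {
                # Scored full-text part; differs from query to query.
                "must": [{"match": {text_field: text}}],
                # Unscored include/exclude part; this is what the quoted
                # post describes as being cached as a BitSet.
                "filter": [{"term": {filter_field: filter_value}}],
            }
        }
    }


q1 = filtered_query("title", "ipad", "status", "published")
q2 = filtered_query("title", "macbook", "status", "published")
```

The two queries differ in their must part but carry an identical filter, which is the situation where the cached filter result can be reused.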

What's the performance difference between Elasticsearch _all field search and multi-match

I know Elasticsearch enables _all field as default. But for some reason (scoring) I don't want to use it. Instead I use multi_match, where I have almost 10 fields. Is the performance of multi_match worse than _all field? And how much?
IMHO, it depends more on the query than on the field itself. I think you are talking about query_string, which by default uses the _all field.
That said, using a match query (or multi_match) instead of a query_string query will probably be faster. And using multi_match on a list of fields rather than on the _all field will probably be faster too, as match queries are optimized and use the best internal query depending on the field and the query content.
I can't tell you how much faster it is, but you can easily run a test for your use case and measure it.
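A rough sketch of what such a test could compare: a multi_match over an explicit field list, plus a helper that reads the "took" value (milliseconds) that search responses report. The field names and the sample response are hypothetical, and sending the request is left to whatever client you use.

```python
def multi_match_query(text, fields):
    # Search the same text across an explicit list of fields.
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def took_ms(response):
    # "took" is the server-side search time in milliseconds.
    return response["took"]


query = multi_match_query("quick fox", ["title", "body", "tags"])

# Hand-written stand-in for a search response, for illustration only.
sample_response = {"took": 12, "timed_out": False, "hits": {"hits": []}}
elapsed = took_ms(sample_response)
```

Running each variant several times and comparing the took values should give a more reliable picture than a single request, since caches warm up after the first run.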
HTH
