query_string vs group match in Elasticsearch

What is the difference between this query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test"
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Is there a difference in performance or relevance?
Thanks in advance!

The query is analyzed with each field's analyzer (unless you specify an analyzer in the query itself), so querying multiple fields with a single query doesn't necessarily mean the query text is analyzed only once.
Keep in mind that query_string supports the Lucene query syntax: AND and OR operators, querying on specific fields, wildcards, phrase queries, etc. It therefore needs to be parsed, which I don't think makes much difference here in terms of performance, but it is error-prone, since malformed input can make the whole query fail. If you don't need all that power, stick to the match query, and if you want to run the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores might be quite different if you compare the output of the multiple match queries with that of your query_string. With a bool query you effectively build a Lucene boolean query, while query_string defaults to "use_dis_max": true, which means it uses a dis_max query internally. The same happens with the multi_match query. If you set use_dis_max to false, a bool query is used internally instead.
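To make the comparison concrete, here is a minimal sketch in Python that builds the two request bodies as plain dictionaries: the group of match clauses from the question and a multi_match over the same fields. The helper names (match_group, multi_match) are hypothetical and no Elasticsearch client is involved; the dictionaries would simply be sent as the search body.

```python
import json

FIELDS = ["description", "address", "country", "city"]


def match_group(text, fields):
    """One match clause per field, combined in a bool/should (a Lucene boolean query)."""
    return {"bool": {"should": [{"match": {f: {"query": text}}} for f in fields]}}


def multi_match(text, fields):
    """A single multi_match over the same fields (hypothetical helper name)."""
    return {"multi_match": {"query": text, "fields": list(fields)}}


if __name__ == "__main__":
    # Print both bodies so they can be pasted into a search request and compared.
    print(json.dumps({"query": match_group("test", FIELDS)}, indent=2))
    print(json.dumps({"query": multi_match("test", FIELDS)}, indent=2))
```

Running both bodies against the same index and comparing the _score values is probably the quickest way to see the bool versus dis_max difference described above.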

In terms of performance, I would say the second query has the advantage, because the first query requires the query string to be analyzed for each of the four match sections, while in the second only one query string needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevance differences, but you can always run both queries and compare the results to see whether relevance differs.

Related

Difference between simple_query_string and multi_match query

Hi, I am using two search queries that give similar results. What is the difference between these two queries, simple_query_string and multi_match?
1- simple_query_string
{
"size": 50,
"query": {
"bool": {
"should": [
{
"simple_query_string": {
"query": "text search",
"fields": [
"Field1^2",
"Field2^4",
"Field3^6",
"Field4^8",
"Field5^10",
"Field6^12",
"Field7^14",
"Field8^16",
"*^.1"
]
}
}
]
}
},
"sort": [
"_score",
{
"Field6.keyword": {
"order": "desc"
}
}
]
}
2- multi_match query
GET index/_search
{
"size": 50,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "text search",
"fields": [
"Field1^2",
"Field2^4",
"Field3^6",
"Field4^8",
"Field5^10",
"Field6^12",
"Field7^14",
"Field8^16",
"*^.1"
],
"type": "most_fields"
}
}
]
}
}
}
Both queries give the same results in the same order. Is there any advantage to either query?
Both queries are the same, as they will be converted to the same query string. If you use the query string, your query will be slightly faster, as Elasticsearch doesn't need to rewrite your query.
All queries in Lucene undergo a "rewriting" process. A query (and its sub-queries) may be rewritten one or more times, and the process continues until the query stops changing. This process allows Lucene to perform optimizations, such as removing redundant clauses, replacing one query for a more efficient execution path, etc. For example a Boolean → Boolean → TermQuery can be rewritten to a TermQuery, because all the Booleans are unnecessary in this case. The rewriting process is complex and difficult to display, since queries can change drastically. Rather than showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). This value is cumulative and contains the total time for all queries being rewritten.
You can check your query's performance and rewrite time by setting "profile": true in your search request; for more information, check the official Elasticsearch documentation here.
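As a rough illustration, the sketch below switches profiling on for a search body and then sums the rewrite time from a profiled response. The helper names are hypothetical, and the response layout (profile, then shards, then searches, each with a rewrite_time in nanoseconds) is assumed from the Profile API; sending the request is left to whatever client you use.

```python
def with_profile(body):
    """Return a copy of a search body with profiling switched on."""
    profiled = dict(body)
    profiled["profile"] = True
    return profiled


def total_rewrite_time(response):
    """Sum rewrite_time (nanoseconds) over every shard and search in a profiled response."""
    total = 0
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            total += search.get("rewrite_time", 0)
    return total


if __name__ == "__main__":
    body = {"size": 50, "query": {"multi_match": {"query": "text search", "fields": ["Field1^2"]}}}
    print(with_profile(body))
    # Made-up response fragment, only to show where rewrite_time is expected to live.
    sample = {"profile": {"shards": [{"searches": [{"rewrite_time": 1200}]}]}}
    print(total_rewrite_time(sample))
```

Profiling both variants this way lets you compare their rewrite times directly instead of guessing.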

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
On search, the value aaa.bbb will be split into two terms, aaa and bbb.
My question is how to write a bool query that contains multiple conditions on the same term.
For example, I want to get all docs where the field contains a term that matches a fuzzy search for gmail, but that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: since it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: since the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: since it matches the fuzzy search but equals gamil
NOTE: the following query does NOT appear to work: it looks as if a document is considered a hit when one term matches one condition and a different term matches the other.
{
...
"body": {
"query": {
"bool": {
"must": [
{
"fuzzy": {
"my_field": {
"value": "gmail",
"fuzziness": 1,
"max_expansions": 2100000000
}
}
},
{
"bool": {
"must_not": [
{
"query_string": {
"default_field": "my_field",
"query": "*gamil*",
"analyzer": "keyword"
}
}
]
}
}
]
}
}
},
}
I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results by the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query clause limit.
(see more details in the provided link)
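The highlight-based filtering could look roughly like the sketch below. It assumes the default highlighter tags (<em> and </em>) and the usual hit layout with a highlight object keyed by field name; the helper names and the sample hits are hypothetical. A hit is kept only if at least one highlighted (that is, fuzzy-matched) term does not contain an excluded string.

```python
import re

# Default highlighter tags are assumed; adjust if pre_tags/post_tags are customised.
EM_TERM = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Collect every highlighted term from the hit's fragments for one field."""
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(EM_TERM.findall(fragment))
    return terms


def keep_hit(hit, field, excluded):
    """Keep the hit if some matched term contains none of the excluded strings."""
    for term in highlighted_terms(hit, field):
        lowered = term.lower()
        if not any(bad in lowered for bad in excluded):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical hits mirroring the three examples from the question.
    hits = [
        {"_id": "1", "highlight": {"my_field": ["<em>bmail</em>"]}},
        {"_id": "2", "highlight": {"my_field": ["<em>gamil</em>.<em>bmail</em>"]}},
        {"_id": "3", "highlight": {"my_field": ["<em>gamil</em>"]}},
    ]
    print([h["_id"] for h in hits if keep_hit(h, "my_field", ["gamil"])])
```

Because the check runs per highlighted term, both conditions apply to the same term, which is what the bool query could not express.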

How to boost individual documents

I have a pretty complex query, and now I want to boost some documents that fulfill certain criteria. I have the following simplified document structure, and I am trying to boost some documents based on the id, genre, and tag.
{
"id": 123,
"genres": ["ACTION", "DRAMA"],
"tags": ["For kids", "Romantic", "Nature"]
}
What I want to do is for example
id: 123 boost: 5
genres: ACTION boost: 3
tags: Romantic boost: 0.2
and boost all documents that are matched by my query and fit the criteria, but I don't want to filter anything out. So query clause boosting is not of any help, I guess.
Edit: To make it easier to understand what I want to achieve (I am not sure it is possible with Elasticsearch; "no" is also a valid answer):
I want to search with a query and get a result set. In this set I want to boost some documents, but I don't want to enlarge the result set or filter it. The boost should be independent of the query.
For example, I search for a specific tag and want to boost all documents with category 'ACTION' in the result set. But I don't want all documents with category 'ACTION' in the result set, and I also don't want only documents with the specific tag AND category 'ACTION'.
I think you need dynamic boosting at query time.
In the query below, the first clause matches the id against title with a boost of 5, and the second matches the genre 'Action' against content.
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "id",
"boost": 5
}
}
},
{
"match": {
"content": "Action"
}
}
]
}
}
}
If you want a multi_match based on your query:
{
"multi_match" : {
"query": "some query terms here",
"fields": [ "id^5", "genres^3", "tags^0.2" ]
}
}
Note: the ^5 suffix means a boost of 5 for the id field.
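If the boosts are kept in a mapping rather than hard-coded, the field^boost strings can be generated. This is only a sketch: the helper names are hypothetical, and the field names come from the document structure in the question.

```python
def boosted_fields(boosts):
    """Turn {"field": boost} into the "field^boost" strings multi_match expects."""
    return [f"{name}^{boost:g}" for name, boost in boosts.items()]


def boosted_multi_match(text, boosts):
    """Build a multi_match body with per-field boosts (hypothetical helper)."""
    return {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}


if __name__ == "__main__":
    print(boosted_multi_match("some query terms here", {"id": 5, "genres": 3, "tags": 0.2}))
```

Note that these boosts weight how strongly a match in each field counts; they do not boost documents independently of the query text.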
Edit:
Maybe you are asking for different types of multi_match queries (at least for ES 5.x) from the ES reference guide:
best_fields
(default) Finds documents which match any field, but uses the _score from the best field. See best_fields.
most_fields
Finds documents which match any field and combines the _score from each field. See most_fields.
cross_fields
Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.
phrase
Runs a match_phrase query on each field and combines the _score from each field. See phrase and phrase_prefix.
phrase_prefix
Runs a match_phrase_prefix query on each field and combines the _score from each field. See phrase and phrase_prefix.
More at: ES 5.4 ElasticSearch reference
I found a solution, and it was pretty simple: I use a boosting query. I just nest the different boosting criteria, and my original query becomes the innermost (base) query.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-boosting-query.html
For example:
{
"query": {
"boosting": {
"positive": {
"boosting": {
"positive": {
"match": {
"director": "Spielberg"
}
},
"negative": {
"term": {
"genres": "DRAMA"
}
},
"negative_boost": 1.3
}
},
"negative": {
"term": {
"tags": "Romantic"
}
},
"negative_boost": 1.2
}
}
}
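The nesting above can be generated from a list of criteria. The sketch below is one way to do it, with a hypothetical helper name; it reproduces the example query when given the same inputs. The negative_boost values above 1 are taken from the example as written. As far as I know, current reference documentation describes negative_boost as a number between 0 and 1.0, so check the behaviour against your version.

```python
def nest_boosting(base_query, criteria):
    """Wrap base_query in one boosting query per (negative_query, negative_boost) pair.

    The first pair ends up innermost, directly around the base query.
    """
    query = base_query
    for negative, negative_boost in criteria:
        query = {
            "boosting": {
                "positive": query,
                "negative": negative,
                "negative_boost": negative_boost,
            }
        }
    return query


if __name__ == "__main__":
    base = {"match": {"director": "Spielberg"}}
    criteria = [
        ({"term": {"genres": "DRAMA"}}, 1.3),
        ({"term": {"tags": "Romantic"}}, 1.2),
    ]
    print({"query": nest_boosting(base, criteria)})
```

Since each layer only rescales documents that match its negative query, the result set stays the one produced by the base query.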

Is constant_score faster in ElasticSearch queries if I don't care about scoring?

I make several queries to Elasticsearch to retrieve documents by keyword (I match them by code or internal IDs). I don't really care about scoring in those queries, just about retrieving the documents.
Would wrapping the bool queries I use in a constant_score filter increase performance, or make any sense at all?
It makes no sense. If you are using a bool query, you can apply filters to it directly.
GET /_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
filter - The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.
What's more, constant_score is meant for scoring: if a document matches, the "boost" value is applied as its score.
To sum up: use filter when you only need filtering, and constant_score when you need a score.
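For a lookup by code or internal id where scoring does not matter, the two shapes discussed here can be sketched as follows. The helper names and the field name code are hypothetical; only the bool filter and constant_score structures are taken from the query DSL.

```python
def filter_only(filters):
    """A bool query with only filter clauses: nothing is scored, clauses can be cached."""
    return {"bool": {"filter": list(filters)}}


def constant_score(filter_query, boost=1.0):
    """Wrap a filter so that every matching document gets the same score (the boost)."""
    return {"constant_score": {"filter": filter_query, "boost": boost}}


if __name__ == "__main__":
    # "code" is a made-up field name standing in for the keyword lookups in the question.
    by_code = {"terms": {"code": ["A1", "B2"]}}
    print({"query": filter_only([by_code])})
    print({"query": constant_score(by_code, boost=1.2)})
```

The first form is enough when you only want the documents; the second is useful when the fixed score has to take part in a larger scored query.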

Elasticsearch case-insensitive query_string query with wildcards

In my ES mapping I have a 'uri' field which is currently set to not_analyzed, and I'm not allowed to change the mapping. I wanted to search for parts of a URI with a query_string query like this (the ES query is autogenerated, which is why it is a bit complicated, but let's just focus on the query_string part):
{
"sort": [{"updated": {"order": "desc"}}],
"query": {
"bool": {
"must":[{
"query_string": {
"query":"*w3\\.org\\/2014\\/01\\/a*",
"lowercase_expanded_terms": true,
"default_field": "uri"
}
}],
"minimum_number_should_match": 1
}
}, "size": 50}
This usually works, but I have the following URL stored (a fictional URL): http://w3.org/2014/01/Abc.html, and this query does not return it because of the A/a difference. Setting lowercase_expanded_terms to false does not solve this either. What should I do to make this query case-insensitive?
Thanks for the help in advance.
From the docs, it seems like you need a new analyzer that lowercases the text before the search runs. Have you tried that?
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html
As I read it, the setting you use, lowercase_expanded_terms, only applies to expansions, not to regular words:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
lowercase_expanded_terms
Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true.
Try to use a match query instead of query_string.
{
"sort": [
{
"updated": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"uri": "*w3\\.org\\/2014\\/01\\/a*"
}
}
]
}
},
"size": 50
}
Wildcard terms in query_string queries are not analyzed, but match queries are analyzed.
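Since the mapping cannot be changed and the stored value keeps its original case, one workaround worth trying (not taken from the answers above) is a regexp query whose pattern accepts both cases of every letter. The sketch below builds such a pattern; the helper names are hypothetical. It assumes Lucene regular expressions are matched against the whole term, which is why the pattern is wrapped in .* on both sides. Expect a pattern with a leading .* to be slow on large indices.

```python
# Characters treated as reserved in Lucene regular expressions (including optional operators).
LUCENE_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_regexp(fragment):
    """Build a pattern matching any term that contains fragment, ignoring letter case."""
    parts = []
    for ch in fragment:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append("[" + lower + upper + "]")  # e.g. "a" becomes "[aA]"
        elif ch in LUCENE_REGEX_SPECIAL:
            parts.append("\\" + ch)  # escape reserved characters
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"


def regexp_query(field, fragment):
    """Wrap the pattern in a regexp query on the given field."""
    return {"regexp": {field: case_insensitive_regexp(fragment)}}


if __name__ == "__main__":
    print(regexp_query("uri", "w3.org/2014/01/a"))
```

The resulting clause could replace the query_string clause inside the must array of the original request.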
