multi_match vs should match vs must query_string in Elasticsearch

I tried these three types of queries in Elasticsearch and am wondering which one is the most suitable (most accurate and most efficient). One person can have multiple sets of names (an array). Each name is split into firstname, surname, and middlename; some people have only a firstname and surname. The input parameter is the full name (firstname, surname, and middlename combined in one string), and I added fuzzy matching. One difference I noticed is the score.
This is the score of the first result returned.
first query: 17.41911
second query: 24.332222
third query: 21.200104
Does this mean that the second query is the most accurate one for this requirement?
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "David Bill Gonzalo~",
"fields": [
"nameDetails.name.nameValue.firstName",
"nameDetails.name.nameValue.surname",
"nameDetails.name.nameValue.middleName"
]
}
}
]
}
}
}
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"nameDetails.name.nameValue.firstName": "David Bill Gonzalo~"
}
},
{
"match": {
"nameDetails.name.nameValue.surname": "David Bill Gonzalo~"
}
},
{
"match": {
"nameDetails.name.nameValue.middleName": "David Bill Gonzalo~"
}
}
]
}
}
}
GET /person/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"fields": [
"nameDetails.name.nameValue.firstName",
"nameDetails.name.nameValue.surname",
"nameDetails.name.nameValue.middleName"
],
"query": "David Bill Gonzalo~"
}
}
]
}
}
}

First Query:
The multi-match query allows us to run a query on multiple fields. It is an extension of the match query.
You have not specified a type parameter in the first query, so the default type, best_fields, is used. It finds all documents that match the query in any of the fields, but the _score is calculated from the best-matching field only.
To know more about the types of multi-match queries, refer to this part of the documentation.
Second Query:
This is a bool query with three match clauses inside a should clause. The scores of all matching should clauses are added together to produce the final score.
Third Query:
In the third query, query_string is running against multiple fields.
Here too, no type parameter is specified, so best_fields is used by default. It finds all documents that match the query, but the _score is calculated from the best-matching field only.
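The difference between adding up should clauses and taking the best field can be illustrated with a small sketch. The per-field scores below are invented for illustration, and the two functions are hypothetical helpers, not Elasticsearch APIs. They only mirror how a bool/should query sums the scores of its matching clauses, while a best_fields (dis_max style) query takes the best clause score plus tie_breaker times the others, with tie_breaker defaulting to 0.0.

```python
def bool_should_score(field_scores):
    """bool/should: the scores of all matching clauses are added together."""
    return sum(s for s in field_scores if s > 0)


def dis_max_score(field_scores, tie_breaker=0.0):
    """best_fields / dis_max: best clause score plus tie_breaker times the rest."""
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical per-field scores for one document
# (firstName, surname, middleName); not taken from a real index.
scores = [6.0, 4.0, 2.0]

print(bool_should_score(scores))    # all three fields contribute
print(dis_max_score(scores))        # only the best field contributes
print(dis_max_score(scores, 0.5))   # the other fields contribute partially
```

This is why the second query tends to return the highest _score. Scores produced by different query types are not directly comparable, so a higher number does not by itself mean a more accurate query.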
Since you are running the same query text, "David Bill Gonzalo~", against multiple fields, I would recommend the multi_match query. It also supports further options, such as boosting one or more fields or setting the type parameter.
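One caveat when following this recommendation: match and multi_match do not parse query syntax, so the trailing ~ is treated as ordinary text there and is not a fuzzy operator; only query_string interprets it. In multi_match, fuzzy matching is requested with the fuzziness parameter. The sketch below builds such a request body in Python. The helper name build_name_query and its default values are assumptions made for illustration.

```python
import json

NAME_FIELDS = [
    "nameDetails.name.nameValue.firstName",
    "nameDetails.name.nameValue.surname",
    "nameDetails.name.nameValue.middleName",
]


def build_name_query(full_name, fuzziness="AUTO", match_type="best_fields", boosts=None):
    """Build a multi_match search body for a full-name lookup.

    boosts maps a field name to a boost factor, written as field^boost.
    Note: fuzziness is not supported with the cross_fields, phrase,
    or phrase_prefix types, so keep match_type compatible with it.
    """
    boosts = boosts or {}
    fields = [f"{f}^{boosts[f]}" if f in boosts else f for f in NAME_FIELDS]
    return {
        "query": {
            "multi_match": {
                "query": full_name,
                "fields": fields,
                "type": match_type,
                "fuzziness": fuzziness,
            }
        }
    }


body = build_name_query(
    "David Bill Gonzalo",
    boosts={"nameDetails.name.nameValue.surname": 2},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of GET /person/_search.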

Related

Elasticsearch exact search query

I'm using query string to search on documents in my index.
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "table test",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
The problem is that it also returns longer strings that contain the search keywords. I want only strings that match the exact phrase.
For example, the documents table test 1, table test 12, and table test are in my index. When I search for table test, I want only table test to be returned.
I also tried a term query, but it does not handle the space character between the words.
How can I handle this?
If your mapping was generated by Elasticsearch (dynamic mapping), then every text field has a corresponding .keyword sub-field, so you can run a term query against it (note the .keyword suffix in the field name):
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
If you don't have a .keyword field, you have to add a keyword field to the mapping and run the term query against it; term queries are meant for exact (keyword) searches.
You can use Match Phrase Query as Amit suggested in another answer.
Also, if you want to stay with the query_string query, you can wrap the query in double quotes, as shown below:
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "\"table test\"",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
Updated:
If you want an exact match on the entire field, use a term query on the keyword field:
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
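The difference between the quoted phrase query and the term query on the keyword field can be sketched with a small simulation. This is a simplified model, not Elasticsearch code: it assumes whitespace tokenization and lowercasing for the text field, and treats the keyword field as the untouched original string. The function names are hypothetical.

```python
def phrase_match(doc, phrase):
    """Simplified phrase query: the phrase tokens must appear consecutively."""
    doc_tokens = doc.lower().split()
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def keyword_term_match(doc, value):
    """Simplified term query on a keyword field: the whole value must be equal."""
    return doc == value


docs = ["table test 1", "table test 12", "table test"]

phrase_hits = [d for d in docs if phrase_match(d, "table test")]
term_hits = [d for d in docs if keyword_term_match(d, "table test")]

print(phrase_hits)  # the phrase also occurs inside the longer values
print(term_hits)    # only the value that equals the search string
```

Under these assumptions the phrase search still returns all three documents, while the term query on the keyword field returns only table test.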

Filter results from Elasticsearch if only a specific field matches

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.
There are different ways to handle this, depending on the scoring you require from your results. For instance, use a bool query with two parts:
must: queries that must match for a document to be in the result set.
should: queries that should match (and affect scoring) but do not decide whether a document is in the result set.
Put the multi_match query without the car_country field in must, and a match query on the car_country field in should:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}
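If the set of required and auxiliary fields varies, the same structure can be generated programmatically. The following is a minimal sketch; the function name build_query and its parameters are assumptions for illustration, and the output is simply the request body shown above.

```python
import json


def build_query(text, required_fields, auxiliary_fields):
    """Required fields decide membership (must); auxiliary fields only add score (should)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": list(required_fields),
                            "type": "most_fields",
                        }
                    }
                ],
                "should": [
                    {"match": {field: {"query": text}}}
                    for field in auxiliary_fields
                ],
            }
        }
    }


body = build_query(
    "italian sports car",
    required_fields=["car_name", "car_brand", "car_description"],
    auxiliary_fields=["car_country"],
)
print(json.dumps(body, indent=2))
```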

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
On search, the value aaa.bbb is split into two terms, aaa and bbb.
My question is how to write a bool query that applies multiple conditions to the same term.
For example, I want to get all docs whose field contains a term that matches a fuzzy search for gmail, where that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: it matches the fuzzy search but equals gamil
NOTE: the following query does NOT work, because a document is considered a hit when one term matches the first condition and a different term matches the second.
{
...
"body": {
"query": {
"bool": {
"must": [
{
"fuzzy": {
"my_field": {
"value": "gmail",
"fuzziness": 1,
"max_expansions": 2100000000
}
}
},
{
"bool": {
"must_not": [
{
"query_string": {
"default_field": "my_field",
"query": "*gamil*",
"analyzer": "keyword"
}
}
]
}
}
]
}
}
},
}
I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results based on the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query clause limit.
(see more details in the provided link)
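The highlight-based filtering could look roughly like the sketch below. It assumes the search was run with highlighting enabled on my_field and the default <em> and </em> highlight tags, so that each hit carries a highlight object mapping the field name to a list of fragments. The sample hits are invented, and the helper names are hypothetical.

```python
import re

# Default Elasticsearch highlight tags wrap each matched term in <em>...</em>.
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Return the terms the query matched in this hit, taken from the highlight fragments."""
    fragments = hit.get("highlight", {}).get(field, [])
    return [term.lower() for frag in fragments for term in EM_PATTERN.findall(frag)]


def keep_hit(hit, field, banned):
    """Keep the hit if at least one matched term does not contain the banned string."""
    return any(banned not in term for term in highlighted_terms(hit, field))


# Invented hits shaped like entries of response["hits"]["hits"].
hits = [
    {"_id": "1", "highlight": {"my_field": ["<em>bmail</em>"]}},
    {"_id": "2", "highlight": {"my_field": ["<em>gamil</em>.<em>bmail</em>"]}},
    {"_id": "3", "highlight": {"my_field": ["<em>gamil</em>"]}},
]

kept = [h["_id"] for h in hits if keep_hit(h, "my_field", "gamil")]
print(kept)
```

Because the check runs per highlighted term, a document is kept only when a single term satisfies both conditions, which is what the bool query could not express.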

Filter Then Sort Results By Query in ElasticSearch

Is there a way in Elasticsearch to run a boolean filter and then, without refining the search further, sort/order the results based on a multi-field query?
E.g.: get all items with status_id = 1 (the filter), then order those documents using the keywords "red car" (documents whose name and description contain those keywords come first, documents without them come last).
You can use a bool query. The documentation describes the should clause as follows:
The clause (query) should appear in the matching document. In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
In our case there is a must clause, and it only matches a number, so it contributes the same score to every hit and does not affect the ordering. The conditions in should are then what determine the ranking:
{
"query": {
"bool": {
"must": [
{
"match": {
"status_id": 1
}
}
],
"should": [
{
"multi_match": {
"query": "red car",
"fields": [
"subject",
"message"
]
}
}
]
}
}
}
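A variant worth considering is to move the status condition into a filter clause, since filter clauses never contribute to the score. The sketch below builds such a body in Python. The function name and parameters are assumptions for illustration, and minimum_should_match is set to 0 explicitly so that documents without the keywords are still returned (ranked last) regardless of version-specific defaults.

```python
import json


def build_filtered_ranking_query(status_id, keywords, fields):
    """Filter on status_id without scoring, then rank by keyword matches only."""
    return {
        "query": {
            "bool": {
                # filter context: decides membership, contributes nothing to _score
                "filter": [{"term": {"status_id": status_id}}],
                # should context: only influences the ordering of the filtered docs
                "should": [
                    {"multi_match": {"query": keywords, "fields": list(fields)}}
                ],
                "minimum_should_match": 0,
            }
        }
    }


body = build_filtered_ranking_query(1, "red car", ["subject", "message"])
print(json.dumps(body, indent=2))
```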

query_string vs group match in elasticsearch

What is the difference between such query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test"
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), thus querying multiple fields with a single query doesn't necessarily mean analyzing the query only once.
Keep in mind that query_string supports the Lucene query syntax: AND and OR operators, querying specific fields, wildcards, phrase queries, etc. The query therefore needs to be parsed. I don't think that makes much difference in terms of performance, but it is error-prone: malformed input can cause the query to fail. If you don't need all that power, stick to the match query, and if you want to run the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores returned by the multiple match queries and by your query_string might be quite different. With a bool query you effectively build a Lucene boolean query, while query_string uses "use_dis_max": true by default, which means it uses a dis_max query internally. The same happens with the multi_match query. If you set use_dis_max to false, a bool query is used internally instead.
In terms of performance, I would say that the second query has the advantage, because the first query requires the query string to be analyzed for each of the four match clauses, while in the second there is only one query string to analyze.
Apart from that, there are some comparisons done over here that you can look at.
I am not sure about the relevance differences, but you can always run both queries and see whether the results differ in relevance.
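One simple way to make that comparison concrete is to measure how much the top results of the two queries overlap. The sketch below is only one possible measure (Jaccard overlap of the top k document IDs); the function names are hypothetical, and the responses are assumed to be search responses already parsed into Python dictionaries.

```python
def hit_ids(response):
    """Extract document IDs, in ranked order, from a parsed search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def top_k_overlap(ids_a, ids_b, k=10):
    """Jaccard overlap of the top-k IDs: 1.0 means the same documents, 0.0 means none shared."""
    a, b = set(ids_a[:k]), set(ids_b[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented responses standing in for the match-based and query_string-based searches.
match_response = {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}}
query_string_response = {"hits": {"hits": [{"_id": "2"}, {"_id": "3"}, {"_id": "4"}]}}

overlap = top_k_overlap(hit_ids(match_response), hit_ids(query_string_response), k=3)
print(overlap)
```

The overlap ignores the order within the top k, so it shows whether the two queries surface the same documents, not whether they rank them identically.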

Resources