Elasticsearch Per-Field Boosts, Wildcard and Explicit Field Matches Conflict

Some queries in Elasticsearch, such as the simple_query_string query, accept wildcard field patterns with per-field boosts.
In our use case, we would like to boost specific fields while giving all other fields a score of 0. We thought this could be achieved with the following query pattern:
GET /kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "query": {
    "simple_query_string": {
      "query": "Eddie",
      "fields": [
        "*^0",
        "customer_full_name^100"
      ]
    }
  }
}
It appears, though, that if several field definitions match the same field name via wildcards, their boosts are multiplied. The above example yields a score of 0 for all documents, even if they contain the token "Eddie" in the customer_full_name field, because that field matches both patterns and 0 × 100 = 0.
The following example demonstrates that the boosts are multiplied:
GET /kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "query": {
    "simple_query_string": {
      "query": "Eddie",
      "fields": [
        "*^0.1",
        "customer_full_name^100"
      ]
    }
  }
}
It leads to the expression (customer_full_name:eddie)^10.0 in the profile explanation of the query, that is, 0.1 × 100.
Does that mean that it is not possible to achieve our desired outcome with field boosts? The desired outcome is: all matches in a specific field have their score multiplied by 100, while documents with matches only in other fields are still returned but with a score of 0.
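One pattern that may sidestep the multiplication is to avoid overlapping field patterns altogether: put the boosted field and the catch-all wildcard into separate clauses of a bool query, so each boost is applied only once. Below is a minimal, untested sketch in Python that builds such a request body. The helper name build_split_boost_query is hypothetical, and the clause layout is an assumption rather than a confirmed fix; it is worth checking the resulting scores with "profile": true on your own cluster.

```python
import json


def build_split_boost_query(text, boosted_field, boost):
    """Build a bool query whose clauses do not share field patterns.

    Hypothetical helper: the clause layout is an assumption, not a
    confirmed fix for the boost multiplication described above.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Explicit field only, so its boost is applied once.
                    {
                        "simple_query_string": {
                            "query": text,
                            "fields": [f"{boosted_field}^{boost}"],
                        }
                    },
                    # Catch-all clause: keeps matches in other fields in the
                    # result set while contributing a score of 0.
                    {
                        "simple_query_string": {
                            "query": text,
                            "fields": ["*"],
                            "boost": 0,
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_split_boost_query("Eddie", "customer_full_name", 100)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a GET /kibana_sample_data_ecommerce/_search request.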

Related

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with a tokenizer that splits on dots.
On search, the value aaa.bbb will be split into two terms, aaa and bbb.
My question is how to write a bool query that applies multiple conditions to the same term.
For example, I want to get all docs where the field contains a term that matches a fuzzy search for gmail, but that same term must not contain gamil.
Here are some examples of what I want to achieve:
bmail // MATCH: it matches the fuzzy search and is not gamil
gamil.bmail // MATCH: the term bmail matches the fuzzy search and is not gamil
gamil // NO MATCH: it matches the fuzzy search but equals gamil
NOTE: the following query does NOT appear to work: it looks as if a document is considered a hit when one term matches one condition and a different term matches the other.
{
  ...
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "fuzzy": {
              "my_field": {
                "value": "gmail",
                "fuzziness": 1,
                "max_expansions": 2100000000
              }
            }
          },
          {
            "bool": {
              "must_not": [
                {
                  "query_string": {
                    "default_field": "my_field",
                    "query": "*gamil*",
                    "analyzer": "keyword"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
I ended up using highlighting: I execute the fuzzy (or any other) query and then programmatically filter the results by the returned highlight object.
Span queries might also be a good option if you don't need regular expressions, or if you can make sure you don't exceed the boolean query clause limit.
(see more details in the provided link)
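The highlight-based filtering could look roughly like the sketch below. It is not the original poster's code: the function names are made up for illustration, the sample hits are hand-written, and it assumes the default highlighter tags (<em>...</em>) were left unchanged.

```python
import re

# Assumes the default highlighter tags (<em>...</em>) were used.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Return the terms that were highlighted for `field` in one hit."""
    fragments = hit.get("highlight", {}).get(field, [])
    return [term for fragment in fragments for term in EM_TAG.findall(fragment)]


def filter_hits(hits, field, banned_substring):
    """Keep hits where at least one highlighted term lacks `banned_substring`."""
    kept = []
    for hit in hits:
        terms = highlighted_terms(hit, field)
        if any(banned_substring not in term.lower() for term in terms):
            kept.append(hit)
    return kept


# Hand-written hits shaped like a search response (illustrative data only).
sample_hits = [
    {"_id": "1", "highlight": {"my_field": ["<em>bmail</em>"]}},
    {"_id": "2", "highlight": {"my_field": ["<em>gamil</em>.<em>bmail</em>"]}},
    {"_id": "3", "highlight": {"my_field": ["<em>gamil</em>"]}},
]

kept_ids = [hit["_id"] for hit in filter_hits(sample_hits, "my_field", "gamil")]
print(kept_ids)
```

Because the check runs per highlighted term, both conditions are evaluated against the same term, which is what the bool query could not express.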

ANDing search keywords for Elasticsearch

How can we configure Elasticsearch so that it only returns results that match all the words in the search query? The indexed documents have multiple fields, so the words of the search query may match different fields, but every word must be matched somewhere in the document.
You can use the query_string query to search for results.
Sample search query:
GET /_search
{
  "query": {
    "query_string": {
      "query": "(content:this OR name:this) AND (content:that OR name:that)"
    }
  }
}
In this query, content and name are the field names, and this and that are the search terms.
You can build your own search query along the same lines.
I think you're looking for a multi_match query together with the and operator. This is the link to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html and it seems that cross_fields is the query type you're looking for. I'd read more on that page, but this is probably what you need:
GET /_search
{
  "query": {
    "multi_match" : {
      "query": "Will Smith",
      "type": "cross_fields",
      "fields": [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}

How to perform search query on two different data types?

My query is very simple. To make it even simpler, let's say I only search on two fields, name (text) and age (long):
GET person_db/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase_prefix": {
            "name": "hank"
          }
        },
        {
          "match_phrase_prefix": {
            "age": "hank"
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1.0
    }
  }
}
If I search for "23", there is no problem: Elasticsearch knows how to convert it to a number and the query won't fail. But if the search input is "john", I get error 400 with "reason": "failed to create query: {\n \"bool\....".
What should I do in this case?
I thought of converting the numeric values to strings before inserting them into Elasticsearch, but I'm trying to avoid that; I think Elasticsearch should have a way to support this.
Appreciate it.
This query works: (thanks to #jmlw)
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "alt",
            "type": "phrase_prefix",
            "fields": [
              "name",
              "taxid",
              "providers.providerAddress.street"
            ],
            "lenient": true
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1.0
    }
  }
}
Without details of your documents or your mappings, my first guess is that the age field is interpreted as a numeric field by Elasticsearch. Passing in anything other than a number, or something that can be converted into a number, will cause the query to fail with an exception reporting a failure to convert your string into a number.
With that said, you may try adding lenient: true to your match_phrase_prefix search term, which will allow Elasticsearch to ignore failures to convert to a numeric type and remove that term from the search.
Another approach is to only allow users to query multiple fields of the same type, or to have them specify which data they'd like to query in which field. For example, as a user I would search for people where age is 23 and name is John, instead of typing in 23 John or similar.
Otherwise, you may need to pre-process the query string, split the search terms, and pass them into search clauses individually with lenient: true, to attempt searching multiple terms in multiple fields with different data types.
You could also try using a different query type, such as multi_match, query_string, or simple_query_string, as these will likely have more flexibility for what you want to do.
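The pre-processing idea could be sketched in Python as follows. This is a variation on the suggestion above rather than a literal implementation of it: instead of relying on lenient: true, it routes each term by type, so non-numeric terms never reach the numeric field. The helper name build_typed_query is hypothetical, and the default field names are the ones from the question.

```python
def build_typed_query(user_input, text_field="name", numeric_field="age"):
    """Route each whitespace-separated term to a field of a matching type.

    Hypothetical helper; the field names default to those in the question.
    """
    should = []
    for term in user_input.split():
        if term.isdecimal():
            # Only digit-only terms are sent to the numeric field.
            should.append({"term": {numeric_field: int(term)}})
        # Every term is still tried against the text field.
        should.append({"match_phrase_prefix": {text_field: term}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_typed_query("23 john")
print(query)
```

With this split, an input such as "john" produces no clause on age at all, so the number conversion error cannot occur.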

Using cutoff_frequency in Elasticsearch with multiple fields

I'm using cutoff_frequency in a multi_match query with multiple fields. Is it applied to every field individually? How does it work?
This is what my code looks like.
POST beta2_index/_search
{
  "_source": ["title"],
  "size": 20,
  "query": {
    "multi_match": {
      "query": "test query",
      "fields": [
        "title",
        "description"
      ],
      "cutoff_frequency": 0.1
    }
  }
}
A multi_match query with the best_fields type (the default) is transformed into a dis_max query that wraps match queries, so the cutoff_frequency option should be "forwarded" to every match sub-query and therefore applied to each field individually.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields.
"The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field."
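To make the rewrite concrete, the sketch below builds the approximate dis_max form that a best_fields multi_match corresponds to, with cutoff_frequency copied into every match clause. The helper name expand_best_fields is hypothetical, and the output is an illustration of the documented behaviour, not what Elasticsearch prints internally. Note also that cutoff_frequency is deprecated in newer Elasticsearch releases, so this only applies to versions that still accept it.

```python
def expand_best_fields(query_text, fields, cutoff_frequency=None):
    """Approximate the dis_max form of a best_fields multi_match.

    Hypothetical helper for illustration only.
    """
    match_queries = []
    for field in fields:
        options = {"query": query_text}
        if cutoff_frequency is not None:
            # The option is repeated per field, so it is evaluated per field.
            options["cutoff_frequency"] = cutoff_frequency
        match_queries.append({"match": {field: options}})
    return {"query": {"dis_max": {"queries": match_queries}}}


expanded = expand_best_fields("test query", ["title", "description"], 0.1)
print(expanded)
```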

query_string vs group match in elasticsearch

What is the difference between this query:
"query": {
  "bool": {
    ...
    "should": [
      {
        "match": {
          "description": {
            "query": "test"
          }
        }
      },
      {
        "match": {
          "address": {
            "query": "test"
          }
        }
      },
      {
        "match": {
          "country": {
            "query": "test"
          }
        }
      },
      {
        "match": {
          "city": {
            "query": "test"
          }
        }
      }
    ]
  }
}
and that one:
"query": {
  "bool": {
    ...
    "should": [
      {
        "query_string": {
          "query": "test",
          "fields": [
            "description",
            "address",
            "country",
            "city"
          ]
        }
      }
    ]
  }
}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), thus querying multiple fields with a single query doesn't necessarily mean analyzing the query only once.
Keep in mind that query_string supports the Lucene query syntax: AND and OR operators, querying on specific fields, wildcards, phrase queries, etc. It therefore needs to be parsed, which I don't think makes much difference here in terms of performance, but it is error-prone: malformed input can make the query fail. If you don't need all that power, stick to the match query, and if you want to perform the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores returned if you compare the output of multiple match queries and your query_string might be quite different. Using a bool query you effectively build a lucene boolean query, while the query_string uses by default "use_dis_max":"true", which means it uses internally a dis_max query by default. Same happens using the multi_match query. If you set use_dis_max to false a bool query is going to be used internally instead.
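The scoring difference can be illustrated with plain arithmetic. The sketch below is a simplification that ignores everything except how per-clause scores are combined: dis_max takes the best clause score plus tie_breaker times the remaining ones, while a bool query with should clauses adds the matching clause scores together. The function names and the sample scores are made up for illustration.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best matching clause score plus tie_breaker times the other matches."""
    matching = [score for score in clause_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def bool_should_score(clause_scores):
    """Sum of the scores of all matching should clauses."""
    return sum(score for score in clause_scores if score > 0)


# Made-up per-field scores for description, address, country, city.
scores = [2.0, 1.0, 0.0, 0.5]

print(dis_max_score(scores))        # only the best field counts
print(dis_max_score(scores, 0.5))   # other fields count at half weight
print(bool_should_score(scores))    # every matching field counts fully
```

A document matching in several fields therefore tends to rank higher under the bool form than under the default dis_max form.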
In terms of performance, I would say that the second query will have performance benefits, because the first query requires the query string to be analyzed for each of the four match sections, while in the second there is only one query string that needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevance differences, but you can always run these two queries and see whether the results differ in relevance.
