Elasticsearch Boosting Search Terms

I have an Elasticsearch index that I want to search across multiple fields (title and description). If a term shows up in the title, I want its score boosted to 2 * the original score. If it shows up in the description, the score should stay as it is. I am a bit confused by the Elasticsearch documentation. Can anyone help me adjust the following query to reflect this logic?
{
"query": {
"query_string": {
"query": "string",
"fields": ["title","description"]
}
}
}

You just need to append ^2 to the field you want to boost; matches in that field then have their score multiplied by 2:
{
"query": {
"query_string": {
"query": "string",
"fields": ["title^2","description"]
}
}
}

Related

Elasticsearch: Give higher score to long title

I have an Elasticsearch database in which all documents have a title field and are queried by this field.
By default, Elasticsearch gives a higher score to documents with short titles. But in my use case, short titles are irrelevant.
For example, when I search for 'Deep Learning' the first results are
'Deep Learning'
'Machine Learning and Deep Learning'
'Semi-Supervised Deep Learning With Memory'
I would like the document titled 'Semi-Supervised Deep Learning With Memory' to appear before the document titled 'Deep Learning'.
Is there any solution to achieve that without changing the mapping?
Thanks
The best way would be to compute the title length at ingest time, store it in a field (title_length below), and use it in a function score query.
{
"query": {
"function_score": {
"query": {
"match": { "title": "elasticsearch" }
},
"script_score": {
"script": {
"source": "doc['title_length'].value * _score"
}
},
"boost_mode": "replace"
}
}
}
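The title_length field used above is not something Elasticsearch provides on its own; it has to be written when the document is indexed. Here is a minimal Python sketch of that step, assuming documents are plain dicts and that "length" means the number of characters in the title (counting words would be an equally reasonable choice). The helper name with_title_length is hypothetical.

```python
def with_title_length(doc):
    """Return a copy of doc with a numeric title_length field added.

    Assumption: length is measured in characters of the title string.
    """
    enriched = dict(doc)  # shallow copy so the caller's dict is not mutated
    enriched["title_length"] = len(doc.get("title", ""))
    return enriched


docs = [
    {"title": "Deep Learning"},
    {"title": "Semi-Supervised Deep Learning With Memory"},
]

# Enrich each document before sending it to the index.
enriched_docs = [with_title_length(d) for d in docs]
for d in enriched_docs:
    print(d["title_length"], d["title"])
```

The enriched dicts can then be indexed as usual, and the script can read doc['title_length'] without any fielddata cost.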
Or you can use a script score query that reads the length of the field value directly, provided the field has a keyword mapping.
If it does not, you would have to enable fielddata on it, but that is very expensive and not recommended.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html
GET /_search
{
"query": {
"script_score": {
"query": {
"match": { "title": "elasticsearch" }
},
"script": {
"source": "doc['title'].value.length() * _score"
}
}
}
}

Query string query with keyword and text fields in the same search

We are upgrading from Elasticsearch 5.x to 6.x. We make extensive use of query string queries and commonly construct queries that combine fields of different types.
In 5.x, the following query worked correctly and without error:
{
"query": {
"query_string": {
"query": "my_keyword_field:\"Exact Phrase Here\" my_text_field:(any words) my_other_text_field:\"Another phrase here\" date_field:[2018-01-01 TO 2018-05-01]",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
In 6.x, this query will return the following error:
{
"type": "illegal_state_exception",
"reason": "field:[my_keyword_field] was indexed without position data; cannot run PhraseQuery"
}
If I wrap the phrase in parentheses instead of quotes, the search will return 0 results:
{
"query": {
"query_string": {
"query": "my_keyword_field:(Exact Phrase Here)",
"default_operator": "AND",
"analyzer": "custom_text"
}
}
}
I guess this is because there is a conflict between the way the analyzer stems the incoming query and how the data is stored in the keyword field, but the phrase approach (my_keyword_field:"Exact Phrase Here") did work in 5.x.
Is this no longer supported in 6.x? And if not, what is the migration path and/or a good workaround?
It would be better to split the query into the query types designed for each use case. For example, use a term query for an exact match on a keyword field, a range query for ranges, and so on.
You can rewrite the query as follows:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "my_text_field:(any words) my_other_text_field:\"Another phrase here\"",
"default_operator": "AND",
"analyzer": "custom_text"
}
},
{
"term": {
"my_keyword_field": "Exact Phrase Here"
}
},
{
"range": {
"date_field": {
"gte": "2018-01-01",
"lte": "2018-05-01"
}
}
}
]
}
}
}

Different relevance of fields in elasticsearch query

I have about 10 fields in my Elasticsearch index and I want to search across all of them, so I set no fields parameter in my query:
GET /_search
{
"query": {
"query_string": {
"query": "this OR that"
}
}
}
Now I want to make the field "title" more relevant. I know that I can do this with:
"fields": ["title^5"]
My problem is that in this case I only search over the field "title", don't I?
Is there a way to search over all fields but make one of them more relevant?
I suggest listing all 10 fields you want to search so that you can boost specific ones, like this:
GET /_search
{
"query": {
"query_string": {
"query": "this OR that",
"fields": ["title^5", "field2", "field3", ...]
}
}
}
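If typing out every field by hand is tedious, the fields array can be generated. Here is a rough sketch in Python, assuming you already have the list of field names. The names field2 and field3 are the placeholders from the answer above, not real fields, and build_fields is a hypothetical helper.

```python
import json


def build_fields(all_fields, boosts):
    """Return query_string "fields" entries, appending ^boost where one is given."""
    entries = []
    for name in all_fields:
        boost = boosts.get(name)
        # Fields without an explicit boost keep the default weight.
        entries.append(name if boost is None else f"{name}^{boost}")
    return entries


fields = build_fields(["title", "field2", "field3"], {"title": 5})
query = {"query": {"query_string": {"query": "this OR that", "fields": fields}}}
print(json.dumps(query, indent=2))
```

Every field stays in the search; only the ones named in the boosts dict get the ^ suffix.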

ElasticSearch Multi-match and scoring

I'm using the following query on Elasticsearch 2.3.3:
es_query = {
"fields": ["title", "content"],
"query":
{
"multi_match" : {
"query": "potato tomato",
"type": "best_fields",
"fields": [ "title_cuis", "content_cuis" ]
}
}
}
I would like the results to be scored so that the first document returned is the one that contains the highest occurrence of the words "tomato" and "potato", but this doesn't seem to happen and I was wondering how I can modify the query to get that without re-indexing.
You're using best_fields, which scores each document by the highest score obtained from a single field (title_cuis or content_cuis), with each field evaluated separately.
Take a look at the cross_fields type instead.
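For reference, here is a sketch of what the suggested change could look like, written in the same Python dict style as the question. It only swaps the type value. Whether cross_fields actually produces the desired ordering depends on the analyzers and the data, which has not been verified here. The helper name multi_match_query is hypothetical.

```python
import json


def multi_match_query(text, fields, match_type="cross_fields"):
    """Build a multi_match request body with the given type."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": match_type,
                "fields": list(fields),
            }
        }
    }


es_query = multi_match_query("potato tomato", ["title_cuis", "content_cuis"])
print(json.dumps(es_query, indent=2))
```

Passing match_type="most_fields" to the same helper makes it easy to compare the ranking of the two types on real data.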

Finding an exact phrase in multiple fields with Elasticsearch

I want to find an exact phrase (for instance, "the quick brown fox") across multiple fields in a document.
Right now, I'm using something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox"
}
},
"filter": {
"and": [
{
"term": {
"priority": "high"
}
}
...more ands
]
}
}
}
}
The question is: how can I do this correctly? Right now I'm getting the best match first, which tends to be the entire phrase, but I'm also getting a lot of near matches.
If you are using an Elasticsearch cluster with version >= 1.1.0, you can set the type of your multi_match query to phrase:
...
"query": {
"multi_match": {
"fields": [
"subject",
"comments"
],
"query": "the quick brown fox",
"type": "phrase"
}
...
This replaces the match query generated for each field with a match_phrase query, which returns only the documents containing the full phrase (you can find details in the documentation).
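Putting the pieces together, the full request from the question with the phrase type applied might be assembled like this. This is only a sketch: it keeps the filtered query and the and filter from the question, which suit the 1.x versions discussed here (the filtered query was later removed in Elasticsearch 5.0 in favour of bool), and phrase_search is a hypothetical helper name. Only the one term filter shown in the question is included.

```python
import json


def phrase_search(phrase, fields, term_filters):
    """Build a filtered query whose multi_match part uses type "phrase"."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "multi_match": {
                        "fields": list(fields),
                        "query": phrase,
                        "type": "phrase",
                    }
                },
                # One term filter per key/value pair, combined with "and".
                "filter": {
                    "and": [{"term": {k: v}} for k, v in term_filters.items()]
                },
            }
        }
    }


body = phrase_search(
    "the quick brown fox", ["subject", "comments"], {"priority": "high"}
)
print(json.dumps(body, indent=2))
```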
How are you analyzing the subject and comments fields? If you want an exact match, you'll need to use the keyword tokenizer at both index and search time.
