How to boost specific terms in Elasticsearch?

If I have the following mapping:
PUT /book
{
"settings": {},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"author": {
"type": "text"
}
}
}
}
How can I boost specific authors so they rank higher than others?
For example, given the documents and query below:
PUT /book/_doc/1
{
"title": "car parts",
"author": "john smith"
}
PUT /book/_doc/2
{
"title": "car",
"author": "bob bobby"
}
PUT /book/_doc/3
{
"title": "soap",
"author": "sam sammy"
}
PUT /book/_doc/4
{
"title": "car designs",
"author": "joe walker"
}
GET /book/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "car" }},
{ "match": { "title": "parts" }}
]
}
}
}
How do I make books by "joe walker" appear at the top of the search results?

One solution is to use function_score.
According to the Elasticsearch documentation, function_score allows you to modify the score of documents that are retrieved by a query.
Based on your mapping, try running this query, for example:
GET book/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"title": "car"
}
},
{
"match": {
"title": "parts"
}
}
]
}
},
"functions": [
{
"filter": {
"match": {
"author": "joe walker"
}
},
"weight": 30
}
],
"max_boost": 30,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
The query inside function_score is the same should query that you used.
We then take all the results of that query and give more weight (a higher score) to joe walker's books, so that his books are prioritized over the others.
To achieve that, we added a function (inside functions) that computes a new score for each document returned by the query that also matches the joe walker filter.
With boost_mode set to multiply, the query score of those documents is multiplied by the weight (30); documents that do not match the filter keep their original score.
You can experiment with the weight and the other parameters.
Hope it helps.
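To make the arithmetic concrete, here is a minimal Python sketch of how the scores combine under score_mode "max" and boost_mode "multiply". It only imitates the calculation and does not call Elasticsearch; the query scores are made-up numbers, and the helper name function_score is hypothetical.

```python
def function_score(query_score, matching_weights, max_boost=30.0):
    """Imitate score_mode=max and boost_mode=multiply for weight functions."""
    # If no function's filter matches, the function score is neutral (1.0).
    factor = max(matching_weights) if matching_weights else 1.0
    # max_boost caps the function score before it is combined with the query score.
    factor = min(factor, max_boost)
    return query_score * factor

# Hypothetical query scores for the three books that match "car" or "parts".
query_scores = {"car parts": 1.2, "car": 0.6, "car designs": 0.5}
# Only "car designs" (by joe walker) passes the author filter, so only it gets weight 30.
weights = {"car parts": [], "car": [], "car designs": [30.0]}

final = {title: function_score(score, weights[title]) for title, score in query_scores.items()}
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

Even though "car designs" starts with the lowest query score in this example, the multiplication by 30 moves it to the top, while the other two keep their original order.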

Related

Proximity-Relevance in elasticsearch

I have a JSON record in Elasticsearch with these fields:
"streetName": "5 Street",
"name": ["Shivam Apartments"]
I tried the query below, but it returns nothing once I add the streetName bool clause:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": {
"match": {
"name": {
"query": "shivam apartments",
"minimum_should_match": "80%"
}
}
}
}
},
{
"bool": {
"must": {
"match": {
"streetName": {
"query": "5 street",
"minimum_should_match": "80%"
}
}
}
}
}
]
}
}
}
Document Mapping
{
"rabc_documents": {
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete_analyzer",
"position_increment_gap": 0
},
"streetName": {
"type": "keyword"
}
}
}
}
}
According to the Elasticsearch documentation on keyword fields:
"Keyword fields are only searchable by their exact value".
Keyword fields are also case sensitive.
Taking this into account:
Searching for "5 street" will not match "5 Street" ('s' vs 'S') on a keyword field.
minimum_should_match has no effect on a keyword field, because the whole value is treated as a single term.
Suggestion: for partial matches, use the "text" mapping instead of "keyword". Keyword fields are meant for filtering, term-based aggregations, and so on.
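The difference can be illustrated with a small Python sketch. It is only a rough stand-in for what Elasticsearch does: the two helper functions are hypothetical, and the "text" side assumes a simple analyzer that lowercases and splits on whitespace.

```python
def keyword_matches(indexed_value: str, query: str) -> bool:
    # A keyword field stores the whole value as one term, so only an
    # exact, case-sensitive comparison can match.
    return indexed_value == query

def text_matches(indexed_value: str, query: str) -> bool:
    # Rough stand-in for an analyzed text field: lowercase and split on
    # whitespace, then require every query token to be present.
    indexed_tokens = set(indexed_value.lower().split())
    return all(token in indexed_tokens for token in query.lower().split())

print(keyword_matches("5 Street", "5 street"))  # False: 's' vs 'S'
print(keyword_matches("5 Street", "5 Street"))  # True: exact value
print(text_matches("5 Street", "5 street"))     # True: both sides are lowercased
```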

Elasticsearch Query: How to do an inline "calculation" between fields, and then use it as a boost variable?

I have a books index with documents like these:
{
"title": "To Kill a Mockingbird",
"summary": "To Kill a Mockingbird takes place in Alabama during the Depression..",
"type": "book",
"views": 36
},
{
"title": "The Genius of Birds",
"summary": "The Genius Of Birds shines a new light on a genuinely underrated kind..",
"type": "book",
"views": 10
},
{
"title": "Handbook of Bird Biology",
"summary": "The Handbook of Bird Biology is an essential reference for birdwatchers..",
"type": "book",
"views": 27
}
In Elasticsearch v5.1, this is my current simple query, which works on its own:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "book"
}
}
]
}
},
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title",
"summary"
]
}
}
}
}
}
(It searches for the words "the bird" in the title and summary fields, where the type must be book.)
This gives me a simple result based on the title and summary fields, but I need to modify it a little further.
Is it possible to change the query to look something like this:
..
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title^(0.1*views)",
"summary"
]
}
}
..
I don't know what this is called in ES, but basically I want to boost a field (the title) by the value of another field (views).
Or, in the simplest form, something like:
field1^(field2)
Thanks to Aarchit Saxena for the hint in the comment section. I now know it is called field_value_factor, and by exploring further from there I finally managed to get the query I needed.
The original query (above) now looks like this:
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "book"
}
}
]
}
},
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title",
"summary"
]
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "views",
"factor": 1,
"modifier": "none",
"missing": 1
}
}
],
"boost": 1,
"boost_mode": "multiply"
}
}
}
Thank you.
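For readers who want to see what this does to the ranking, here is a small Python sketch of the arithmetic only. The query scores are invented for illustration, and the helper name field_value_factor is just a label for the calculation, not an Elasticsearch client call. Note that the factor multiplies the whole query score, not only the title field's contribution, so it is not exactly the title^(0.1*views) idea from the question.

```python
def field_value_factor(value, factor=1.0, missing=1.0):
    # modifier "none": the function score is simply factor * field value;
    # `missing` stands in for documents that have no value for the field.
    if value is None:
        value = missing
    return factor * value

# Hypothetical text-relevance scores; real values depend on the index.
books = [
    {"title": "To Kill a Mockingbird", "views": 36, "query_score": 0.4},
    {"title": "The Genius of Birds", "views": 10, "query_score": 0.9},
    {"title": "Handbook of Bird Biology", "views": 27, "query_score": 0.5},
]

for book in books:
    # boost_mode "multiply": query score times function score.
    book["final_score"] = book["query_score"] * field_value_factor(book["views"])

ranked = [b["title"] for b in sorted(books, key=lambda b: b["final_score"], reverse=True)]
print(ranked)
```

With these made-up scores the most-viewed book ends up first even though its text score is the lowest. If raw view counts dominate too much, a smaller factor or a modifier such as log1p is the usual knob to try.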

Sort Elasticsearch results based on field value

Assume I have 3 documents (users), each with knowledge of multiple programming languages and an associated score for each, as described below. How can I search across multiple fields (with multi_match, for example) and, if a search keyword hits a language, sort by that language's score?
// user1
{
"name": "John Bayes",
"prog_langs": [
{
"name": "python",
"score": 10
},
{
"name": "java",
"score": 500
}
]
}
// user2
{
"name": "John Russel",
"prog_langs": [
{
"name": "python",
"score": 100
},
{
"name": "PHP",
"score": 200
}
]
}
// user3
{
"name": "Terry Guy",
"prog_langs": [
{
"name": "C++",
"score": 600
},
{
"name": "Javascript",
"score": 200
}
]
}
For example, searching for "John python" should return user1 and user2, with user2 showing up first.
I've been trying to use sort and functions, but I think they always use the lowest/highest/average values of score.
Thanks!
[Edit]
In the meantime I got it working in a test without the full-text/multi-match part, and I found out I had to make "prog_langs" nested, so I changed the mapping and it works as expected.
Now I'm only missing the part where a full-text search with multi-match is combined with the current query.
Thanks again!
I managed to fix the query and now it's working as expected.
Before posting my solution, here are a few things to keep in mind:
I made a new mapping and added some nested objects, so my original query had to change (prog_langs is now of type nested).
I wanted at least two fields to match, each being mandatory and matching at least once.
{
"query": {
"bool": {
"must": [
{
"query": {
"match": {
"name": {
"query": "john python",
"boost": 5
}
}
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "prog_langs",
"query": {
"match": {
"prog_langs.name": {
"query": "john python",
"boost": 5
}
}
}
}
}
]
}
}
],
"should": [
{
"function_score": {
"query": {
"match": {
"prog_langs.name": "john python"
}
},
"functions": [
{
"script_score": {
"script": "_score * (1 + doc['prog_langs.score'].value)"
}
}
]
}
}
]
}
},
"highlight": {
"fields": {
"name": {},
"prog_langs.name": {}
}
}
}
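The reason the nested mapping matters can be shown with a short Python sketch. It is an illustration only: flatten and nested_score are hypothetical helpers that imitate how a plain object array is stored versus how a nested object is matched, and the data is the three users from the question.

```python
users = [
    {"name": "John Bayes", "prog_langs": [{"name": "python", "score": 10},
                                          {"name": "java", "score": 500}]},
    {"name": "John Russel", "prog_langs": [{"name": "python", "score": 100},
                                           {"name": "PHP", "score": 200}]},
    {"name": "Terry Guy", "prog_langs": [{"name": "C++", "score": 600},
                                         {"name": "Javascript", "score": 200}]},
]

def flatten(doc):
    # Default "object" mapping: an array of objects becomes parallel
    # multi-value fields, so the link between a name and its score is lost.
    return {
        "prog_langs.name": [lang["name"] for lang in doc["prog_langs"]],
        "prog_langs.score": [lang["score"] for lang in doc["prog_langs"]],
    }

def nested_score(doc, term):
    # "nested" mapping: each language is matched as its own unit, so the
    # score is read from the same object whose name matched.
    scores = [lang["score"] for lang in doc["prog_langs"]
              if lang["name"].lower() == term.lower()]
    return max(scores) if scores else None

print(flatten(users[0]))

matches = [u for u in users if nested_score(u, "python") is not None]
ranked = [u["name"] for u in
          sorted(matches, key=lambda u: nested_score(u, "python"), reverse=True)]
print(ranked)
```

With the flattened form there is no way to tell that 10 belongs to python and 500 to java, which is why sorting fell back to the lowest/highest/average value.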

Elasticsearch: Conditionally filter query on fields if they exist in multi-index query

I have a query for a general search which spans multiple indices. Some of the indices have a field called is_published, some have a field called date_review, and some have both.
I'm struggling to write a query which will search across fields and filter on the fields mentioned above, but only if they exist. I have managed to achieve what I want on the individual fields using missing and/or exists, but it excludes the other variants.
In English, I want to keep documents in the result where:
is_published is true OR the field does not exist
date_review is in the future OR the field does not exist
So, if a document has is_published and it's false, remove it. If a document has date_review in the past, remove it. If it has is_published == false and date_review is in the future, remove it.
I hope this makes sense.
For the purpose of answering, assume the documents might look like this:
// Has `is_published` flag
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"is_published": true
}
// Has `date_review` flag
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"date_review": "2017-01-01"
}
// Has both `is_published` and `date_review` flags
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"is_published": true,
"date_review": "2017-01-01"
}
At the moment, my [unfiltered] query looks like this:
{
"index": "index-1,index-2,index-3",
"type": "item",
"body": {
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "my search phrase",
"type": "phrase_prefix",
"fuzziness": null,
"fields": [
"label^3",
"body"
}
},
"filter": []
}
}
}
}
Very grateful for any pointers.
Thanks.
You can try a query like this one:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "my search phrase",
"type": "phrase_prefix",
"fuzziness": null,
"fields": [
"label^3",
"body"
]
}
},
"filter": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"missing": {
"field": "is_published"
}
},
{
"term": {
"is_published": true
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"missing": {
"field": "date_review"
}
},
{
"range": {
"date_review": {
"gt": "now"
}
}
}
]
}
}
]
}
}
}
}
}
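The answer above relies on the filtered query and the missing filter, both of which were removed in Elasticsearch 5.0. On newer versions the same "field is absent OR clause matches" logic can be expressed with bool, filter, and must_not exists. The Python sketch below only builds the request body as a dictionary; the helper names true_or_absent and build_query are hypothetical, and the fuzziness setting from the original is left out as an assumption that it is not needed.

```python
import json

def true_or_absent(field, clause):
    # Keep a document if the field is absent OR the given clause matches.
    return {
        "bool": {
            "minimum_should_match": 1,
            "should": [
                {"bool": {"must_not": {"exists": {"field": field}}}},
                clause,
            ],
        }
    }

def build_query(phrase):
    return {
        "query": {
            "bool": {
                # Scoring part: the same multi_match as in the question.
                "must": {
                    "multi_match": {
                        "query": phrase,
                        "type": "phrase_prefix",
                        "fields": ["label^3", "body"],
                    }
                },
                # Non-scoring part: both conditions must hold.
                "filter": [
                    true_or_absent("is_published", {"term": {"is_published": True}}),
                    true_or_absent("date_review", {"range": {"date_review": {"gt": "now"}}}),
                ],
            }
        }
    }

print(json.dumps(build_query("my search phrase"), indent=2))
```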

elasticsearch search results with sub query

I'm getting started with Elasticsearch and am not sure whether this is possible with one query along with pagination. I have an index with two types: user and blog. Example mapping:
"mappings": {
"user": {
"properties": {
"name" : { "type": "string" }
}
},
"blog": {
"properties": {
"title" : { "type": "string" },
"author_name" : { "type": "string" }
}
}
}
}
sample data
user:
[
{"name": "jemmy"},
{"name": "Tom"}
]
blog:
[
{"title": "foo bar", "author": "jemmy"},
{"title": "magic foo", "author": "Tom"},
{"title": "bigdata for dummies", "author": "Tom"},
{"title": "elasticsearch", "author": "Tom"},
{"title": "JS cookbook", "author": "jemmy"}
]
I'd like to query the index in such a way that when I search for blogs, it runs a subquery on each match. For example:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
}
}
Expected (pseudo) results:
[
{
title: "foo bar",
author_name: "Jemmy",
author_post_count: 2
},
{
title: "magic foo",
author_name: "Tom",
author_post_count: 3
}
]
Here author_post_count is the number of blog posts that the user has authored. If it could return those blog posts instead of the count, that would be great too. Is this possible? Perhaps the term I'm using is not right, but I hope my question is clear.
Try something like this:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
},
"aggs": {
"counting_posts": {
"global": {},
"aggs": {
"authors": {
"terms": {
"field": "author",
"size": 10
}
}
}
}
}
}
Be careful with the terms aggregation, though: it operates on the tokens actually stored in the index (after analysis, such as lowercasing and tokenization), not on the original text you indexed.
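As a rough illustration of what the query plus the global terms aggregation return, here is a Python sketch over the sample data. It does not talk to Elasticsearch; analyze is a hypothetical stand-in for an analyzer that lowercases and splits on whitespace, and joining the counts back onto the hits is something the client would have to do itself.

```python
from collections import Counter

blogs = [
    {"title": "foo bar", "author": "jemmy"},
    {"title": "magic foo", "author": "Tom"},
    {"title": "bigdata for dummies", "author": "Tom"},
    {"title": "elasticsearch", "author": "Tom"},
    {"title": "JS cookbook", "author": "jemmy"},
]

def analyze(text):
    # Rough stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()

# The query only decides which hits come back.
hits = [b for b in blogs if "foo" in analyze(b["title"])]

# A "global" aggregation ignores the query and buckets every document;
# the bucket keys are the analyzed tokens, not the original strings.
author_counts = Counter(token for b in blogs for token in analyze(b["author"]))

# Client-side join of hits and buckets (note the lowercased lookup key).
results = [
    {
        "title": h["title"],
        "author_name": h["author"],
        "author_post_count": author_counts[h["author"].lower()],
    }
    for h in hits
]
print(results)
```

The bucket for "Tom" is keyed as "tom" here, which is the lowercase/tokenization caveat mentioned above.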
