Elasticsearch search by keywords and boost

I'm using Spring Boot 2.0.5, Spring Data Elasticsearch 3.1.0 and Elasticsearch 6.4.2.
I have loaded Elasticsearch with a set of articles. Each article has a keywords field containing a list of keyword strings, e.g.:
"keywords": ["Football", "Barcelona", "Cristiano Ronaldo", "Real Madrid", "Zinedine Zidane"]
Each user of the application can specify their keyword preferences, each with a weight factor, e.g.:
User 1:
keyword: Football, weight:3.0
keyword: Tech, weight:1.0
keyword: Health, weight:2.0
What I would like to do is find articles that match a user's keyword preferences, rank them by the preference weights (I think this relates to Elasticsearch boosting), and sort by the latest article time.
This is what I have so far (only for one keyword):
public Page<Article> getArticles(String keyword, float boost, Pageable pageable) {
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.matchQuery("keywords", keyword).boost(boost))
            .withPageable(pageable)
            .build();
    return articleRepository.search(searchQuery);
}
As a user may have any number of keyword preferences, what would I need to change in the above code to support this?
Any suggestions would be highly appreciated.
Solution
OK, I enabled logging so I could see the Elasticsearch query being produced. Then I updated the getArticles method to the following:
public Page<Article> getArticles(List<Keyword> keywords, Pageable pageable) {
    // Match articles that contain at least one of the user's keywords.
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
    // One weight function per keyword; it only applies to articles containing that keyword.
    List<FilterFunctionBuilder> functions = new ArrayList<FilterFunctionBuilder>();
    for (Keyword keyword : keywords) {
        queryBuilder.should(QueryBuilders.termsQuery("keywords", keyword.getKeyword()));
        functions.add(new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                QueryBuilders.termQuery("keywords", keyword.getKeyword()),
                ScoreFunctionBuilders.weightFactorFunction(keyword.getWeight())));
    }
    // Wrap the bool query: the weights of the matching functions are combined into a
    // factor that is applied to the query score.
    FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(queryBuilder,
            functions.toArray(new FunctionScoreQueryBuilder.FilterFunctionBuilder[functions.size()]));
    NativeSearchQueryBuilder searchQuery = new NativeSearchQueryBuilder();
    searchQuery.withQuery(functionScoreQueryBuilder);
    searchQuery.withPageable(pageable);
    // searchQuery.withSort(SortBuilders.fieldSort("createdDate").order(SortOrder.DESC));
    return articleRepository.search(searchQuery.build());
}
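The commented-out withSort line would order results purely by createdDate, which discards the keyword weighting. If the intent is to rank by the weighted score and use recency only to break ties, the ordering is score descending, then date descending. The sketch below shows that two-key ordering in memory; ScoredArticle, ArticleOrdering and the createdDate field are hypothetical stand-ins, not Spring Data types. In the actual request this would presumably correspond to sorting on _score first and createdDate second (for example SortBuilders.scoreSort() before the existing field sort), but that combination is not verified here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit: the relevance score from the
// function_score query plus the article's creation date (field name assumed).
class ScoredArticle {
    final String title;
    final double score;
    final LocalDate createdDate;

    ScoredArticle(String title, double score, LocalDate createdDate) {
        this.title = title;
        this.score = score;
        this.createdDate = createdDate;
    }
}

class ArticleOrdering {
    // Primary key: score, highest first. Tie-breaker: createdDate, newest first.
    static final Comparator<ScoredArticle> BY_SCORE_THEN_RECENCY =
            Comparator.comparingDouble((ScoredArticle a) -> a.score).reversed()
                    .thenComparing(Comparator.comparing((ScoredArticle a) -> a.createdDate).reversed());

    // Returns the titles in display order without modifying the input list.
    static List<String> order(List<ScoredArticle> hits) {
        List<ScoredArticle> sorted = new ArrayList<>(hits);
        sorted.sort(BY_SCORE_THEN_RECENCY);
        List<String> titles = new ArrayList<>();
        for (ScoredArticle a : sorted) {
            titles.add(a.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<ScoredArticle> hits = List.of(
                new ScoredArticle("old football", 3.0, LocalDate.of(2018, 1, 1)),
                new ScoredArticle("new football", 3.0, LocalDate.of(2018, 10, 1)),
                new ScoredArticle("new tech", 1.0, LocalDate.of(2018, 11, 1)));
        System.out.println(order(hits)); // [new football, old football, new tech]
    }
}
```

Sorting by date first instead would show the newest article at the top regardless of how well it matches the user's preferences.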
This produces the following Elasticsearch query:
{
  "from" : 0,
  "size" : 20,
  "query" : {
    "function_score" : {
      "query" : {
        "bool" : {
          "should" : [
            {
              "terms" : {
                "keywords" : [
                  "Football"
                ],
                "boost" : 1.0
              }
            },
            {
              "terms" : {
                "keywords" : [
                  "Tech"
                ],
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      },
      "functions" : [
        {
          "filter" : {
            "term" : {
              "keywords" : {
                "value" : "Football",
                "boost" : 1.0
              }
            }
          },
          "weight" : 3.0
        },
        {
          "filter" : {
            "term" : {
              "keywords" : {
                "value" : "Tech",
                "boost" : 1.0
              }
            }
          },
          "weight" : 1.0
        }
      ],
      "score_mode" : "multiply",
      "max_boost" : 3.4028235E38,
      "boost" : 1.0
    }
  },
  "version" : true
}
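The generated query uses "score_mode" : "multiply" (the default), so an article tagged with several of the user's keywords gets the product of their weights, and that factor then multiplies the bool query's score (boost_mode also defaults to multiply). The sketch below reproduces that arithmetic in plain Java for User 1's preferences to show how "multiply" and "sum" differ; WeightCombination and its methods are illustrative names, and only those two score modes are modelled.

```java
import java.util.List;
import java.util.Map;

// Mimics how function_score combines filtered "weight" functions for one document.
// Only score_mode "multiply" (the default, as in the generated query) and "sum" are modelled.
class WeightCombination {
    // Combines the weights of every function whose filter (a term on "keywords")
    // matches the document. Assumes the document matches at least one filter, which
    // the bool/should query guarantees for every returned article.
    static double combine(List<String> docKeywords, Map<String, Double> weights, String scoreMode) {
        boolean multiply = scoreMode.equals("multiply");
        if (!multiply && !scoreMode.equals("sum")) {
            throw new IllegalArgumentException("unsupported score_mode: " + scoreMode);
        }
        double factor = multiply ? 1.0 : 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (docKeywords.contains(e.getKey())) { // the function's filter matches
                factor = multiply ? factor * e.getValue() : factor + e.getValue();
            }
        }
        return factor;
    }

    // boost_mode defaults to "multiply": final score = query score * combined factor.
    static double finalScore(double queryScore, List<String> docKeywords,
                             Map<String, Double> weights, String scoreMode) {
        return queryScore * combine(docKeywords, weights, scoreMode);
    }

    public static void main(String[] args) {
        Map<String, Double> user1 = Map.of("Football", 3.0, "Tech", 1.0, "Health", 2.0);
        List<String> article = List.of("Football", "Health", "Barcelona");
        System.out.println(combine(article, user1, "multiply")); // 6.0
        System.out.println(combine(article, user1, "sum"));      // 5.0
    }
}
```

With multiply, a weight of 1.0 (Tech) leaves the score unchanged and only weights above 1 raise it. If additive weighting is closer to what you want, "score_mode" can be set to "sum" instead.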

What you are looking for is the function_score query. Something along the lines of:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {"term": {"keywords": "Football"}},
            {"term": {"keywords": "Tech"}},
            {"term": {"keywords": "Health"}}
          ]
        }
      },
      "functions": [
        {"filter": {"term": {"keywords": "Football"}}, "weight": 3},
        {"filter": {"term": {"keywords": "Tech"}}, "weight": 1},
        {"filter": {"term": {"keywords": "Health"}}, "weight": 2}
      ]
    }
  }
}
See the Java API documentation for the function score query: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-compound-queries.html#java-query-dsl-function-score-query

Related

Elasticsearch bool query join order

I'm raising this question to understand the order in which ES executes the clauses (must, should, filter, must_not) of a bool query. Here is the sample query from the ES docs:
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}
From the documentation, it looks like the clauses are joined with AND. For example, a rough SQL counterpart of the above query would be:
select * from user where user_id = 'kimchy' and tags in ('production') and not (age between 10 and 20) and tags in ('env1', 'deployed');
I wasn't able to find official documentation on this, but I did see some texts saying that ES query evaluation depends heavily on certain cost approximations. I'm wondering how to map the ordering to SQL-like syntax so that we can build a clear mental picture when authoring ES queries. It also feels like the ordering might have some effect on deeply nested boolean AND/OR queries.
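For deciding which documents match (scoring aside), the bool query behaves like the SQL translation: must and filter clauses are ANDed, any matching must_not clause excludes the document, and at least minimum_should_match of the should clauses must match (the default is 1 when there are no must or filter clauses, otherwise 0). Because these are conjunctions plus a count, the order in which clauses are written cannot change which documents match; the order in which they are actually checked is left to Lucene, which can use cost estimates to pick it. The difference between must and filter is only that filter (like must_not) does not contribute to the score. The sketch below models the matching rules in plain Java; UserDoc and BoolMatch are hypothetical, and scoring is not modelled.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical document type mirroring the fields used in the sample query.
class UserDoc {
    final String userId;
    final Set<String> tags;
    final int age;

    UserDoc(String userId, Set<String> tags, int age) {
        this.userId = userId;
        this.tags = tags;
        this.age = age;
    }
}

class BoolMatch {
    // Matching semantics only (no scoring): all must and filter clauses match,
    // no must_not clause matches, and at least minimumShouldMatch should clauses match.
    static <T> boolean matches(T doc, List<Predicate<T>> must, List<Predicate<T>> filter,
                               List<Predicate<T>> mustNot, List<Predicate<T>> should,
                               int minimumShouldMatch) {
        for (Predicate<T> p : must) if (!p.test(doc)) return false;
        for (Predicate<T> p : filter) if (!p.test(doc)) return false;
        for (Predicate<T> p : mustNot) if (p.test(doc)) return false;
        int shouldHits = 0;
        for (Predicate<T> p : should) if (p.test(doc)) shouldHits++;
        return shouldHits >= minimumShouldMatch;
    }

    // The sample query from the question, expressed as predicates.
    static final List<Predicate<UserDoc>> MUST = List.of(u -> u.userId.equals("kimchy"));
    static final List<Predicate<UserDoc>> FILTER = List.of(u -> u.tags.contains("production"));
    static final List<Predicate<UserDoc>> MUST_NOT = List.of(u -> u.age >= 10 && u.age <= 20);
    static final List<Predicate<UserDoc>> SHOULD =
            List.of(u -> u.tags.contains("env1"), u -> u.tags.contains("deployed"));

    static boolean sampleQuery(UserDoc u) {
        return matches(u, MUST, FILTER, MUST_NOT, SHOULD, 1);
    }

    public static void main(String[] args) {
        UserDoc hit = new UserDoc("kimchy", Set.of("production", "deployed"), 30);
        UserDoc tooYoung = new UserDoc("kimchy", Set.of("production", "deployed"), 15);
        System.out.println(sampleQuery(hit));      // true
        System.out.println(sampleQuery(tooYoung)); // false (excluded by must_not)
    }
}
```

The same reasoning applies recursively to nested bool queries: each level is evaluated as one of these AND/NOT/at-least-n groups, so nesting changes the logic but clause order within a level does not.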

Elasticsearch - use a field match to boost only and not to fetch the document

I have a query phrase that needs to match in one of the fields name, summary or description, or exactly on the name field.
Now I have one more field, brand. A match on this field should only be used to boost results: if a document matches only on brand, it should not be in the result set.
To handle the case without brand, I have the query below:
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "query": "Cadbury chocklate milk",
            "fields": ["name", "summary", "description"]
          }
        },
        {
          "term": {
            "name_keyword": {
              "value": "Cadbury chocklate milk"
            }
          }
        }
      ]
    }
  }
}
This works fine for me.
How do I fetch the data using the same query but boost docs that have brand:cadbury, without increasing the recall set (i.e. without returning docs that match only on brand:cadbury)?
Thanks!
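Wrap the existing should clauses in a bool query inside must, and add the brand term as an outer should clause. Because the outer bool has a must clause, its should clause is optional: it only adds to the score of documents that already match, so brand-only matches are not returned.
Also, multi_match has multiple types; to match the whole phrase, use "type": "phrase".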
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "type": "phrase",
                  "query": "Cadbury chocklate milk",
                  "fields": ["name", "summary", "description"]
                }
              },
              {
                "term": {
                  "name_keyword": {
                    "value": "Cadbury chocklate milk"
                  }
                }
              }
            ]
          }
        }
      ],
      "should": {
        "term": {
          "brand": {
            "value": "cadbury"
          }
        }
      }
    }
  }
}
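Why this keeps the recall set unchanged can be shown with a small model: a bool query's score is the sum of its matching must and should clauses, and with a must clause present the should clause is optional. The sketch below hard-codes that rule for this query; BrandBoost is a hypothetical name, and the per-clause scores are made-up inputs rather than real relevance scores.

```java
import java.util.OptionalDouble;

// Models the answer's query: an outer bool with one must clause (the inner bool that
// matches the phrase or the exact name) and one should clause (brand = cadbury).
// The per-clause scores are made-up inputs, not real relevance scores.
class BrandBoost {
    // Empty result = document not returned.
    static OptionalDouble score(boolean textMatches, double textScore,
                                boolean brandMatches, double brandScore) {
        if (!textMatches) {
            return OptionalDouble.empty(); // must fails: brand-only matches are excluded
        }
        // With a must clause present, minimum_should_match defaults to 0, so the
        // brand clause is optional and simply adds its score when it matches.
        return OptionalDouble.of(textScore + (brandMatches ? brandScore : 0.0));
    }

    public static void main(String[] args) {
        System.out.println(score(true, 2.0, true, 1.5));  // OptionalDouble[3.5]
        System.out.println(score(true, 2.0, false, 1.5)); // OptionalDouble[2.0]
        System.out.println(score(false, 0.0, true, 1.5)); // OptionalDouble.empty
    }
}
```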

Custom highlights in elastic search

I am new to Elasticsearch. I have a task where I have to highlight certain queries with specific tags.
I am using a query similar to the one in the Elasticsearch intervals query documentation. The problem is that I have to highlight "my favourite food" with one HTML tag, say "favorite", and "cold porridge" / "hot water" with a different HTML tag, say "state".
How can I do that?
POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favourite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        },
        "boost" : 2.0,
        "_name" : "favourite_food"
      }
    }
  }
}
You can use the highlighting feature in Elasticsearch as follows (put the intervals query from the question under "query"):
GET /index_name/_search
{
  "query": {},
  "highlight": {
    "fields": {
      "my_text": {
        "type": "unified",
        "number_of_fragments": 0,
        "pre_tags": [
          "<first_filter>",
          "<second_filter>",
          "<third_filter>"
        ],
        "post_tags": [
          "</first_filter>",
          "</second_filter>",
          "</third_filter>"
        ]
      }
    }
  }
}
The order in which the tags are applied depends on the order of the clauses in the query. Also note that setting number_of_fragments to 0 returns the entire field content with the hits tagged.

Applying increasingly slow query filters depending on the number of matches

Is there a way to build an ES query so that it doesn't apply slower parts, like wildcard searches or additional fields, if the number of results from the previous conditions already reaches the requested query size?
I assume this would mean giving up an accurate totalHits value.
I have tried playing with the boost settings, but, as expected, ES applies all the combinations.
{
  "size" : 5,
  "query": {
    "bool": {
      "should" : [
        { "term" : { "search.autocomplete" : { "value" : "120", "boost" : 20 } }},
        { "term" : { "search.autocomplete_inverse" : { "value" : "120", "boost" : 15 } }},
        { "match" : { "search.keyword" : { "query" : "120", "boost" : 10 } }},
        { "wildcard" : { "brand.search" : { "value" : "*120*", "boost": 5}}},
        { "wildcard" : { "category.search" : { "value" : "*120*", "boost": 0}}}
      ]
    }
  }
}
I'm looking for a way so that, if the first condition matches 5 or more docs, ES doesn't spend more time trying to find more matches.
A different approach would be to execute multiple queries in my application until I reach the desired number of results, but it doesn't feel right...
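The application-side alternative from the question could look like the sketch below: each tier stands for one search request, ordered from cheapest to slowest, and a tier only runs while the page is not yet full. TieredSearch and the tier callbacks are hypothetical; no Elasticsearch client is involved, and the callbacks simply return document IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

// Application-side fallback: run cheap queries first and only run slower ones while
// the page is not yet full. Each tier is a hypothetical callback that would issue one
// search request (e.g. the term clauses first, the wildcard clauses last) and return
// up to `limit` document IDs.
class TieredSearch {
    static List<String> search(List<IntFunction<List<String>>> tiers, int size) {
        Set<String> hits = new LinkedHashSet<>(); // keeps tier order, drops duplicates
        for (IntFunction<List<String>> tier : tiers) {
            if (hits.size() >= size) {
                break; // page already full: the slower tiers are never executed
            }
            for (String id : tier.apply(size)) {
                if (hits.size() >= size) {
                    break;
                }
                hits.add(id);
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        List<IntFunction<List<String>>> tiers = List.of(
                limit -> List.of("doc1", "doc2"),                 // e.g. exact term matches
                limit -> List.of("doc2", "doc3", "doc4", "doc5"), // e.g. keyword match
                limit -> List.of("doc9"));                        // e.g. wildcard, slowest
        System.out.println(search(tiers, 5)); // [doc1, doc2, doc3, doc4, doc5]
    }
}
```

The trade-offs: when the cheap tiers don't fill the page you pay for extra round trips, and scores from separate requests aren't comparable, so results end up ordered by tier rather than by one relevance score.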

How do I have to write a Search Query in ElasticSearch?

I use the Grails ElasticSearch Plugin and want to use the following query:
"bool" : {
    "must" : {
        "term" : { "user" : "kimchy" }
    },
    "must_not" : {
        "range" : {
            "age" : { "from" : 10, "to" : 20 }
        }
    },
    "should" : [
        {
            "term" : { "tag" : "wow" }
        },
        {
            "term" : { "tag" : "elasticsearch" }
        }
    ],
    "minimum_should_match" : 1,
    "boost" : 1.0
}
Using the Groovy API from the Grails plugin, I would write something like:
def res = userAgentIdentService.search() {
    "bool" {
        "must" {
            term("user" : "kimchy" )
        }
        "must_not" {
            "range" {
                age("from" : 10, "to" : 20 }
            }
        }
        "should" : [
            {
                term( "tag" : "wow" )
            }
            {
                term("tag" : "elasticsearch" )
            }
        ]
        "minimum_should_match" = 1
        "boost" = 1.0
    }
}
My query is not working!
Where do I have to define minimum_should_match and how do I have to define it?
How do I write the "should" : [ ... ] square-bracket notation in the Grails/Groovy way?
I think you're missing a couple of JSON levels in your search request. I don't think you can use the bool query without specifying that it's a query (it could also be a filter, or something else). Have a look at this example from the Groovy API reference:
def search = node.client.search {
    indices "test"
    types "type1"
    source {
        query {
            terms(test: ["value1", "value2"])
        }
    }
}
