Give a higher score to a specific document in an Elasticsearch query

Consider this full-text query:
GET /test/_search
{
"query": {
"match": {
"title": "blue"
}
}
}
It returns this result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 0.32156613,
"hits": [
{
"_index": "test",
"_id": "rQ5WIYYBUZECFnMRbIo6",
"_score": 0.32156613,
"_source": {
"title": "blue chair blue desk"
}
},
{
"_index": "test",
"_id": "qg5VIYYBUZECFnMRrorJ",
"_score": 0.29879427,
"_source": {
"title": "blue bird"
}
},
{
"_index": "test",
"_id": "qw5VIYYBUZECFnMR8IpD",
"_score": 0.29879427,
"_source": {
"title": "blue sky"
}
},
{
"_index": "test",
"_id": "rg5WIYYBUZECFnMRsoqo",
"_score": 0.29879427,
"_source": {
"title": "blue automobile"
}
}
]
}
}
Scoring works as expected, but I want the document with title "blue sky" to score higher whenever the search term is blue, so that it becomes the top result.
Is there a way to raise the score of specific documents at query time in Elasticsearch?
I think it could be done by combining a match query with a positive boost, but I couldn't get that to work.

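One option is the boosting query, which lowers the score of documents you do not want near the top.
In the example below, the negative clause matches every document whose title does not contain sky, and negative_boost multiplies the score of those documents by 0.1, so "blue sky" keeps its original score and ranks first.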
{
"query": {
"boosting": {
"positive": {
"match": {
"title": "blue"
}
},
"negative": {
"bool": {
"must_not": [
{
"match": {
"title": "sky"
}
}
]
}
},
"negative_boost": 0.1
}
}
}
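
To see why this reorders the results, here is a minimal Python sketch that mimics what the boosting query does to the scores from the question. It does not call Elasticsearch; the helper names (matches_negative, apply_boosting) are made up for illustration, and the "does the title contain sky" check is a simplified stand-in for the must_not match clause.

```python
# Scores copied from the original `match` response in the question.
hits = [
    {"title": "blue chair blue desk", "score": 0.32156613},
    {"title": "blue bird", "score": 0.29879427},
    {"title": "blue sky", "score": 0.29879427},
    {"title": "blue automobile", "score": 0.29879427},
]

NEGATIVE_BOOST = 0.1


def matches_negative(title):
    """Stand-in for the negative clause: true when the title lacks 'sky'."""
    return "sky" not in title.lower().split()


def apply_boosting(docs, negative_boost):
    """Multiply the score of every doc that matches the negative clause."""
    rescored = []
    for doc in docs:
        factor = negative_boost if matches_negative(doc["title"]) else 1.0
        rescored.append({"title": doc["title"], "score": doc["score"] * factor})
    # Highest score first, as in a search response.
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


ranked = apply_boosting(hits, NEGATIVE_BOOST)
for doc in ranked:
    print(f'{doc["score"]:.8f}  {doc["title"]}')
```

Every document still matches, because the positive clause alone decides what is returned; only the ordering changes.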

Related

What does the total value in an Elasticsearch _search response show?

Say we call Elasticsearch as follows:
POST https:////_search with body:
{
"from": 0,
"size": 1,
"query": {
"bool": {
"must": [
{
"range": {
"createdAt": {
"gt": "2019-11-11T10:00:00"
}
}
}
]
}
},
"sort": [
{
"createdAt" : {
"order" : "desc"
}
}
]
}
I get only 1 result because size is set to 1, but total inside hits in the response shows 2. This is the response I get:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "<index-name>",
"_type": "_doc",
"_id": "5113c843-dff3-499f-a12e-44c7ac103bcf_0",
"_score": null,
"_source": {
"oId": "5113c843-dff3-499f-a12e-44c7ac103bcf",
"oItemId": 0,
"createdAt": "2019-11-13T11:00:00"
},
"sort": [
1573642800000
]
}
]
}
}
Does total ignore pagination and reflect only the query? It should show the total count of items matching the query irrespective of the pagination settings, right?
Yes, you are right: total ignores pagination and reflects only the query, i.e. the total number of documents that match it.
To be precise, this is how the official ES docs explain it:
total (Object) Metadata about the number of returned documents.
Returned parameters include:
value: Total number of returned documents.
relation: Indicates whether the number of returned documents in the value parameter is accurate or a lower bound. Returned values are:
eq: Accurate
gte: Lower bound, including returned documents
So total is the number of documents matching the query, but since size is set to 1 in your example, the inner hits array holds just 1 document. You can cross-check this with a small example:
Create a sample index with just 1 text field:
URL:- http://localhost:9200/{your-index-name}/ --> PUT method
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
}
}
}
Once the index is created, index the 4 documents below:
URL:- http://localhost:9200/{your-index-name}/_doc/{1,2,like..} --> POST method
{
"name": "foo 1"
}
{
"name": "foo bar"
}
{
"name": "foo"
}
{
"name": "foo 2"
}
Now run the search query below without pagination:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "foo"
}
}
]
}
}
}
It gives the response below:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4, --> Note 4 here
"relation": "eq"
},
"max_score": 0.12199639,
"hits": [
{
"_index": "59638303",
"_type": "_doc",
"_id": "1",
"_score": 0.12199639,
"_source": {
"name": "foo"
}
},
{
"_index": "59638303",
"_type": "_doc",
"_id": "3",
"_score": 0.12199639,
"_source": {
"name": "foo"
}
},
{
"_index": "59638303",
"_type": "_doc",
"_id": "2",
"_score": 0.09271725,
"_source": {
"name": "foo bar"
}
},
{
"_index": "59638303",
"_type": "_doc",
"_id": "4",
"_score": 0.09271725,
"_source": {
"name": "foo 1"
}
}
]
}
}
But when you run the search query with pagination:
{
"from": 0,
"size": 1,--> note size 1
"query": {
"bool": {
"must": [
{
"match": {
"name": "foo"
}
}
]
}
}
}
it gives the response below:
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4, --> this is still 4
"relation": "eq"
},
"max_score": 0.12199639,
"hits": [
{
"_index": "59638303",
"_type": "_doc",
"_id": "1",
"_score": 0.12199639,
"_source": {
"name": "foo"
}
}
]
}
}
If you change size in the query above, only the inner hits array changes; the outer hits object, which contains total, always stays at 4. This confirms your understanding is correct.
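
The same behaviour can be sketched in plain Python, without a running cluster. The search function below is a hypothetical stand-in that only mimics the shape of the response: total is computed from all matching documents, and from/size are applied afterwards to pick the page. The default size of 10 mirrors the Elasticsearch default.

```python
# The four sample documents that match "foo", in an arbitrary ranked order.
matching_docs = ["foo", "foo 1", "foo 2", "foo bar"]


def search(docs, from_=0, size=10):
    """Mimic the response shape: total ignores from/size, hits does not."""
    page = docs[from_:from_ + size]
    return {
        "hits": {
            "total": {"value": len(docs), "relation": "eq"},
            "hits": page,
        }
    }


full = search(matching_docs)
paged = search(matching_docs, from_=0, size=1)

# Print total and the number of returned hits for each request.
print(full["hits"]["total"]["value"], len(full["hits"]["hits"]))
print(paged["hits"]["total"]["value"], len(paged["hits"]["hits"]))
```

One caveat for large result sets: in Elasticsearch 7 and later, total is tracked exactly only up to 10,000 hits by default, after which relation becomes gte; the track_total_hits request parameter controls this.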

boost query issue in ES

We have the 4 documents below under one type in one index, and I ran the query shown after them. I don't think the result is correct. Can anyone help me out? How does ranking work? How does boost affect the score?
{
"title":"title",
"content":"content"
},
{
"title":"title test",
"content":"content"
},
{
"title":"title",
"content":"content test"
},
{
"title":"title test",
"content":"content test"
}
Then I ran this query:
{
"query": {
"bool": {
"should": [
{ "match": {
"title": {
"query": "test",
"boost": 3
}
}},
{ "match": {
"content": {
"query": "test",
"boost": 1
}
}}
]
}
}
}
I got the below result.
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 2.4428198,
"hits": [
{
"_index": "bt",
"_type": "blog",
"_id": "m4w7_GIB24WOZoUoiMIP",
"_score": 2.4428198,
"_source": {
"title": "title test",
"content": "content"
}
},
{
"_index": "bt",
"_type": "blog",
"_id": "nYw7_GIB24WOZoUoiMIP",
"_score": 1.1507283,
"_source": {
"title": "title test",
"content": "content test"
}
},
{
"_index": "bt",
"_type": "blog",
"_id": "nIw7_GIB24WOZoUoiMIP",
"_score": 0.8142733,
"_source": {
"title": "title",
"content": "content test"
}
}
]
}
}
My question is why the document with test in both title and content isn't ranked first. Instead, the top result has test only in the title.

Elasticsearch aggregation with custom query parser

I cannot seem to aggregate my query results when using my custom query parser. I get a result set, but the results are not aggregated. When using a standard query like match, everything works.
What works:
GET pages/_search
{
"query": {
"match": {
"text": "binomial"
}
},
"aggs": {
"docs": {
"terms": {
"field": "rooturl"
}
}
}
}
returns a nice aggregated result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 11.11176,
"hits": [
...
{
"_index": "pages",
"_type": "doc",
"_id": "AVcq6z6lzDazctHi91RE",
"_score": 3.3503218,
"_source": {
"rooturl": "document",
"type": "equation",
"url": "document:poly",
"text": "coefficient"
}
},
{
"_index": "pages",
"_type": "doc",
"_id": "AVcq6z6xzDazctHi91RF",
"_score": 3.3503218,
"_source": {
"rooturl": "document",
"type": "equation",
"url": "document:poly",
"text": "dot"
}
}
...
]
},
"aggregations": {
"docs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "document",
"doc_count": 10
}
]
}
}
}
But when using my custom query parser, the result is not aggregated.
Query:
GET pages/_search
{
"query": {
"my_custom_query_parser": {
"query": "binomial"
}
},
"aggs": {
"docs": {
"terms": {
"field": "rooturl"
}
}
}
}
Can anyone point me in the right direction?

Elasticsearch: do exact searches where the query contains special characters like '#'

Get only the documents that contain '#test' and ignore the documents that contain just 'test' in Elasticsearch.
People may gripe at you about this question, so I'll note that it was in response to my comment on this post.
You're probably going to want to read up on analysis in Elasticsearch, as well as match queries versus term queries.
Anyway, the convention here is to use a .raw sub-field on a string field. That way, if you want to do searches involving analysis, you can use the base field, but if you want to search for exact (un-analyzed) values, you can use the sub-field.
So here is a simple mapping that accomplishes this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"post_text": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Now if I add these two documents:
PUT /test_index/doc/1
{
"post_text": "#test"
}
PUT /test_index/doc/2
{
"post_text": "test"
}
A "match" query against the base field will return both:
POST /test_index/_search
{
"query": {
"match": {
"post_text": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5945348,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.5945348,
"_source": {
"post_text": "#test"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.5945348,
"_source": {
"post_text": "test"
}
}
]
}
}
But the "term" query below will only return the one:
POST /test_index/_search
{
"query": {
"term": {
"post_text.raw": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"post_text": "#test"
}
}
]
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7
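
The difference between the two queries comes down to analysis, which this small Python sketch imitates. It is an approximation under stated assumptions: the analyze function (lowercase, keep word characters) is only a rough stand-in for the default analyzer, and match_query and term_query_raw are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzed string field: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


# The two documents indexed above, keyed by _id.
docs = {"1": "#test", "2": "test"}


def match_query(field_values, query):
    """Analyzed field: compare query tokens against document tokens."""
    wanted = set(analyze(query))
    return [i for i, v in field_values.items() if wanted & set(analyze(v))]


def term_query_raw(field_values, query):
    """Un-analyzed sub-field: the whole stored value must equal the query."""
    return [i for i, v in field_values.items() if v == query]


print(analyze("#test"))               # the '#' is dropped during analysis
print(match_query(docs, "#test"))     # ids matched on the analyzed field
print(term_query_raw(docs, "#test"))  # ids matched on the raw sub-field
```

Note that the mapping above uses the older string / not_analyzed syntax. From Elasticsearch 5.x onward the same idea is expressed with a text field that has a keyword sub-field.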

Elasticsearch: facet or aggregation returning doc counts over multiple fields

I have an Elasticsearch document structure for which I'd like a terms facet (or aggregation) that counts the number of documents containing each term, regardless of which field the term appears in.
For example, the following result shows both the documents and the faceted search result:
{
"_shards": {
"failed": 0, "successful": 5, "total": 5
},
"hits": {
"hits": [
{
"_id": "003", "_index": "test", "_score": 1.0, "_type": "test",
"_source": {
"root": {
"content": [
"five",
"five",
"five"
],
"title": "four"
}
}
},
{
"_id": "002", "_index": "test", "_score": 1.0, "_type": "test",
"_source": {
"root": {
"content": "two three",
"title": "three"
}
}
},
{
"_id": "001", "_index": "test", "_score": 1.0, "_type": "test",
"_source": {
"root": {
"content": "one two",
"title": "one"
}
}
}
],
"max_score": 1.0, "total": 3
},
"facets": {
"terms": {
"_type": "terms", "missing": 0, "other": 0,
"terms": [
{
"count": 2,
"term": "two"
},
{
"count": 2,
"term": "three"
},
{
"count": 2,
"term": "one"
},
{
"count": 1,
"term": "four"
},
{
"count": 1,
"term": "five"
}
],
"total": 8
}
},
"timed_out": false,
"took": 18
}
We can see that the terms "one" and "three" have counts of 2 (once for each field of the same doc) where I would like them to have a count of 1. The only term with a count of 2 should be "two".
I looked into aggregation to see if it could help but it doesn't seem to work with multiple fields (or I have missed something).
It would have been nice to build a "terms" facet on "root" rather than the individual fields... but that doesn't seem possible either.
Any ideas on how to work this out?
You can use a script in a terms aggregation to achieve this.
Inside the script, collect the tokens from both fields, take their set union, and return the set.
{
"aggs" : {
"genders" : {
"terms" : {
"script" : "union(doc['content'].values, doc['title'].values) "
}
}
}
}
You will need to work out how to express the union operation in whichever scripting language you use.
Alternatively, you could add a new field that holds the unique terms from both the content and title fields, and run the facet or aggregation on it.
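
Both suggestions come down to counting each term once per document. The Python sketch below shows the expected counts for the three documents in the question. It runs outside Elasticsearch, and the whitespace tokenizer in tokens is a simplifying assumption rather than the analyzer your index uses.

```python
from collections import Counter

# The three documents from the question (the `root` object of each).
docs = [
    {"content": ["five", "five", "five"], "title": "four"},
    {"content": "two three", "title": "three"},
    {"content": "one two", "title": "one"},
]


def tokens(value):
    """Split a string, or each string in a list, on whitespace."""
    values = value if isinstance(value, list) else [value]
    return {tok for v in values for tok in v.split()}


doc_counts = Counter()
for doc in docs:
    # Set union: a term present in both fields counts once per document.
    doc_counts.update(tokens(doc["content"]) | tokens(doc["title"]))

for term, count in doc_counts.most_common():
    print(term, count)
```

Here "two" is the only term with a count of 2, and "one" and "three" drop to 1, which is the result the question asks for.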
