Decay filter function for a no-limit value with ElasticSearch - boost

I have at least 1,000,000 documents like the following in an Elasticsearch index:
{"title":"toto", "views":132, "likes":23, "date" : "2014-09-01..." ...}
Here title is indexed with a language analyzer, views and likes are integer fields ranging from 0 with no upper bound, and date is a date field.
I want to search by title and boost documents that are recent and have high views and likes counts.
I am using a decay function for the date (with today as the origin) and it works as expected, but I don't know how to boost on the views and likes fields, since they have no maximum value to use as an origin.
Here is my search query:
POST /threads/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "air france",
"type": "phrase",
"fields": [
"title^4",
"desc"
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "2014/09/29 13:00:00",
"scale": "12h",
"offset":"6h",
"decay":0.5
}
}
}
]
}
}
}

You could try a "field_value_factor" function, as described in the function_score documentation. A logarithmic modifier such as "log2p" dampens large values, so the field does not need an upper bound. You will need to test and assess the results, adjust the "factor" and the boost you give to "title", and test again until the ranking gets closer to what you need. You can also add the explain parameter to the search request (?explain) to see how ES computes the _score. Something like this:
POST /threads/_search?explain
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "air france",
"type": "phrase",
"fields": [
"title^8",
"desc"
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "2014/09/29 13:00:00",
"scale": "12h",
"offset":"6h",
"decay":0.5
}
}
},
{
"field_value_factor": {
"field": "views",
"modifier": "log2p",
"factor": 0.1
}
},
{
"field_value_factor": {
"field": "likes",
"modifier": "log2p",
"factor": 0.1
}
}
]
}
}
}
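To get a feel for how these functions behave on an unbounded field, the arithmetic can be reproduced outside Elasticsearch. The sketch below assumes the documented definitions: log2p computes log10(factor * value + 2), and exp decay computes exp(lambda * max(0, distance - offset)) with lambda = ln(decay) / scale. The function names are hypothetical, distances are expressed in hours, and the text relevance score is ignored.

```python
import math


def log2p_factor(value: float, factor: float = 0.1) -> float:
    """field_value_factor with modifier log2p: log10(factor * value + 2)."""
    return math.log10(factor * value + 2)


def exp_decay(distance_h: float, scale_h: float = 12.0,
              offset_h: float = 6.0, decay: float = 0.5) -> float:
    """exp decay on a date field; distance is hours between the date and the origin."""
    lam = math.log(decay) / scale_h
    # Inside the offset the score stays at 1.0; at offset + scale it equals `decay`.
    return math.exp(lam * max(0.0, distance_h - offset_h))


if __name__ == "__main__":
    for views in (0, 132, 10_000, 1_000_000):
        print("views", views, "->", round(log2p_factor(views), 3))
    for age_h in (0, 6, 18, 30):
        print("age", age_h, "h ->", round(exp_decay(age_h), 3))
```

The logarithm keeps a document with a million views from dwarfing everything else: its factor is about 5, against roughly 1.18 for 132 views. By default function_score multiplies the function values together and then multiplies the result with the query score (score_mode and boost_mode both default to multiply), so a small change to "factor" can shift the ranking noticeably.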

Related

How to convert ElasticSearch query to ES7

We are having a tremendous amount of trouble converting an old ElasticSearch query to a newer version of ElasticSearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the Filtered query in 6.8 (the latest version of the docs I can find that still has the page) state that you should move the query and filter to the must and filter parameters of the bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set it to the max value (2147483647) explicitly. However, the documentation for Size recommends using the Composite aggregation and paginating through the results instead.
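As a rough illustration of the pagination approach, the Python sketch below builds composite aggregation requests and follows the after_key returned with each page. The search argument is a hypothetical callable that sends a request body to the index and returns the parsed response (for example a thin wrapper around whatever client you use); the aggregation name all_terms and the page size are arbitrary choices.

```python
from typing import Callable, Dict, Iterator, Optional


def composite_body(field: str, page_size: int = 1000,
                   after: Optional[dict] = None) -> dict:
    """Build a request body that returns one page of terms for `field`."""
    composite = {
        "size": page_size,
        "sources": [{field: {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"all_terms": {"composite": composite}}}


def iter_terms(search: Callable[[dict], dict], field: str,
               page_size: int = 1000) -> Iterator[Dict]:
    """Yield {"key": ..., "doc_count": ...} for every term, page by page."""
    after = None
    while True:
        response = search(composite_body(field, page_size, after))
        agg = response["aggregations"]["all_terms"]
        for bucket in agg["buckets"]:
            yield {"key": bucket["key"][field], "doc_count": bucket["doc_count"]}
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            break  # an empty page means every term has been returned
```

Each page holds at most page_size buckets, so memory use stays bounded no matter how many distinct values status or tags has.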
Below is the closest query I could make to the original that will work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
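If many stored queries need the same change, the rewrite can be scripted. The helper below is a minimal sketch with a hypothetical name; it handles only a top-level filtered query and the list form of the legacy and filter shown above, and leaves everything else untouched.

```python
import copy


def filtered_to_bool(query: dict) -> dict:
    """Rewrite a top-level legacy `filtered` query as a `bool` query."""
    if "filtered" not in query:
        return copy.deepcopy(query)
    legacy = copy.deepcopy(query["filtered"])
    bool_query = {}
    if "query" in legacy:
        bool_query["must"] = legacy["query"]  # scored part
    flt = legacy.get("filter")
    if isinstance(flt, dict) and list(flt) == ["and"] and isinstance(flt["and"], list):
        # The legacy `and` filter is a list of filters; bool.filter accepts a list.
        flt = flt["and"]
    if flt:
        bool_query["filter"] = flt  # non-scoring part
    return {"bool": bool_query}
```

The aggregation sizes still have to be adjusted separately, since this helper only touches the query part.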

how to add filters to elastic query when using function_score?

Here is my current elastic query:
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "ocean",
"fields": [],
"fuzziness": "AUTO"
}}],
"must_not": [{
"exists": {
"field": "parentId"
}
}]
}
},
"functions" : [
{
"gauss": {
"createTime": {
"origin": "2020-07-09T23:50:00",
"scale": "365d",
"decay": 0.3
}
}
}
]
}
}
}
How do I properly add filters to this? I think maybe the fact that I'm using function_score makes this different? I would like to add a hard filter, for example, only show me results with uploadUser: 'Mr. Bean' ... but still keep the scoring in place for the results that pass this filter.
I tried using filter in various places, also using must but I either get no results or all the results.
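I'm using Elasticsearch 7. Thanks for your help.
Add the condition as a filter clause of the bool query inside function_score. Filter clauses restrict the result set without contributing to the score, so the multi_match relevance and the gauss decay still rank whatever passes the filter. Note that term matches the exact indexed value, so the query below works as written only if uploadUser is mapped as a keyword field.
Refer to the official Elasticsearch documentation on the Function score query to learn more.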
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"term": {
"uploadUser": "Mr. Bean"
}
},
"must": [
{
"multi_match": {
"query": "ocean",
"fields": [
],
"fuzziness": "AUTO"
}
}
],
"must_not": [
{
"exists": {
"field": "parentId"
}
}
]
}
},
"functions": [
{
"gauss": {
"createTime": {
"origin": "2020-07-09T23:50:00",
"scale": "365d",
"decay": 0.3
}
}
}
]
}
}
}
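When the request is assembled in application code, the same change can be applied programmatically. This is a small sketch with a hypothetical helper name; it assumes the request body has the query.function_score shape used above and returns a modified copy rather than editing the input.

```python
import copy


def add_hard_filter(body: dict, clause: dict) -> dict:
    """Return a copy of `body` with `clause` added to bool.filter of the scored query."""
    result = copy.deepcopy(body)
    fs = result["query"]["function_score"]
    inner = fs.get("query", {"match_all": {}})
    if "bool" not in inner:
        # Wrap a non-bool query in must so it keeps contributing to the score.
        inner = {"bool": {"must": [inner]}}
    existing = inner["bool"].get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]  # normalise the single-object form to a list
    inner["bool"]["filter"] = existing + [clause]
    fs["query"] = inner
    return result
```

For example, add_hard_filter(body, {"term": {"uploadUser": "Mr. Bean"}}) yields the structure shown in the answer, and further calls stack additional filters without touching the scoring functions.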

function_score: treat missing field as perfect hit

I need to boost documents by location (closer is better). locations is a nested type.
This works fine, except that Elasticsearch does not return documents in which locations is missing. If the field is missing, Elasticsearch should treat the document as a perfect hit. Any idea how to achieve this?
My query:
{
"sort": [
{
"_score": "desc"
}
],
"query": {
"function_score": {
"query": {
"bool": {
"must_not": [],
"should": [
{
"nested": {
"path": "locations",
"query": {
"function_score": {
"score_mode": "sum",
"functions": [
{
"gauss": {
"locations.coordinates": {
"origin": {
"lat": "50.1078852",
"lon": "15.0385376"
},
"scale": "100km",
"offset": "20km",
"decay": "0.5"
}
}
}
]
}
}
}
}
]
}
}
}
}
}
BTW: I'm using Elasticsearch 5.0
Add one more function to the functions section with a high weight (for example 1000) that applies only to documents with missing locations, something like this:
{
"filter": {
"bool": {
"must_not": {
"exists": {
"field": "locations"
}
}
}
},
"weight": 1000
}
Records with missing locations will then come first because of the high weight. Because locations is a nested type, the exists check may need to be wrapped in a nested query.
Syntax can differ a little. More information on queries here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
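An alternative that does not rely on weights is to make the missing case match explicitly. In the original query the only should clause is the nested query, and a nested query cannot match a document that has no nested locations, so those documents are dropped. The sketch below, written as a Python dict builder, adds a second should clause that matches documents without locations and gives them a constant score of 1.0, the highest value a decay function can return. The function name and the use of constant_score are assumptions, and the query has not been run against a 5.0 cluster.

```python
def location_boost_query(lat: float, lon: float) -> dict:
    """Score by distance; documents without nested locations score like a perfect hit."""
    near = {
        "nested": {
            "path": "locations",
            "query": {
                "function_score": {
                    "functions": [{
                        "gauss": {
                            "locations.coordinates": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": "100km",
                                "offset": "20km",
                                "decay": 0.5,
                            }
                        }
                    }]
                }
            },
        }
    }
    no_location = {
        "constant_score": {
            # Matches parents that have no nested location with coordinates.
            "filter": {
                "bool": {
                    "must_not": {
                        "nested": {
                            "path": "locations",
                            "query": {"exists": {"field": "locations.coordinates"}},
                        }
                    }
                }
            },
            "boost": 1.0,
        }
    }
    return {
        "query": {
            "bool": {"should": [near, no_location], "minimum_should_match": 1}
        }
    }
```

A document matches exactly one of the two clauses, so documents with locations are still ranked by distance while the others sit at the top alongside the closest hits.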

Elastic search how can I query either multi match or functions

I pass the following three parameters when running the query:
query - Either a place name, description or empty,
lat - Either latitude of a place or empty,
lon - Either longitude of a place or empty
Based on the above parameters, I want to get a list of items ranked by query score and by the distance between each result and lat, lon.
I currently have the following query to get the items based on query and distance:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match" : {
"query": "Lippo",
"fields": [ "name^6", "city^5", "country^4", "position^3", "address_line^2", "description"]
}
}]
}
},
"functions": [
{
"gauss": {
"position": {
"origin": "-6.184652, 106.7518749",
"offset": "2km",
"scale": "10km",
"decay": 0.33
}
}
}
]
}
}
}
The problem is that if query is empty, there are no results at all. What I want is for the results to be based on either the query or the distance.
Is there any way to achieve this? Any suggestion is appreciated.
Setting the zero_terms_query option of multi_match to all should allow you to get results when query is empty.
Example:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match" : {
"query": "Lippo",
"fields": [ "name^6", "city^5", "country^4", "position^3", "address_line^2", "description"],
"zero_terms_query" : "all"
}
}]
}
},
"functions": [
{
"gauss": {
"position": {
"origin": "-6.184652, 106.7518749",
"offset": "2km",
"scale": "10km",
"decay": 0.33
}
}
}
]
}
}
}
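Since all three parameters are optional, another option is to assemble the request in application code and include only the parts that have input. The Python sketch below is one way to do that; the function name is hypothetical, the field list is copied from the question, and falling back to match_all for empty text is an assumption about the desired behaviour.

```python
from typing import Optional

FIELDS = ["name^6", "city^5", "country^4", "position^3", "address_line^2", "description"]


def build_place_query(text: str = "", lat: Optional[float] = None,
                      lon: Optional[float] = None) -> dict:
    """Build a search body from optional text and optional coordinates."""
    if text.strip():
        base = {"multi_match": {"query": text, "fields": FIELDS}}
    else:
        base = {"match_all": {}}  # no text: every document is a candidate
    if lat is None or lon is None:
        return {"query": base}  # no coordinates: rank by text relevance only
    return {
        "query": {
            "function_score": {
                "query": base,
                "functions": [{
                    "gauss": {
                        "position": {
                            "origin": {"lat": lat, "lon": lon},
                            "offset": "2km",
                            "scale": "10km",
                            "decay": 0.33,
                        }
                    }
                }],
            }
        }
    }
```

With only coordinates, every document starts from the same base score, so the ordering is driven by distance alone; with only text, the decay function is left out entirely.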

ElasticSearch function_score query with filters

I am trying to create a search that returns locations within about 500m of a given geo point.
I also need to filter the results of this search based on whether location is empty in the source or not.
I tried things like this:
"filtered" : {
"filter": {
"missing" : { "field" : "location" }
}
}
This is the search JSON I got:
{"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {"fieldA": "value"}
},
{
"match": { "fieldB": "value"}
}
]
}
},
"functions": [{
"gauss": {
"location": {
"origin": "'.$lat.','.$lon.'",
"scale": "500m",
"offset": "0km",
"decay": 0.33
}
}
}]
}
}
}
I tried putting the filter in different places in the query but it didn't work for me, so I know I'm doing something fundamentally wrong with how the query is structured. In the future I want to add more scoring logic and other filters, but I can't find a good example of such queries.
What should I do to make it work?
You can use a filtered query with a geo_distance filter to restrict the results first. You can also set "weight" in function_score to explicitly boost the distance score relative to the "match" queries:
{
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"fieldA": "value"
}
},
{
"match": {
"fieldB": "value"
}
}
]
}
},
"filter": {
"geo_distance" : {
"distance" : "500m",
"location" : "'.$lat.','.$lon.'"
}
}
}
},
"functions": [
{
"gauss": {
"location": {
"origin": "'.$lat.','.$lon.'",
"scale": "500m",
"offset": "0km",
"decay": 0.33
}
},
"weight": 3
}
]
}
}
}
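The filtered query and the missing filter used above exist only in old releases; later versions replaced filtered with a bool query (must plus filter) and missing with must_not around exists. The sketch below builds the same search in that newer form from Python, which also avoids concatenating $lat and $lon into the JSON string. The function names are hypothetical, and fieldA, fieldB and value are the placeholders from the question.

```python
def missing_location_filter() -> dict:
    """Newer equivalent of the legacy `missing` filter on `location`."""
    return {"bool": {"must_not": {"exists": {"field": "location"}}}}


def nearby_query(lat: float, lon: float, radius: str = "500m") -> dict:
    """Match on fieldA/fieldB, keep hits within `radius`, rank closer hits higher."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"fieldA": "value"}},
                            {"match": {"fieldB": "value"}},
                        ],
                        # Filter clauses exclude documents without affecting the score.
                        "filter": [
                            {"geo_distance": {"distance": radius, "location": origin}}
                        ],
                    }
                },
                "functions": [{
                    "gauss": {
                        "location": {
                            "origin": origin,
                            "scale": radius,
                            "offset": "0km",
                            "decay": 0.33,
                        }
                    },
                    "weight": 3,
                }],
            }
        }
    }
```

Additional hard filters can be appended to the filter list, and additional scoring logic can be added as further entries in functions, without the two interfering with each other.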
