ElasticSearch Numeric Distance

I'm looking for a way to write a query that returns a score.
This score would be the numerical distance between two numbers.
Is there a way to do this in Elasticsearch?
For example, if my data looks like this:
{
value: INT
}
in my query I want a default parameter (another INT),
and as a result I want my data sorted by the score ( |object.value - default.value| ).

Using a function_score query (with script_score), you can achieve what you need:
GET /_search
{
"query": {
"function_score": {
"query": {
"match_all": { }
},
"script_score" : {
"script" : {
"source": "Math.abs(doc['value'].value - params.default)",
"params": {
"default": 10
}
}
}
}
}
}
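To see which ordering this produces, the scoring can be mirrored outside Elasticsearch. The following is a minimal Python sketch, assuming documents are plain dicts with an integer `value` field; `distance_score` and `rank_by_distance` are hypothetical helpers, not part of any Elasticsearch client. Because hits are returned in descending `_score` order, the value farthest from `default` comes first.

```python
from typing import Dict, List


def distance_score(doc: Dict[str, int], default: int) -> int:
    # Same formula as the Painless script: Math.abs(doc['value'].value - params.default)
    return abs(doc["value"] - default)


def rank_by_distance(docs: List[Dict[str, int]], default: int) -> List[Dict[str, int]]:
    # Elasticsearch sorts hits by _score descending, so the farthest value comes first.
    return sorted(docs, key=lambda d: distance_score(d, default), reverse=True)


docs = [{"value": 12}, {"value": 3}, {"value": 10}, {"value": 25}]
for doc in rank_by_distance(docs, default=10):
    print(doc["value"], distance_score(doc, 10))
```

If you need the closest values first instead, one option is to sort ascending on the same expression with a script-based sort rather than relying on `_score`.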

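If I understood correctly, you want something like this? It runs a range query on the score field and returns all hits whose score value is between 0 and 6 (both bounds inclusive). Note that this filters on a document field named score, not on the computed relevance _score.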
query = {
'query': {
'range': {
'score': {
'lte': 6,
'gte': 0
}
}
} }
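The gte/lte bounds are inclusive on both ends. As a rough illustration, here is a minimal Python sketch that applies the same bounds to in-memory dicts; it assumes each document may have a numeric `score` field, and `in_range` is a hypothetical helper.

```python
from typing import Dict, List


def in_range(doc: Dict[str, float], field: str, gte: float, lte: float) -> bool:
    # Mirrors {"range": {field: {"gte": gte, "lte": lte}}}: both bounds are inclusive,
    # and documents without the field do not match.
    value = doc.get(field)
    return value is not None and gte <= value <= lte


docs: List[Dict[str, float]] = [{"score": 0}, {"score": 3.5}, {"score": 6}, {"score": 6.1}, {"other": 1}]
hits = [d for d in docs if in_range(d, "score", gte=0, lte=6)]
print(hits)
```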

Related

Is there a way to query a range when the maximal range is defined by an array with two numbers?

I need to write an Elasticsearch range query that operates on the following index format:
...
"facetProperties": {
"fid641616": [
31.75,
44.45
]
}
...
The following query works only if lt or gt matches the lower or the upper bound of the maximal range. As soon as I try to narrow both ends, there are no results.
{
"query": {
"bool": {
"should": [{
"range": {
"facetProperties.fid641616": {
"gt": 33,
"lt": 42
}
}
}]
}
},
"from": 0,
"size": 250,
"sort": [
],
"aggs": {
},
"_source": "facetProperties.fid641616"
}
Is there a way to get this working without modifying the index?
Update 1 - some use cases:
query range:
"range": {
"facetProperties.fid641616": {
"gt": 33,
"lt": 42
}
}
facet1 : [31] - should not be found
facet2 : [31,45] - should be found
facet3 : [31,32] - should not be found
facet4 : [44,45] - should not be found
It is not possible to query based on the range or difference of two numbers in an array using conventional Query DSL queries in ES, but you can do it with a script query.
Below is a sample document and a script that should help you.
Sample Document:
POST range_index/_doc/1
{
"array": [31.75, 44.45]
}
Query:
POST range_index/_search
{
"query": {
"script": {
"script": {
"source": """
// the values are floating point (31.75, 44.45), so read them as double, not long
List list = doc['array'];
if (list.size() == 2) {
double first_number = list.get(0);
double last_number = list.get(1);
if (params.gt < first_number)
return false;
if (params.lt > last_number)
return false;
if ((last_number - first_number) >= (params.lt - params.gt))
return true;
}
return false;
""",
"params": {
"gt": 33,
"lt": 42
}
}
}
}
}
The script returns documents whose stored interval [first, last] contains the queried interval: the first value must be at or below gt and the last value at or above lt. Once those two checks pass, the final check (that last - first is at least lt - gt, here 42 - 33 = 9) always holds as well.
You should see the sample document in the result. Note that I'm assuming the values in the array field are in ascending order.
Let me know if that helps!
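To check the script's logic against the use cases from Update 1 without a cluster, here is a minimal Python sketch of the same predicate. Like the answer, it assumes the array holds exactly a lower and an upper bound in ascending order; `matches` is a hypothetical helper name.

```python
from typing import List


def matches(values: List[float], gt: float, lt: float) -> bool:
    # Only arrays holding exactly [lower, upper] are considered, as in the script.
    if len(values) != 2:
        return False
    first, last = values
    # The stored interval must start at or below gt...
    if gt < first:
        return False
    # ...and end at or above lt.
    if lt > last:
        return False
    # Implied by the two checks above; kept to mirror the Painless script.
    return (last - first) >= (lt - gt)


cases = [[31], [31, 45], [31, 32], [44, 45], [31.75, 44.45]]
for values in cases:
    print(values, matches(values, gt=33, lt=42))
```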

Elasticsearch filter multiple terms with only matching results and not any of them

How can I get only the results that match all of the terms in a multi-term search? I have this sample table, where titleid is mapped as an int field and personid is a keyword:
titleid:1,personid:a
titleid:3,personid:a
titleid:1,personid:b
titleid:2,personid:b
titleid:1,personid:c
titleid:5,personid:c
The expected result is:
titleid:1
With a sample query like this one:
{
"query": {
"bool": {
"filter": {
"terms": { "personid": ["a", "b", "c"] }
}
}
}
}
I get the following results:
titleid: 1,2,3,5
Maybe this will help: I wrote the query in SQL and got the expected result. I asked the query to return the titleid values whose count matches the number of searched parameters. This is only to explain the idea; the goal is to do it in Elasticsearch.
select titleid
from (
select count(titleid) as title_count, titleid
from table1
where personid in ('a','b','c')
group by titleid
) as vw
where title_count = 3
If you only want records with titleid == 1 AND personid == 'a', you can filter on both fields. Only the bool query uses must, should and must_not; a filter removes non-matching documents by definition, so it behaves like a must.
"query": {
"bool": {
"filter": [
{
"term": {
"titleid": { "value": 1 }
}
},
{
"term": {
"personid": { "value": "a" }
}
}
]
}
}
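Every clause in a bool filter must match, so the two term clauses act as a logical AND. A minimal in-memory Python sketch of that behaviour, using the sample rows from the question (`filter_all` is a hypothetical helper):

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]


def filter_all(rows: List[Row], **terms: Union[int, str]) -> List[Row]:
    # Keep a row only if every field=value pair matches, like term clauses in bool.filter.
    return [row for row in rows if all(row.get(f) == v for f, v in terms.items())]


rows: List[Row] = [
    {"titleid": 1, "personid": "a"},
    {"titleid": 3, "personid": "a"},
    {"titleid": 1, "personid": "b"},
    {"titleid": 2, "personid": "b"},
    {"titleid": 1, "personid": "c"},
    {"titleid": 5, "personid": "c"},
]
print(filter_all(rows, titleid=1, personid="a"))
```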
UPDATE::
From your update, it looks like you want to filter your results and then aggregate on them. Elasticsearch offers several metrics and bucket aggregations for this.
Using a terms aggregation on titleid with a bucket_selector sub-aggregation that keeps only buckets with a doc count of 3 (this isn't tested, but it should be very close if not correct):
{
"size": 0,
"query": {
"bool": {
"filter": { "terms": { "personid": ["a", "b", "c"] } }
}
},
"aggs": {
"title_id": {
"terms": { "field": "titleid" },
"aggs": {
"count_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count == 3"
}
}
}
}
}
}
However, be aware that pipeline aggregations work on the outputs produced by other aggregations, so the overall work needed to calculate the initial doc counts stays the same. Since the script part needs to be executed for each input bucket, the operation might be slow for high-cardinality fields with many thousands of terms.
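The bucket_selector approach, like the SQL query in the question, boils down to counting matching rows per titleid and keeping only the buckets whose count equals the number of requested person ids. A minimal Python sketch of that logic, assuming each (titleid, personid) pair appears at most once; `titles_shared_by_all` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[int, str]]


def titles_shared_by_all(rows: Iterable[Row], person_ids: List[str]) -> List[int]:
    wanted = set(person_ids)
    # Like the terms filter on personid followed by a terms aggregation on titleid.
    counts = Counter(row["titleid"] for row in rows if row["personid"] in wanted)
    # Like bucket_selector: keep buckets whose doc count equals the number of ids.
    return sorted(title for title, n in counts.items() if n == len(wanted))


rows: List[Row] = [
    {"titleid": 1, "personid": "a"},
    {"titleid": 3, "personid": "a"},
    {"titleid": 1, "personid": "b"},
    {"titleid": 2, "personid": "b"},
    {"titleid": 1, "personid": "c"},
    {"titleid": 5, "personid": "c"},
]
print(titles_shared_by_all(rows, ["a", "b", "c"]))
```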

Elasticsearch aggregation with range query

I am trying to build an ES query that returns documents whose price is >= the average price (avg).
Here is an example:
GET /_search
{
"size" : 0,
"query" : {
"filtered": {
"filter": {
"range": {
"price": {
"gte": {
"aggs" : {
"single_avg_price": {
"avg" :{
"field" : "price"
}
}
}
}
}
}
}
}
}
}
I get the following error
"type": "query_parsing_exception",
"reason": "[range] query does not support [aggs]",
How can I use an aggregated value in a range query in Elasticsearch?
You cannot embed aggregations inside a query. You need to first send an aggregation query to find out the average and then send a second range query using the obtained average value.
Query 1:
POST /_search
{
"size": 0,
"aggs": {
"single_avg_price": {
"avg": {
"field": "price"
}
}
}
}
Then you get the average price, say it was 12.3 and use it in your second query, like this:
Query 2:
POST /_search
{
"size": 10,
"query": {
"filtered": {
"filter": {
"range": {
"price": {
"gte": 12.3
}
}
}
}
}
}
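The two-step flow can be checked locally: compute the average first, then use it as the gte bound. A minimal Python sketch on in-memory prices (the `above_average` helper and the sample prices are illustrative only):

```python
from statistics import mean
from typing import List


def above_average(prices: List[float]) -> List[float]:
    # Step 1: equivalent of the avg aggregation.
    avg_price = mean(prices)
    # Step 2: equivalent of the range query with "gte": avg_price.
    return [p for p in prices if p >= avg_price]


print(above_average([10.0, 12.0, 14.0, 20.0]))
```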
After trying different ES aggregations such as bucket_selector, I found that it can be done using Python.
Here is the Python code I created to solve this issue.
Please note: URL, USER_NAME and PASSWORD need to be filled in before running it.
#!/usr/bin/env python3
import requests
from requests.auth import HTTPBasicAuth

# Connection settings: fill these in before running.
# URL should point at a _search endpoint, e.g. http://localhost:9200/my_index/_search
URL = ''
USER_NAME = ''
PASSWORD = ''


def get_avg():
    """Return the average price across all documents."""
    query = {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "single_avg_price": {
                "avg": {"field": "price"}
            }
        }
    }
    # json= serializes the body and sets the Content-Type: application/json header
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD), json=query)
    response.raise_for_status()
    return response.json()['aggregations']['single_avg_price']['value']


def rows_greater_than_avg(avg_value):
    """Return hits whose price is greater than or equal to avg_value."""
    query = {
        "query": {
            "range": {
                "price": {"gte": avg_value}
            }
        }
    }
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD), json=query)
    response.raise_for_status()
    return response.json()


def main():
    avg_value = get_avg()
    print(rows_greater_than_avg(avg_value))


if __name__ == '__main__':
    main()

ElasticSearch max score

I'm trying to solve a performance issue we have when querying ElasticSearch for several thousand results. The basic idea is that we do some post-query processing and only show the top X results (a query may have ~100,000 results while we only need the top 100 according to our scoring mechanics).
The basic mechanics are as follows:
The ElasticSearch score is normalized to 0..1 (score / max(score)); we add our ranking score (also normalized to 0..1) and divide by 2.
What I'd like to do is move this logic into ElasticSearch using custom scoring (or anything else that works): https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-script-score
The problem I'm facing is that with score scripts / score functions I can't find a way to do something like max(_score) to normalize the score between 0 and 1.
"script_score" : {
"script" : "(_score / max(_score) + doc['some_normalized_field'].value)/2"
}
Any ideas are welcome.
You cannot get max_score before the _score has been computed for all matching documents. With script_score, Elasticsearch first computes the _score for every matching document, and only then is max_score known, so it is not available inside the script.
From what I understand of your problem, you want to preserve the max_score generated by the original query, before "script_score" is applied. You can get the required result by doing some computation on the front end: apply your formula there and then sort the results.
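Applying the formula on the client side could look like the following minimal Python sketch. It assumes each hit is a dict with `_score` and a `_source` holding `some_normalized_field` (the field name from the question), as found in a search response's `hits.hits` array; `rerank` is a hypothetical name. It still requires fetching every hit you want to re-rank.

```python
from typing import Any, Dict, List


def rerank(hits: List[Dict[str, Any]], top_n: int) -> List[Dict[str, Any]]:
    if not hits:
        return []
    max_score = max(hit["_score"] for hit in hits)

    def combined(hit: Dict[str, Any]) -> float:
        # (_score / max(_score) + some_normalized_field) / 2, as in the question
        normalized = hit["_score"] / max_score if max_score > 0 else 0.0
        return (normalized + hit["_source"]["some_normalized_field"]) / 2

    return sorted(hits, key=combined, reverse=True)[:top_n]


hits = [
    {"_id": "1", "_score": 4.0, "_source": {"some_normalized_field": 0.1}},
    {"_id": "2", "_score": 2.0, "_source": {"some_normalized_field": 0.9}},
    {"_id": "3", "_score": 1.0, "_source": {"some_normalized_field": 0.2}},
]
print([hit["_id"] for hit in rerank(hits, top_n=2)])
```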
You can return your factor with each hit using a script_fields query.
{
"explain": true,
"query": {
"match_all": {}
},
"script_fields": {
"total_goals": {
"script": {
"lang": "painless",
"source": """
int total = 0;
for (int i = 0; i < doc['goals'].length; ++i) {
total += doc['goals'][i];
}
return total;
""",
"params":{
"last" : "any parameters required"
}
}
}
}
}
I'm not sure I understand your question. Do you want to limit the number of results?
Have you tried this?
{
"from" : 0, "size" : 10,
"query" : {
"term" : { "name" : "dennis" }
}
}
You can use sort to define the sort order; by default, results are sorted by the score of the main query.
You can also use aggregations (with or without function_score):
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"date": {
"scale": "3d",
"offset": "7d",
"decay": 0.1
}
}
},
{
"gauss": {
"priority": {
"origin": "0",
"scale": "100"
}
}
}
],
"query": {
"match" : { "body" : "dennis" }
}
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 10
}
}
}
}
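For reference, the gauss decay function documented for function_score scores a numeric value as exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)), with sigma chosen so that the score equals decay (0.5 by default) at a distance of offset + scale. A minimal Python sketch of that formula, useful for sanity-checking parameters such as the priority function above; `gauss_decay` is a hypothetical helper, and date fields would first need converting to numbers (for example epoch milliseconds).

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    # sigma^2 is chosen so that the score equals `decay` at distance offset + scale.
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    # Values within `offset` of the origin get the full score of 1.0.
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


# priority function from the query: origin 0, scale 100, default decay 0.5
for priority in (0, 50, 100, 200):
    print(priority, round(gauss_decay(priority, origin=0, scale=100), 4))
```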
According to this GitHub ticket, it is simply not possible to normalize the score, and they suggest using boolean similarity as a workaround.

In Elasticsearch, how do I use multiple term filters when the number of terms is not fixed and can vary?

I know that to use multiple term filters one should use bool, but the problem is that I don't know how many terms there will be. For example, I want to filter results on strings with OR ("aa", "bb", "cc", "dd", "ee"), so that my search returns documents containing any of the strings, but sometimes the array size will be 10, 15 or 20. How can I handle a variable number of terms in filters? My code is given below.
var stores = docs.stores; // THIS IS MY ARRAY OF STRINGS
client.search({
index: 'merchants',
type: shop_type,
body: {
query: {
filtered: {
filter: {
bool: {
must: [
{
// term: { 'jeb_no': stores }, // HERE HOW TO FILTER ALL ARRAY STRINGS WITH OR CONDITION
}
]
}
}
}
}, script_fields : {
"area": {
"script" : "doc['address.area2']+doc['address.area1']"
}
}
}
})
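I think this will do: use terms instead of term. A terms query matches documents containing any of the listed values (OR semantics), so you can pass the whole array regardless of its length: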
{
"query": {
"bool": {
"must": [
{
"terms": {
"jeb_no": stores
}
}
]
}
}
}
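Because terms takes a list, the request body can be built from an array of any length without adding one clause per value. A minimal Python sketch of building that body; `build_terms_query` is a hypothetical helper, the jeb_no field and the sample values come from the question, and the clause is placed in bool.filter here since no scoring is needed (the must clause above also works).

```python
import json
from typing import Any, Dict, List


def build_terms_query(field: str, values: List[str]) -> Dict[str, Any]:
    # A single terms clause matches documents containing ANY of the values (OR semantics).
    return {"query": {"bool": {"filter": [{"terms": {field: list(values)}}]}}}


stores = ["aa", "bb", "cc", "dd", "ee"]
print(json.dumps(build_terms_query("jeb_no", stores), indent=2))
```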
