I am trying to build an ES query that satisfies the condition >= avg.
Here is an example:
GET /_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gte": {
              "aggs": {
                "single_avg_price": {
                  "avg": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
I get the following error:
"type": "query_parsing_exception",
"reason": "[range] query does not support [aggs]",
How can we use an aggregated value in a range query in Elasticsearch?
You cannot embed aggregations inside a query. You need to first send an aggregation query to find out the average and then send a second range query using the obtained average value.
Query 1:
POST /_search
{
  "size": 0,
  "aggs": {
    "single_avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
Then take the average price you get back, say 12.3, and use it in your second query, like this:
Query 2:
POST /_search
{
  "size": 10,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gte": 12.3
          }
        }
      }
    }
  }
}
After trying different ES aggregations such as bucket_selector, I found that it can be done using Python.
Here is the Python code I created to solve this issue.
Please note: URL, USER_NAME, and PASSWORD need to be filled in before running it.
#! /usr/bin/python
import sys, json, requests
from requests.auth import HTTPBasicAuth

# static variables
URL = ''
USER_NAME = ''
PASSWORD = ''

# returns avg value
def getAvg():
    query = json.dumps({
        "aggs": {
            "single_avg_price": {
                "avg": {
                    "field": "price"
                }
            }
        }
    })
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD), data=query)
    results = json.loads(response.text)
    return results['aggregations']['single_avg_price']['value']

# returns rows that are greater than avg value
def rows_greater_than_avg(avg_value):
    query = json.dumps({
        "query": {
            "range": {
                "price": {
                    "gte": avg_value
                }
            }
        }
    })
    response = requests.get(URL, auth=HTTPBasicAuth(USER_NAME, PASSWORD), data=query)
    results = json.loads(response.text)
    return results

# main method
def main():
    avg_value = getAvg()
    print(rows_greater_than_avg(avg_value))

main()
How would the following query look?
Scenario:
I have two bases (base1 and base2), each with one column. I would like to see the difference between them, that is, what exists in base1 but does not exist in base2, with the fictitious name of the column being hostname.
Example:
Is the selected value of Base1.Hostname present in Base2.Hostname?
YES → do not return it
NO → return it
For this I have the following function in Python:
def diff(first, second):
    second = set(second)
    return [item for item in first if item not in second]
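For illustration, this is how that helper behaves on two small, hypothetical hostname lists:
base1_hostnames = ['web01', 'web02', 'db01']  # hypothetical contents of base1
base2_hostnames = ['web02']                   # hypothetical contents of base2

# hostnames present in base1 but missing from base2
print(diff(base1_hostnames, base2_hostnames))  # ['web01', 'db01']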
Example of an exact-match query:
GET /base1/_search
{
  "query": {
    "multi_match": {
      "query": "webserver",
      "fields": [
        "hostname"
      ],
      "type": "phrase"
    }
  }
}
I would like to migrate this logic to Elasticsearch so that, in the future, I can generate forecasts from how frequently the results of this comparison change between the bases.
This can be done with aggregations:
Collect all the hostnames from the base1 and base2 indices.
For each hostname, count the occurrences in base2.
Keep only the buckets whose base2 count is 0.
GET base*/_search
{
  "size": 0,
  "aggs": {
    "all": {
      "composite": {
        "size": 10,
        "sources": [
          {
            "host": {
              "terms": {
                "field": "hostname"
              }
            }
          }
        ]
      },
      "aggs": {
        "base2": {
          "filter": {
            "match": {
              "_index": "base2"
            }
          }
        },
        "index_count_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "base2_count": "base2._count"
            },
            "script": "params.base2_count == 0"
          }
        }
      }
    }
  }
}
By the way, don't forget to use the composite aggregation's pagination (the after parameter) to get the rest of the results.
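A minimal sketch of that pagination, assuming a requests-based client; the after_key value and the URL are hypothetical placeholders:
import json, requests

# Sketch (not part of the original answer): fetch the next page of the composite
# aggregation by passing the "after_key" of the previous response back as "after".
next_page = {
    "size": 0,
    "aggs": {
        "all": {
            "composite": {
                "size": 10,
                "after": {"host": "webserver-42"},  # after_key from the previous page
                "sources": [{"host": {"terms": {"field": "hostname"}}}]
            },
            "aggs": {
                "base2": {"filter": {"match": {"_index": "base2"}}},
                "index_count_bucket_filter": {
                    "bucket_selector": {
                        "buckets_path": {"base2_count": "base2._count"},
                        "script": "params.base2_count == 0"
                    }
                }
            }
        }
    }
}

response = requests.get("http://localhost:9200/base*/_search",
                        headers={"Content-Type": "application/json"},
                        data=json.dumps(next_page))
print(response.json()["aggregations"]["all"].get("after_key"))
Repeat until the response no longer returns an after_key.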
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
https://discuss.elastic.co/t/data-set-difference-between-fields-on-different-indexes/160015/4
I'm looking for a way to write a query that returns a score, where the score is the numerical distance between two numbers. Is there a way to do this in Elasticsearch?
For example, if my data looks like this:
{
  value: INT
}
In my query I want a default parameter (another INT), and as a result I want my data sorted by the score (|object.value - default.value|).
Using a function_score query (with script_score), you can achieve what you need:
GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": "Math.abs(doc['value'].value - params.default)",
          "params": {
            "default": 10
          }
        }
      }
    }
  }
}
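Note that hits come back with the highest _score first, i.e. the farthest values. If the closest values should come first instead, one option (a sketch, not part of the original answer) is to additionally sort ascending on _score:
# Sketch (assumption: closest-first ordering is wanted): the same function_score
# query as above, plus an ascending sort on _score, sent as the _search body.
query = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {
                "script": {
                    "source": "Math.abs(doc['value'].value - params.default)",
                    "params": {"default": 10}
                }
            }
        }
    },
    "sort": [{"_score": {"order": "asc"}}]  # smallest distance first
}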
If I understood correctly, you want something like this? It runs a range search on the score field and returns all hits with a score value between 0 and 6.
query = {
    'query': {
        'range': {
            'score': {
                'lte': 6,
                'gte': 0
            }
        }
    }
}
I have documents like this:
{
body: 'some text',
read_date: '2017-12-22T10:19:40.223000'
}
Is there a way to query the count of documents published in the last 10 days, grouped by date? For example:
2017-12-22, 150
2017-12-21, 79
2017-12-20, 111
2017-12-19, 27
2017-12-18, 100
Yes, you can easily achieve that using a date_histogram aggregation, like this:
{
  "query": {
    "range": {
      "read_date": {
        "gte": "now-10d"
      }
    }
  },
  "aggs": {
    "byday": {
      "date_histogram": {
        "field": "read_date",
        "interval": "day"
      }
    }
  }
}
To receive the per-day count for the past 10 days, you can POST the following query:
{
  "query": {
    "range": {
      "read_date": {
        "gte": "now-11d/d",
        "lte": "now-1d/d"
      }
    }
  },
  "aggs": {
    "byDay": {
      "date_histogram": {
        "field": "read_date",
        "calendar_interval": "1d",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
to the following URL: http://localhost:9200/Index_Name/Index_Type/_search?size=0
Setting size to 0 avoids executing the fetch phase of the search, making the request more efficient. See the Elasticsearch documentation for more information.
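To turn either response into the "date, count" lines from the question, the date_histogram buckets can be read like this (a minimal sketch assuming a requests-based client; the URL and index name are placeholders):
import json, requests

# Minimal sketch: run the date_histogram query above and print "yyyy-MM-dd, count"
# for each day bucket.
query = json.dumps({
    "query": {"range": {"read_date": {"gte": "now-10d/d"}}},
    "aggs": {
        "byDay": {
            "date_histogram": {
                "field": "read_date",
                "calendar_interval": "1d",
                "format": "yyyy-MM-dd"
            }
        }
    }
})

response = requests.get("http://localhost:9200/Index_Name/_search?size=0",
                        headers={"Content-Type": "application/json"}, data=query)

for bucket in response.json()["aggregations"]["byDay"]["buckets"]:
    print("{}, {}".format(bucket["key_as_string"], bucket["doc_count"]))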
Per our requirement, we need to find the max ID of the documents before adding a new document. The problem is that the doc may also contain string data, so we had to use an inline script in the Elasticsearch query to find the max ID only for documents that have integer data, otherwise returning 0. I am using the following inline-script query to find the max key, but it is not working. Can you help me with this?
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "Name": {
              "value": "Test2"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "MaxId": {
      "max": {
        "field": "Key",
        "script": {
          "inline": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
        }
      }
    }
  }
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (such as Key) in a max aggregation.
Simply remove the "field": "Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
Hello, I'm having trouble deciding whether the following query is correct for multiple ORs in Elasticsearch. I want to select all the unique data (not a count, but the actual rows).
My best attempt at this in the Elasticsearch query DSL is:
GET mystash/_search
{
  "aggs": {
    "uniques": {
      "filter": {
        "or": [
          { "term": { "url.raw": "/a.json" } },
          { "term": { "url.raw": "/b.json" } },
          { "term": { "url.raw": "/c.json" } },
          { "term": { "url.raw": "/d.json" } }
        ]
      },
      "aggs": {
        "unique": {
          "terms": {
            "field": "id.raw",
            "size": 0
          }
        }
      }
    }
  }
}
The equivalent SQL would be
SELECT DISTINCT id
FROM json_record
WHERE
json_record.url = 'a.json' OR
json_record.url = 'b.json' OR
json_record.url = 'c.json' OR
json_record.url = 'd.json'
I was wondering whether the query above is correct, since the data will be needed for report generation.
Some remarks:
You should use a query filter instead of an aggregation filter; as written, your query loads all documents.
You can replace your or + term filters with a single terms filter.
You can use size=0 at the root of the query to get only the aggregation result and not the search hits.
Example code:
{"size":0,
"query" :{"filtered":{"filter":{"terms":{"url":["a", "b", "c"]}}}},
"aggs" :{"unique":{"term":{"field":"id", "size" :0}}}
}
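To actually pull the distinct ids out of the aggregation response, a minimal sketch using a requests-based client (the URL is a placeholder; the field names follow the question):
import json, requests

# Minimal sketch: send the query above and collect the distinct ids from the
# terms aggregation buckets.
query = json.dumps({
    "size": 0,
    "query": {"filtered": {"filter": {"terms": {"url.raw": ["/a.json", "/b.json", "/c.json", "/d.json"]}}}},
    "aggs": {"unique": {"terms": {"field": "id.raw", "size": 0}}}
})

response = requests.get("http://localhost:9200/mystash/_search",
                        headers={"Content-Type": "application/json"}, data=query)

unique_ids = [bucket["key"] for bucket in response.json()["aggregations"]["unique"]["buckets"]]
print(unique_ids)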