How to get documents above the average value in Elasticsearch

After grouping by device ID, I want to count, for each device, the documents whose numerical value is greater than or equal to the average value for that device.
doc = [
{ deviceId: 1, data: {temp:1} },
{ deviceId: 1, data: {temp:2} },
{ deviceId: 1, data: {temp:3} },
{ deviceId: 1, data: {temp:4} },
{ deviceId: 1, data: {temp:5} },
{ deviceId: 2, data: {temp:1} },
{ deviceId: 2, data: {temp:2} },
{ deviceId: 2, data: {temp:3} },
{ deviceId: 2, data: {temp:4} },
{ deviceId: 2, data: {temp:5} },
{ deviceId: 3, data: {temp:1} },
{ deviceId: 3, data: {temp:2} },
{ deviceId: 3, data: {temp:3} },
{ deviceId: 3, data: {temp:4} },
{ deviceId: 3, data: {temp:5} },
];
The desired result is:
result = aggregations :{
clusters:{
...
bucket:[
{ key:"1",
doc_count: 5,
avgData: {value: 3.0}
above_avgDataValue : {
doc_count : 2 // === data.temp > 3
}
}
]
}
}
Below is the aggregation I tried
_search {
size:0,
query:{
bool:{
filter:[
terms:{deviceId:[1,2]}
]
}
},
aggs:{
cluster:{
terms:{field:deviceId}
},
aggs:{
"avgData" : {"avg": {"field":"temp"}}
}
}
};
please help

TL;DR
I don't think this is possible with a single query.
But you can work around the issue in two steps:
Get the average per deviceId.
Get the number of documents at or above the average for each deviceId.
Workaround
To get the average per deviceId, the following query should work:
GET /73034730/_search
{
"size": 0,
"aggs": {
"avg_per_fields": {
"terms": {
"field": "deviceId",
"size": 10
},
"aggs": {
"avg": {
"avg": {
"field": "data.temp"
}
}
}
}
}
}
Then plug each average into a range clause for the matching deviceId and run a second query:
GET /73034730/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"deviceId": "1"
}
},
{
"range": {
"data.temp": {
"gte": 3
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"deviceId": "2"
}
},
{
"range": {
"data.temp": {
"gte": 3
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"deviceId": "3"
}
},
{
"range": {
"data.temp": {
"gte": 3
}
}
}
]
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"avg_per_fields": {
"terms": {
"field": "deviceId",
"size": 10
}
}
}
}
Given your dataset, this should return:
{
...
"aggregations": {
"avg_per_fields": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 3
},
{
"key": 2,
"doc_count": 3
},
{
"key": 3,
"doc_count": 3
}
]
}
}
}
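The two steps above can be chained in client code. The following is a minimal sketch, assuming the first response has the shape produced by the `avg_per_fields` aggregation shown earlier; the function name `build_above_average_query` and the hand-written sample response are illustrative and not part of the original answer.

```python
def build_above_average_query(step_one_response, agg_name="avg_per_fields"):
    """Build the second query from the per-device averages of the first one."""
    buckets = step_one_response["aggregations"][agg_name]["buckets"]
    should = [
        {
            "bool": {
                "must": [
                    {"match": {"deviceId": bucket["key"]}},
                    # Use the device's own average as the lower bound.
                    {"range": {"data.temp": {"gte": bucket["avg"]["value"]}}},
                ]
            }
        }
        for bucket in buckets
    ]
    return {
        "size": 0,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "aggs": {agg_name: {"terms": {"field": "deviceId", "size": 10}}},
    }


# Hand-written sample: each device in the question holds temps 1..5, so avg = 3.0.
sample_response = {
    "aggregations": {
        "avg_per_fields": {
            "buckets": [
                {"key": 1, "doc_count": 5, "avg": {"value": 3.0}},
                {"key": 2, "doc_count": 5, "avg": {"value": 3.0}},
                {"key": 3, "doc_count": 5, "avg": {"value": 3.0}},
            ]
        }
    }
}

second_query = build_above_average_query(sample_response)
```

The resulting dictionary can be sent as the body of the second `_search` request with whichever client you use.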

Related

How to return 0 for requested data in aggregation if no documents matched

I have an index of users with structure:
User
book_ids:[] //array of book ids
books : [{
book_id:
name:
}] //array of books
I want to create a query that returns a map of Book Id and number of users that read it.
The result of the query should include books that are not used by any user.
I have a very simplified version of the query:
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"books",
"query": {
"bool": {
"must": {
"terms": {
"books.book_id": [100,200] //book ids that provided as a parameter
}
}
}
}
}
}
]
}
},
"aggs":{
"books":{
"terms":{
"field":"book_ids",
"include":[100,200] //book ids that provided as a parameter
}
}
},
"size":0
}
The result of the query will be
buckets: [
{key: 100, doc_count: 53}
]
So 53 users have read the book with id 100, but no user has read the book with id 200 (it is missing from the response).
The question is: how can I change the query to get the following result?
buckets: [
{key: 100, doc_count: 53},
{key: 200, doc_count: 0}
]
A terms aggregation doesn't add a bucket to the result if the given term does not exist in the index.
You can use a filters aggregation for this purpose:
{
"query": {
...
},
"aggs": {
"books": {
"filters": {
"filters": {
"100": { "match": { "book_ids": 100 } },
"200": { "match": { "book_ids": 200 } }
}
}
}
},
"size": 0
}
To reproduce:
# index some book ids, with id 5 missing
POST /_bulk
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [1, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [4, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [6, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [7, 2, 3] }
GET /72201832/_search
{
"size": 0,
"aggs": {
"books": {
"filters": {
"filters": {
"1": { "term": {"book_ids": "1"} },
"2": { "term": {"book_ids": "2"} },
"3": { "term": {"book_ids": "3"} },
"4": { "term": {"book_ids": "4"} },
"5": { "term": {"book_ids": "5"} },
"6": { "term": {"book_ids": "6"} },
"7": { "term": {"book_ids": "7"} }
}
}
}
}
}
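Since the filters aggregation needs one named filter per requested id, the body is usually generated from the id list. A minimal sketch, assuming the ids arrive as a plain list; the helper name `build_filters_agg` is hypothetical.

```python
def build_filters_agg(book_ids, field="book_ids"):
    """Build a filters aggregation with one named bucket per requested id."""
    return {
        "size": 0,
        "aggs": {
            "books": {
                "filters": {
                    "filters": {
                        # Bucket names must be strings; the term value keeps its type.
                        str(book_id): {"term": {field: book_id}}
                        for book_id in book_ids
                    }
                }
            }
        },
    }


body = build_filters_agg([100, 200])
```

Each requested id gets its own named bucket, so an id with no matching documents still appears in the response with a `doc_count` of 0.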

Elasticsearch query syntax update 1.0.14 to 7.5.2

I am attempting to update a rather sizable Elasticsearch query. I am very new to Elasticsearch and am having a hard time wrapping my head around everything that is happening here.
This is the original query:
elasticsearch_query search_models, {
query: {
filtered: {
query: {
function_score: {
query: {
bool: {
must: [
{
multi_match: {
operator: "and",
type: "cross_fields",
query: params[:term],
fuzziness: (params[:fuzzy] || 0),
fields: [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
],
},
}
],
must_not: [
{ term: { access: { value: 2 } } },
{ term: { ledger: { value: "payroll" } } },
{ term: { ledger: { value: "credit card" } } }
],
should: [
{ term: { active: { value: 1, boost: 100 } } },
{ term: { active: { value: 2, boost: 50 } } },
{ term: { active: { value: 3, boost: 0.05 } } },
{ term: { active: { value: 4, boost: 50 } } },
{ term: { active: { value: 5, boost: 50 } } },
{ term: { active: { value: 6, boost: 50 } } },
{ term: { branch_id: { value: current_branch.id, boost: 100 } } }
]
}
},
functions: [
{
filter: { term: { auto_declined: 1 } },
boost_factor: 0.3
},
{
filter: { term: { auto_declined: 0 } },
boost_factor: 0.0001
},
{
filter: { term: { access: 1 } }, # current employee
boost_factor: 10
},
{
filter: { term: { access: 0 } },
boost_factor: 0.2
},
{
filter: { term: { unit_status: 1 } }, # current unit
boost_factor: 2
},
{
filter: { type: {value: 'txn'} },
boost_factor: 0.4
}
]
}
},
filter: {
"or" =>
{ filters: [
{ term: { "branch_id" => current_branch.id }},
{ type: { "value" => "auction" }},
{ type: { "value" => "fee_schedule"}},
{ type: { "value" => "unit"}},
{
"and" => [
{ type: { "value" => "user" }},
{ "or" => [
{ term: { "access" => 1 }}
]}
]
}
]}
}
}
}
}
This is where I have gotten to thus far:
elasticsearch_query search_models, {
query: {
bool: {
must: {
function_score: {
query: {
bool: [
must: {
multi_match: {
query: params[:term],
type: "cross_fields",
operator: "and",
fields: [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
]
}
},
must_not: [
{ term: { access: { value: 2 } } },
{ term: { ledger: { value: "payroll" } } },
{ term: { ledger: { value: "credit card" } } }
],
should: [
{ term: { active: { value: 1, boost: 100 } } },
{ term: { active: { value: 2, boost: 50 } } },
{ term: { active: { value: 3, boost: 0.05 } } },
{ term: { active: { value: 4, boost: 50 } } },
{ term: { active: { value: 5, boost: 50 } } },
{ term: { active: { value: 6, boost: 50 } } },
{ term: { branch_id: { value: current_branch.id, boost: 100 } } }
]
]
},
functions: [
{
filter: { term: { auto_declined: 1 } },
weight: 0.3
},
{
filter: { term: { auto_declined: 0 } },
weight: 0.0001
},
{
filter: { term: { access: 1 } }, # current employee
weight: 10
},
{
filter: { term: { access: 0 } },
weight: 0.2
},
{
filter: { term: { unit_status: 1 } }, # current unit
weight: 2
},
{
filter: { type: {value: 'txn'} },
weight: 0.4
}
]
}
},
filter: [
{ term: { branch_id: current_branch.id } },
{ type: { value: "auction" } },
{ type: { value: "fee_schedule"} },
{ type: { value: "unit"} },
bool: {
must: {
bool: {
should: [
{ type: { value: "user" } },
{ term: { access: 1 } }
]
}
}
}
]
}
}
}
I have:
replaced 'filtered' with 'bool' and 'must'
replaced 'boost_factor' with 'weight'
removed 'fuzziness' from the 'cross_fields' type 'multi_match'
attempted to update the 'or' and 'and' logic with newer 'bool' syntax.
The first three actions seemed to have done the trick with their respective errors, but I am getting hung up on this filter with the 'or' and 'and' logic. I would greatly appreciate some guidance!
This is the error I am receiving:
[400] {"error":{"root_cause":[{"type":"parsing_exception","reason":"[bool] query malformed, no start_object after query name","line":1,"col":61}],"type":"parsing_exception","reason":"[bool] query malformed, no start_object after query name","line":1,"col":61},"status":400}
If any further information would be helpful, please let me know.
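Your query produces a malformed request body, which is why you get the exception. In particular, the bool inside function_score is given an array (bool: [ ... ]) where Elasticsearch expects an object, which matches the "no start_object after query name" error. The query below parses without that error (Ruby values such as params[:term] appear as placeholder strings, and the should list is shortened):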
{
"query": {
"bool": {
"must": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "params[:term]",
"type": "cross_fields",
"operator": "and",
"fields": [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
]
}
}],
"must_not": [
{ "term": { "access": { "value": 2 } } },
{ "term": { "ledger": { "value": "payroll" } } },
{ "term": { "ledger": { "value": "credit card" } } }
],
"should": [
{ "term": { "active": { "value": 1, "boost": 100 } } }
]
}
},
"functions": [
{
"filter": { "term": { "auto_declined": 1 } },
"weight": 0.3
},
{
"filter": { "term": { "auto_declined": 0 } },
"weight": 0.0001
},
{
"filter": { "term": { "access": 1 } },
"weight": 10
},
{
"filter": { "term": { "access": 0 } },
"weight": 0.2
},
{
"filter": { "term": { "unit_status": 1 } },
"weight": 2
},
{
"filter": { "type": {"value": "txn"} },
"weight": 0.4
}
]
}
},
"filter": [
{ "term": { "branch_id": "current_branch.id" } },
{ "type": { "value": "auction" } },
{ "type": { "value": "fee_schedule"} },
{ "type": { "value": "unit"} },
{"bool": {
"must": {
"bool": {
"should": [
{ "type": { "value": "user" } },
{ "term": { "access": 1 } }
]
}
}
}
}
]
}
}
}
Compare it with your version and adjust the curly braces and brackets accordingly.
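Fixing the brackets does not by itself reproduce the logic of the old 'or'/'and' filter: clauses listed in a filter array must all match, whereas the original filter required only one of them. Below is a minimal sketch of one possible translation, assuming a legacy 'or' maps to bool.should with minimum_should_match set to 1 and a legacy 'and' maps to bool.filter. The function name and the `branch_id` parameter are illustrative; the type clauses are copied unchanged from the post.

```python
import json


def legacy_or_filter_as_bool(branch_id):
    """Express the old or/and filter with bool clauses (sketch, not verified on 7.5.2)."""
    return {
        "bool": {
            # Legacy "or": any one of these alternatives is enough.
            "should": [
                {"term": {"branch_id": branch_id}},
                {"type": {"value": "auction"}},
                {"type": {"value": "fee_schedule"}},
                {"type": {"value": "unit"}},
                {
                    "bool": {
                        # Legacy "and": both conditions must hold together.
                        "filter": [
                            {"type": {"value": "user"}},
                            {"term": {"access": 1}},
                        ]
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


legacy_filter = legacy_or_filter_as_bool(branch_id=42)  # 42 is a placeholder id
legacy_filter_json = json.dumps(legacy_filter)
```

The returned object would take the place of the whole filter array in the outer bool, for example as `"filter": [legacy_filter]`.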

ElasticSearch: Complex filter by nested document

I have following document structure:
{
product_name: "Product1",
product_id: 1,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 11,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
},
{
product_name: "Product2",
product_id: 2,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 10,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
}
I need to get documents (product) ONLY if some of its articles match 2 conditions (single article should match both conditions): articles.some_param = 10 AND articles.clients.client_id = 10001
So I need to get only product with id 2.
I'm using this query now, which is incorrect (and I know why), because it fetches both documents:
{
"query": {
"bool": {
"filter": [
{
"term": {
"articles.clients.id": 10001
}
},
{
"terms": {
"articles.some_param": 10
}
}
]
}
}
}
How can I write a query that returns only products with at least one article matching both conditions: articles.some_param = 10 AND articles.clients.client_id = 10001
e.g., to get Product with ID 2 only?
Something like this:
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id":{
"value": 10001
}
}
}
}
}
]
}
}
}
}
}
UPDATE:
Try wrapping the second nested query in a bool:
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"bool":{
"must" : [
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id":{
"value": 10001
}
}
}
}
}
]
}
}
]
}
}
}
}
}
P.S. I may have the path on the second nested query wrong, as I couldn't test it, so you may need to adjust that path.
P.P.S. Note that filter clauses do not calculate scores, so filter may not be what you need here.
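To see why the non-nested query returns both products while a nested query returns only product 2, the difference can be modelled in plain Python. This is only an illustration of the matching semantics using the documents from the question, not Elasticsearch code; the function names are made up for the sketch.

```python
products = [
    {"product_id": 1, "articles": [
        {"id": 101, "some_param": 10, "clients": []},
        {"id": 102, "some_param": 11, "clients": [{"client_id": 10001}]},
    ]},
    {"product_id": 2, "articles": [
        {"id": 101, "some_param": 10, "clients": []},
        {"id": 102, "some_param": 10, "clients": [{"client_id": 10001}]},
    ]},
]


def flattened_match(product, some_param, client_id):
    """Each condition may be satisfied by a different article."""
    params = {a["some_param"] for a in product["articles"]}
    clients = {c["client_id"] for a in product["articles"] for c in a["clients"]}
    return some_param in params and client_id in clients


def nested_match(product, some_param, client_id):
    """Both conditions must be satisfied by the same article."""
    return any(
        a["some_param"] == some_param
        and any(c["client_id"] == client_id for c in a["clients"])
        for a in product["articles"]
    )


flat_ids = [p["product_id"] for p in products if flattened_match(p, 10, 10001)]
nested_ids = [p["product_id"] for p in products if nested_match(p, 10, 10001)]
```

In this model `flat_ids` contains both products, because product 1 has one article with some_param 10 and a different article with client 10001, while `nested_ids` contains only product 2.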

how to group by terms count in elasticsearch

I have the following document mapping
{
properties: {
id: {
type: 'keyword'
},
rel: {
type: 'nested',
properties: {
type: {
type: 'keyword'
},
...
}
}
}
}
In the end I want to plot a x-y chart where x axis is count of type t1 and y axis is count of type t2, so for the following documents
{ id: 1, rel: [ { type: t1, ... }, { type: t1, ... }, { type: t2, ... }] }
{ id: 2, rel: [ { type: t1, ... }, { type: t1, ... }] }
{ id: 3, rel: [ { type: t1, ... }, { type: t1, ... }] }
will map to 3 (x, y) points (2, 1), (2, 0), (2, 0), and I'm going to plot them on x-y plane like this
^
|
| 1
+---2-->
Right now I use the following aggregation
{
"_source": false,
"aggregations": {
"g1": {
"terms": {
"field": "id",
"size": 10000
},
"aggregations": {
"rel": {
"nested": {
"path": "rel"
},
"aggregations": {
"filter-t1": {
"filter": {
"terms": {
"rel.type": [
"t1"
]
}
}
},
"filter-t2": {
"filter": {
"terms": {
"rel.type": [
"t2"
]
}
}
}
}
}
}
}
}
}
to get the following result
{
"aggregations": {
"g1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"rel": {
"doc_count": 4942,
"filter-t1": {
"doc_count": 6
},
"filter-t2": {
"doc_count": 20
}
}
},
{
"key": "2",
"doc_count": 1,
"rel": {
"doc_count": 3039,
"filter-t1": {
"doc_count": 6
},
"filter-t2": {
"doc_count": 11
}
}
}
...
and calculate number of documents in each coordinate in API layer.
The problem is that the index holds millions of documents, so fetching every bucket in a single request doesn't work. I couldn't find a way to paginate aggregations either; from and size seem to apply only to the returned hits.
Is there a way to achieve what I want in elasticsearch?
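For reference, the API-layer step described above (counting how many documents fall on each (x, y) coordinate) can be sketched as follows. It assumes the bucket layout shown in the result above; the function name `count_points` is illustrative, and the sample reuses the two buckets visible in that result.

```python
from collections import Counter


def count_points(buckets):
    """Count how many ids land on each (t1 count, t2 count) coordinate."""
    points = Counter()
    for bucket in buckets:
        x = bucket["rel"]["filter-t1"]["doc_count"]
        y = bucket["rel"]["filter-t2"]["doc_count"]
        points[(x, y)] += 1
    return points


# The two buckets visible in the result above.
sample_buckets = [
    {"key": "1", "doc_count": 1, "rel": {
        "doc_count": 4942,
        "filter-t1": {"doc_count": 6},
        "filter-t2": {"doc_count": 20},
    }},
    {"key": "2", "doc_count": 1, "rel": {
        "doc_count": 3039,
        "filter-t1": {"doc_count": 6},
        "filter-t2": {"doc_count": 11},
    }},
]

point_counts = count_points(sample_buckets)
```

This only covers the counting; it does not solve the pagination problem raised in the question.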

Ratio with elasticsearch

I have a list of customers with this structure:
{
"name" : "Toya Romano",
"hungry" : false,
"date" : 1420090500020
}
I would like to get the ratio of people who are hungry. How can I do it with an ElasticSearch query? I am running ES 2.3.
Rather a hacky approach because of this issue, but this should work:
{
"size": 0,
"aggs": {
"whatever": {
"filters": {
"filters": [{}]
},
"aggs": {
"all_people": {
"filter": {}
},
"hungry_count": {
"filter": {
"term": {
"hungry": true
}
}
},
"hungry_ratio": {
"bucket_script": {
"buckets_path": {
"total_hungry": "hungry_count._count",
"all": "all_people._count"
},
"script": "total_hungry/all"
}
}
}
}
}
}
With the result like this:
"buckets": [
{
"doc_count": 5,
"all_people": {
"doc_count": 5
},
"hungry_count": {
"doc_count": 3
},
"hungry_ratio": {
"value": 0.6
}
}
]
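The `bucket_script` above simply divides one bucket count by the other. As a cross-check, the same arithmetic can be done client-side on the returned bucket; this is a minimal sketch, and the function name `hungry_ratio` and the zero-total guard are additions for illustration.

```python
def hungry_ratio(bucket):
    """Compute hungry_count / all_people from one bucket of the response."""
    total = bucket["all_people"]["doc_count"]
    hungry = bucket["hungry_count"]["doc_count"]
    # Guard against an empty index instead of dividing by zero.
    return hungry / total if total else None


# Counts taken from the sample result above.
bucket = {
    "doc_count": 5,
    "all_people": {"doc_count": 5},
    "hungry_count": {"doc_count": 3},
}

ratio = hungry_ratio(bucket)
```

With the counts from the sample result, this gives 3 / 5 = 0.6, matching the `hungry_ratio` value returned by the aggregation.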

Resources