Get an aggregate count in Elasticsearch based on a particular unique ID field

I have created an index and indexed documents in Elasticsearch, and it's working fine. The challenge is that I have to get an aggregate count of the Category field grouped by the unique UserID. I have given my sample documents below:
{
"UserID":"A1001",
"Category":"initiated",
"policyno":"5221"
},
{
"UserID":"A1001",
"Category":"pending",
"policyno":"5222"
},
{
"UserID":"A1001",
"Category":"pending",
"policyno":"5223"
},
{
"UserID":"A1002",
"Category":"completed",
"policyno":"5224"
}
**Sample output for UserID - "A1001"**
initiated-1
pending-2
**Sample output for UserID - "A1002"**
completed-1
How can I get the aggregate counts from the JSON documents above, in the form of the sample output shown?

I suggest a nested terms aggregation, as shown in the following:
{
"size": 0,
"aggs": {
"By_ID": {
"terms": {
"field": "UserID.keyword"
},
"aggs": {
"By_Category": {
"terms": {
"field": "Category.keyword"
}
}
}
}
}
}
Here is a snippet of the response:
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"By_ID" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "A1001",
"doc_count" : 3,
"By_Category" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "pending",
"doc_count" : 2
},
{
"key" : "initiated",
"doc_count" : 1
}
]
}
},
{
"key" : "A1002",
"doc_count" : 1,
"By_Category" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "completed",
"doc_count" : 1
}
]
}
}
]
}
}
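If you only need the breakdown for a single user, as in the sample output for A1001, you can combine the same aggregation with a term query so that only that user's documents are counted. A minimal sketch, assuming the same UserID.keyword and Category.keyword sub-fields (replace <your_index_name> with your index):
GET <your_index_name>/_search
{
  "size": 0,
  "query": {
    "term": {
      "UserID.keyword": "A1001"     <---- restricts the aggregation to one user
    }
  },
  "aggs": {
    "By_Category": {
      "terms": {
        "field": "Category.keyword"
      }
    }
  }
}
The By_Category buckets then map directly to the initiated-1 / pending-2 style output for that user.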

Related

How to get word count in docs as an aggregate over time in Elasticsearch?

I am trying to get word-count trends across documents as an aggregate result. Using the following approach I am able to get the doc-count aggregation result, but I am not able to find any resources that explain how to get the word count for the months of jan, feb and mar.
PUT test/_doc/1
{
"description" : "one two three four",
"month" : "jan"
}
PUT test/_doc/2
{
"description" : "one one test test test",
"month" : "feb"
}
PUT test/_doc/3
{
"description" : "one one one test",
"month" : "mar"
}
GET test/_search
{
"size": 0,
"query": {
"match": {
"description": {
"query": "one"
}
}
},
"aggs": {
"monthly_count": {
"terms": {
"field": "month.keyword"
}
}
}
}
OUTPUT
{
"took" : 706,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"monthly_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "feb",
"doc_count" : 1
},
{
"key" : "jan",
"doc_count" : 1
},
{
"key" : "mar",
"doc_count" : 1
}
]
}
}
}
EXPECTED WORD COUNT OVER MONTH
"aggregations" : {
"monthly_count" : {
"buckets" : [
{
"key" : "feb",
"word_count" : 2
},
{
"key" : "jan",
"word_count" : 1
},
{
"key" : "mar",
"word_count" : 3
}
]
}
}
Maybe this query can help you:
GET test/_search
{
"size": 0,
"aggs": {
"monthly_count": {
"terms": {
"field": "month.keyword"
},
"aggs": {
"count_word_one": {
"terms": {
"script": {
"source": """
def str = doc['description.keyword'].value;
def array = str.splitOnToken(' ');
int i = 0;
for (item in array) {
if(item == 'one'){
i++
}
}
return i;
"""
},
"size": 10
}
}
}
}
}
}
Response:
"aggregations" : {
"monthly_count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "feb",
"doc_count" : 1,
"count_word_one" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "2",
"doc_count" : 1
}
]
}
},
{
"key" : "jan",
"doc_count" : 1,
"count_word_one" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1
}
]
}
},
{
"key" : "mar",
"doc_count" : 1,
"count_word_one" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "3",
"doc_count" : 1
}
]
}
}
]
}
}
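The script above returns the per-document count as a bucket key rather than as a number per month. If you want a numeric word_count directly under each month bucket, closer to the expected output, one option is a scripted sum sub-aggregation. This is only a sketch, assuming description has a .keyword sub-field as in the answer above:
GET test/_search
{
  "size": 0,
  "aggs": {
    "monthly_count": {
      "terms": {
        "field": "month.keyword"
      },
      "aggs": {
        "word_count": {
          "sum": {
            "script": {
              "source": """
              // count how many times 'one' occurs in each document's description
              int i = 0;
              for (item in doc['description.keyword'].value.splitOnToken(' ')) {
                if (item == 'one') { i++; }
              }
              return i;
              """
            }
          }
        }
      }
    }
  }
}
Each month bucket should then carry a numeric word_count value (for example 2 for feb) instead of a nested bucket keyed by the count.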

Aggregate by custom defined buckets, according to field value

I'm interested in aggregating my data into buckets, but I want to put two distinct values to the same bucket.
This is what I mean:
Say I have this query:
GET _search
{
"size": 0,
"aggs": {
"my-agg-name": {
"terms": {
"field": "ecs.version"
}
}
}
}
it returns this response:
"aggregations" : {
"my-agg-name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1.12.0",
"doc_count" : 642826144
},
{
"key" : "8.0.0",
"doc_count" : 204064845
},
{
"key" : "1.1.0",
"doc_count" : 16508253
},
{
"key" : "1.0.0",
"doc_count" : 9162928
},
{
"key" : "1.6.0",
"doc_count" : 1111542
},
{
"key" : "1.5.0",
"doc_count" : 10445
}
]
}
}
Every distinct value of the field ecs.version is in its own bucket.
But say I wanted to define my buckets such that:
bucket1: [1.12.0, 8.0.0]
bucket2: [1.6.0, 8.4.0]
bucket3: [1.0.0, 8.8.0]
Is this possible in anyway?
I know I can just return all the buckets and do the sums programmatically, but this list can be very long and I don't think that would be efficient. Am I wrong?
You can use a runtime mapping to generate a runtime field and use that field for the aggregation. I have done the example below on ES 7.16.
I have indexed some sample documents, and below is the aggregation output without joining multiple values into one bucket:
"aggregations" : {
"version" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1.12.0",
"doc_count" : 3
},
{
"key" : "1.6.0",
"doc_count" : 3
},
{
"key" : "8.4.0",
"doc_count" : 3
},
{
"key" : "8.0.0",
"doc_count" : 2
}
]
}
}
You can use the query below with a runtime mapping, but you need to add multiple if conditions for your version groupings:
{
"size": 0,
"runtime_mappings": {
"normalized_version": {
"type": "keyword",
"script": """
String version = doc['version.keyword'].value;
if (version.equals('1.12.0') || version.equals('8.0.0')) {
emit('1.12.0, 8.0.0');
} else if (version.equals('1.6.0') || version.equals('8.4.0')){
emit('1.6.0, 8.4.0');
}else {
emit(version);
}
"""
}
},
"aggs": {
"genres": {
"terms": {
"field": "normalized_version"
}
}
}
}
Below is the output of the above aggregation query:
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1.6.0, 8.4.0",
"doc_count" : 6
},
{
"key" : "1.12.0, 8.0.0",
"doc_count" : 5
}
]
}
}
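If the version groups are fixed and known up front, another option that avoids scripting entirely is a filters aggregation, where each named bucket is a terms query over the versions it should contain. A sketch assuming ecs.version is a keyword field as in your original query; the bucket names and the 1.0.0/8.8.0 group are taken from your example:
GET <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "version_groups": {
      "filters": {
        "filters": {
          "bucket1": { "terms": { "ecs.version": [ "1.12.0", "8.0.0" ] } },
          "bucket2": { "terms": { "ecs.version": [ "1.6.0", "8.4.0" ] } },
          "bucket3": { "terms": { "ecs.version": [ "1.0.0", "8.8.0" ] } }
        }
      }
    }
  }
}
Each named filter becomes one bucket in the response with its own doc_count, so the summing happens server-side just as with the runtime-field approach.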

Elasticsearch, sort aggs according to sibling fields but from a different index

Elasticsearch v7.5
Hello and good day!
We have 2 indices named socialmedia and influencers
Sample contents:
socialmedia:
{
'_id' : 1001,
'title' : "Title 1",
'smp_id' : 1,
},
{
'_id' : 1002,
'title' : "Title 2",
'smp_id' : 2,
},
{
'_id' : 1003,
'title' : "Title 3",
'smp_id' : 3,
}
//omitted other documents
influencers
{
'_id' : 1,
'name' : "John",
'smp_id' : 1,
'smp_score' : 5
},
{
'_id' : 2,
'name' : "Peter",
'smp_id' : 2,
'smp_score' : 10
},
{
'_id' : 3,
'name' : "Mark",
'smp_id' : 3,
'smp_score' : 15
}
//omitted other documents
Now I have this simple query that determines which influencer has the most documents in the socialmedia index:
GET socialmedia/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"INFLUENCERS": {
"terms": {
"field": "smp_id.keyword"
//smp_id is a **text** based field, that's why we have `.keyword` here
}
}
}
}
SAMPLE OUTPUT:
"aggregations" : {
"INFLUENCERS" : {
"doc_count_error_upper_bound" : //omitted,
"sum_other_doc_count" : //omitted,
"buckets" : [
{
"key" : "1",
"doc_count" : 87258
},
{
"key" : "2",
"doc_count" : 36518
},
{
"key" : "3",
"doc_count" : 34838
},
]
}
}
OBJECTIVE:
My query is able to sort the influencers according to the doc_count of their posts in the socialmedia index. Now, is there a way for us to sort the INFLUENCERS aggregation, or otherwise order the influencers, according to their smp_score?
With that idea, smp_id 3, which is Mark, should be the first one to appear, since he has an smp_score of 15.
Thank you in advance for your help!
What you are looking for is a JOIN operation. Note that Elasticsearch doesn't support JOIN operations unless the data is modelled in one of the ways mentioned in this link.
Instead, a very simplistic approach is to denormalize your data and add the smp_score to your socialmedia index as below:
Mapping:
PUT socialmedia
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"smp_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"smp_score": {
"type": "float"
}
}
}
}
Your ES query would then have two terms aggregations, as shown below:
Request Query:
POST socialmedia/_search
{
"size": 0,
"aggs": {
"influencers_score_agg": {
"terms": {
"field": "smp_score",
"order": { "_key": "desc" }
},
"aggs": {
"influencers_id_agg": {
"terms": {
"field": "smp_id.keyword"
}
}
}
}
}
}
Basically we are first aggregating on the smp_score and then introducing a sub-aggregation to display the smp_id.
Response:
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_influencers_score" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 15.0,
"doc_count" : 1,
"influencers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "3",
"doc_count" : 1
}
]
}
},
{
"key" : 10.0,
"doc_count" : 1,
"influencers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "2",
"doc_count" : 1
}
]
}
},
{
"key" : 5.0,
"doc_count" : 1,
"influencers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1
}
]
}
}
]
}
}
}
Do spend some time reading the above link; however, that would require you to model your index in a different way depending on the options mentioned in it. From what I understand, the solution I've provided should suffice.
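As a variation on the denormalized index above, you could also keep a single terms aggregation on smp_id and order it by a max metric sub-aggregation on smp_score, which gives one bucket per influencer already sorted by score. A sketch under the same denormalization assumption (the max_smp_score name is only illustrative):
POST socialmedia/_search
{
  "size": 0,
  "aggs": {
    "INFLUENCERS": {
      "terms": {
        "field": "smp_id.keyword",
        "order": { "max_smp_score": "desc" }     <---- order buckets by the metric below
      },
      "aggs": {
        "max_smp_score": {
          "max": { "field": "smp_score" }
        }
      }
    }
  }
}
With the sample data this should put smp_id 3 (Mark, score 15) first, while each bucket still carries the per-influencer doc_count.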

About elasticsearch group by two fields and then filter or order

There is a shareholder index, and I want to get the info below.
Which holder invested in the same company the most times?
select hld_id, com_id, count(*) from shareholder group by hld_id, com_id order by count(*) desc;
Which holder invested in a company exactly two times (there may be duplicate records)?
select hld_id, com_id from shareholder group by hld_id, com_id having count(*) = 2;
So how can I implement the above requirements with Elasticsearch search queries?
Below are a sample mapping, documents, and aggregation queries. I've worked out three possible ways this can be done.
Mapping:
PUT shareholder
{
"mappings": {
"properties": {
"hld_id": {
"type": "keyword"
},
"com_id":{
"type": "keyword"
}
}
}
}
Documents:
POST shareholder/_doc/1
{
"hld_id": "001",
"com_id": "001"
}
POST shareholder/_doc/2
{
"hld_id": "001",
"com_id": "002"
}
POST shareholder/_doc/3
{
"hld_id": "002",
"com_id": "001"
}
POST shareholder/_doc/4
{
"hld_id": "002",
"com_id": "002"
}
POST shareholder/_doc/5
{
"hld_id": "002",
"com_id": "002" <--- Note I've changed this
}
Solution 1: Using Elasticsearch's aggregation
Aggregation Query: 1
Note that I've just made use of two terms aggregations, first on hld_id and then nested on com_id.
POST shareholder/_search
{
"size": 0,
"aggs": {
"share_hoder": {
"terms": {
"field": "hld_id"
},
"aggs": {
"com_aggs": {
"terms": {
"field": "com_id",
"order": {
"_count": "desc"
}
}
}
}
}
}
}
Below is how the response appears:
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"share_hoder" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 3,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 2 <---- Count you are looking for
},
{
"key" : "001",
"doc_count" : 1
}
]
}
},
{
"key" : "001",
"doc_count" : 2,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "001",
"doc_count" : 1
},
{
"key" : "002",
"doc_count" : 1
}
]
}
}
]
}
}
}
Of course you may not get the representation of the result exactly as you are looking for because of the way Elasticsearch's aggregation works.
Aggregation Query: 2
For this, most of it is the same as Aggregation Query 1, where I've used two terms aggregations, but I've additionally made use of a Bucket Selector aggregation in which I've added the condition for count(*) == 2
POST shareholder/_search
{
"size": 0,
"aggs": {
"share_holder": {
"terms": {
"field": "hld_id",
"order": {
"_key": "desc"
}
},
"aggs": {
"com_aggs": {
"terms": {
"field": "com_id"
},
"aggs": {
"count_filter":{
"bucket_selector": {
"buckets_path": {
"count_path": "_count"
},
"script": "params.count_path == 2"
}
}
}
}
}
}
}
}
Below is how the response appears.
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"share_holder" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 3,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002",
"doc_count" : 2 <---- Count == 2
}
]
}
},
{
"key" : "001",
"doc_count" : 2,
"com_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
}
Note that the second bucket is empty. I'm trying to see if I can filter the above query so that the "key" : "001" bucket doesn't appear in the first place.
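One way that may work for dropping those empty parent buckets is a second bucket_selector at the share_holder level, pointing its buckets_path at the _bucket_count of the inner aggregation. A sketch of the idea (the drop_empty name is only illustrative):
POST shareholder/_search
{
  "size": 0,
  "aggs": {
    "share_holder": {
      "terms": {
        "field": "hld_id",
        "order": { "_key": "desc" }
      },
      "aggs": {
        "com_aggs": {
          "terms": { "field": "com_id" },
          "aggs": {
            "count_filter": {
              "bucket_selector": {
                "buckets_path": { "count_path": "_count" },
                "script": "params.count_path == 2"
              }
            }
          }
        },
        "drop_empty": {
          "bucket_selector": {
            "buckets_path": { "non_empty": "com_aggs._bucket_count" },
            "script": "params.non_empty > 0"
          }
        }
      }
    }
  }
}
The outer bucket_selector runs once per share_holder bucket, so holders whose com_aggs buckets were all removed by the inner filter should disappear from the response.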
Solution 2: Using Elasticsearch SQL:
If you have x-pack available, you can execute the queries below in SQL style:
Query:1
POST /_sql?format=txt
{
"query": "SELECT hld_id, com_id, count(*) FROM shareholder GROUP BY hld_id, com_id ORDER BY count(*) desc"
}
Response:
hld_id | com_id | count(*)
---------------+---------------+---------------
002 |002 |2
001 |001 |1
001 |002 |1
002 |001 |1
Query 2:
POST /_sql?format=txt
{
"query": "SELECT hld_id, com_id FROM shareholder GROUP BY hld_id, com_id HAVING count(*) = 2"
}
Response:
hld_id | com_id
---------------+---------------
002 |002
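If you'd rather consume the result programmatically instead of the aligned text table, the same endpoint accepts other values for the format parameter (for example json or csv); only the format changes:
POST /_sql?format=json
{
  "query": "SELECT hld_id, com_id FROM shareholder GROUP BY hld_id, com_id HAVING count(*) = 2"
}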
Solution 3: Using Script in Terms Aggregation
Aggregation Query:
POST shareholder/_search
{
"size": 0,
"aggs": {
"query_groupby_count": {
"terms": {
"script": {
"source": """
doc['hld_id'].value + ", " + doc['com_id'].value
"""
}
}
},
"query_groupby_count_equals_2": {
"terms": {
"script": {
"source": """
doc['hld_id'].value + ", " + doc['com_id'].value
"""
}
},
"aggs": {
"myaggs": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count == 2"
}
}
}
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"query_groupby_count_equals_2" : { <---- Group By Query For Count == 2
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002, 002",
"doc_count" : 2
}
]
},
"query_groupby_count" : { <---- Group By Query
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "002, 002",
"doc_count" : 2
},
{
"key" : "001, 001",
"doc_count" : 1
},
{
"key" : "001, 002",
"doc_count" : 1
},
{
"key" : "002, 001",
"doc_count" : 1
}
]
}
}
}
Using CURL:
First, let us save the query in a .txt or .json file.
For example, I created a file called query.json and pasted only the query into that file.
{
"query": "SELECT hld_id, com_id, count(*) FROM shareholder GROUP BY hld_id, com_id ORDER BY count(*) desc"
}
Now execute the curl command below, which refers to that file:
curl -XGET "http://localhost:9200/_sql?format=txt" -H "Content-Type: application/json" -d @query.json
Hope this helps!

Is it possible with aggregation to amalgamate all values of an array property from all grouped documents into the coalesced document?

I have documents with the format similar to the following:
[
{
"name": "fred",
"title": "engineer",
"division_id": 20
"skills": [
"walking",
"talking"
]
},
{
"name": "ed",
"title": "ticket-taker",
"division_id": 20
"skills": [
"smiling"
]
}
]
I would like to run an aggs query that would show the complete set of skills for the division, i.e.:
{
"aggs":{
"distinct_skills":{
"cardinality":{
"field":"division_id"
}
}
},
"_source":{
"includes":[
"division_id",
"skills"
]
}
}
... so that the resulting hit would look like:
{
"division_id": 20,
"skills": [
"walking",
"talking",
"smiling"
]
}
I know I can retrieve inner_hits and iterate through the list and amalgamate the values "manually". I assume it would perform better if I could do it in a query.
Just nest two terms aggregations as shown below:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"my_division_ids": {
"terms": {
"field": "division_id",
"size": 10
},
"aggs": {
"my_skills": {
"terms": {
"field": "skills", <---- If it is not keyword field use `skills.keyword` field if using dynamic mapping.
"size": 10
}
}
}
}
}
}
Below is the sample response:
Response:
{
"took" : 490,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_division_ids" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 20, <---- division_id
"doc_count" : 2,
"my_skills" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ <---- Skills
{
"key" : "smiling",
"doc_count" : 1
},
{
"key" : "talking",
"doc_count" : 1
},
{
"key" : "walking",
"doc_count" : 1
}
]
}
}
]
}
}
}
Hope this helps!
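One caveat with this approach: a terms aggregation only returns the top size buckets (10 by default), so if a division can have many distinct skills you may want to raise size on the inner aggregation. A variant of the query above, assuming a skills.keyword sub-field and an arbitrarily large size of 1000:
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "my_division_ids": {
      "terms": {
        "field": "division_id",
        "size": 10
      },
      "aggs": {
        "my_skills": {
          "terms": {
            "field": "skills.keyword",
            "size": 1000     <---- large enough to cover all distinct skills per division
          }
        }
      }
    }
  }
}
The doc_count on each skill bucket also tells you how many documents in the division contain that skill.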
