Elasticsearch aggregation by field name

Imagine two documents:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
}
},
{
"_id": "def",
"categories": {
"category-id-1": 2
}
}
]
As you can see, each document can be associated with any number of categories by adding a sub-field, named after the category ID, to the categories object.
With this mapping, I can request the documents of a given category and sort them by the value stored in that category's sub-field.
My problem is that I now want an aggregation that counts the number of documents in each category. For the dataset above, that would give the following result:
{
"aggregations": {
"categories" : {
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
}
I can't find anything in the documentation that solves this problem. I'm completely new to Elasticsearch, so I may have missed something in the documentation or made a poor mapping choice.
Is it possible to run this kind of aggregation with my mapping? I'm using ES 6.x.
EDIT: Here is the mapping for the index:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
}
}
}
}
}
}

The most straightforward solution is to add a new field that contains all the distinct categories of a document.
If we call this field categories_list, a solution could look like this.
Change the mapping to:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
},
"categories_list": {
"type": "keyword"
}
}
}
}
}
}
Then modify your documents like this:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
},
"categories_list": ["category-id-1", "category-id-2"]
},
{
"_id": "def",
"categories": {
"category-id-1": 2
},
"categories_list": ["category-id-1"]
}
]
Then your aggregation request should be:
{
"aggs": {
"categories": {
"terms": {
"field": "categories_list",
"size": 10
}
}
}
}
and it will return:
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
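The categories_list field has to be kept in sync with the keys of categories whenever a document is written. A minimal sketch of that step, assuming it happens in application code before indexing: the helper name with_categories_list is hypothetical, and the Counter is only a local stand-in for the terms aggregation, not a call to Elasticsearch.

```python
from collections import Counter


def with_categories_list(doc):
    """Return a copy of doc with categories_list built from the keys of categories."""
    enriched = dict(doc)
    enriched["categories_list"] = sorted(doc.get("categories", {}))
    return enriched


docs = [
    {"_id": "abc", "categories": {"category-id-1": 1, "category-id-2": 50}},
    {"_id": "def", "categories": {"category-id-1": 2}},
]

enriched_docs = [with_categories_list(d) for d in docs]

# Local stand-in for the terms aggregation: number of documents per category.
doc_counts = Counter(c for d in enriched_docs for c in d["categories_list"])

print(enriched_docs[0]["categories_list"])
print(dict(doc_counts))
```

The counts computed locally should agree with the doc_count values in the buckets above for the two sample documents.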

Related

elasticsearch Saved Search with Group by

index_name: my_data-2020-12-01
ticket_number: T123
ticket_status: OPEN
ticket_updated_time: 2020-12-01 12:22:12
index_name: my_data-2020-12-01
ticket_number: T124
ticket_status: OPEN
ticket_updated_time: 2020-12-01 12:32:11
index_name: my_data-2020-12-02
ticket_number: T123
ticket_status: INPROGRESS
ticket_updated_time: 2020-12-02 12:33:12
index_name: my_data-2020-12-02
ticket_number: T125
ticket_status: OPEN
ticket_updated_time: 2020-12-02 14:11:45
I want to create a saved search that groups by the ticket_number field and returns one document per ticket with its latest status (ticket_status). Is it possible?

You can simply run a query. I am assuming you are using Kibana for visualization. In your query, filter on ticket_number and sort by ticket_updated_time in descending order.
Working example
Index mapping
{
"mappings": {
"properties": {
"ticket_updated_time": {
"type": "date"
},
"ticket_number" :{
"type" : "text"
},
"ticket_status" : {
"type" : "text"
}
}
}
}
Index sample docs
{
"ticket_number": "T123",
"ticket_status": "OPEN",
"ticket_updated_time": "2020-12-01T12:22:12"
}
{
"ticket_number": "T123",
"ticket_status": "INPROGRESS",
"ticket_updated_time": "2020-12-02T12:33:12"
}
Now as you can see, both the sample documents belong to the same ticket_number with different status and updated time.
Search query (size is 1 so that only the most recently updated document is returned; without it, the older documents for the same ticket are returned as well)
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"match": {
"ticket_number": "T123"
}
}
]
}
},
"sort": [
{
"ticket_updated_time": {
"order": "desc"
}
}
]
}
And search result
"hits": [
{
"_index": "65180491",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"ticket_number": "T123",
"ticket_status": "INPROGRESS",
"ticket_updated_time": "2020-12-02T12:33:12"
},
"sort": [
1606912392000
]
}
]

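If you need to group by the ticket_number field, you can use an aggregation as well. The terms aggregation below returns one bucket per ticket, ordered by its most recent update time.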
Index Mapping:
{
"mappings": {
"properties": {
"ticket_updated_time": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"unique_id": {
"terms": {
"field": "ticket_number.keyword",
"order": {
"latestOrder": "desc"
}
},
"aggs": {
"latestOrder": {
"max": {
"field": "ticket_updated_time"
}
}
}
}
}
}
Search Result:
"buckets": [
{
"key": "T125",
"doc_count": 1,
"latestOrder": {
"value": 1.606918305E12,
"value_as_string": "2020-12-02 14:11:45"
}
},
{
"key": "T123",
"doc_count": 2,
"latestOrder": {
"value": 1.606912392E12,
"value_as_string": "2020-12-02 12:33:12"
}
},
{
"key": "T124",
"doc_count": 1,
"latestOrder": {
"value": 1.606825931E12,
"value_as_string": "2020-12-01 12:32:11"
}
}
]
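Note that the max sub-aggregation above returns only the latest timestamp per ticket, not the ticket_status that goes with it. In Elasticsearch, a top_hits sub-aggregation is one way to return the matching document; that is not part of the answer above. The sketch below only illustrates the intended "latest document per ticket" reduction in plain Python on the four records from the question. The function name latest_per_ticket is hypothetical.

```python
from datetime import datetime

records = [
    {"ticket_number": "T123", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-01 12:22:12"},
    {"ticket_number": "T124", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-01 12:32:11"},
    {"ticket_number": "T123", "ticket_status": "INPROGRESS", "ticket_updated_time": "2020-12-02 12:33:12"},
    {"ticket_number": "T125", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-02 14:11:45"},
]


def latest_per_ticket(rows):
    """Keep, for each ticket_number, the row with the greatest ticket_updated_time."""
    latest = {}
    for row in rows:
        updated = datetime.strptime(row["ticket_updated_time"], "%Y-%m-%d %H:%M:%S")
        current = latest.get(row["ticket_number"])
        if current is None or updated > current[0]:
            latest[row["ticket_number"]] = (updated, row)
    return {ticket: row for ticket, (_, row) in latest.items()}


latest = latest_per_ticket(records)
for ticket, row in sorted(latest.items()):
    print(ticket, row["ticket_status"], row["ticket_updated_time"])
```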

ElasticSearch aggregation query with List in documents

I have following records of car sales of different brands in different cities.
Document -1
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":100,
"sold":80
},{
"name":"Honda",
"purchase":200,
"sold":150
}]
}
Document -2
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":50,
"sold":40
},{
"name":"Honda",
"purchase":150,
"sold":120
}]
}
I am trying to come up with query to aggregate car statistics for a given city but not getting the right query.
Required result:
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":150,
"sold":120
},{
"name":"Honda",
"purchase":350,
"sold":270
}]
}

First, map the cars array as a nested field (a script would be complicated and slow). Nested fields are indexed, so the aggregation will be fast.
Delete your index or create a new one with the mapping below. Note that I use test as the type name.
{
"mappings": {
"test": {
"properties": {
"city": {
"type": "keyword"
},
"cars": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"purchase": {
"type": "integer"
},
"sold": {
"type": "integer"
}
}
}
}
}
}
}
Index your documents (the same way you did before).
For the aggregation:
{
"size": 0,
"aggs": {
"avg_grade": {
"terms": {
"field": "city"
},
"aggs": {
"resellers": {
"nested": {
"path": "cars"
},
"aggs": {
"agg_name": {
"terms": {
"field": "cars.name"
},
"aggs": {
"avg_pur": {
"sum": {
"field": "cars.purchase"
}
},
"avg_sold": {
"sum": {
"field": "cars.sold"
}
}
}
}
}
}
}
}
}
}
Result:
"buckets": [
{
"key": "Honda",
"doc_count": 2,
"avg_pur": {
"value": 350
},
"avg_sold": {
"value": 270
}
},
{
"key": "Toyota",
"doc_count": 2,
"avg_pur": {
"value": 150
},
"avg_sold": {
"value": 120
}
}
]
If you have indexed the name or city field as text (first ask yourself whether that is necessary), use the .keyword sub-field in the terms aggregation ("cars.name.keyword").
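To check the numbers, the same grouping can be reproduced outside Elasticsearch. This is a minimal sketch on the two sample documents from the question: it groups the nested car entries of one city by name and sums purchase and sold, which is what the terms and sum aggregations do. The function name car_totals is hypothetical.

```python
from collections import defaultdict

docs = [
    {"city": "Delhi", "cars": [
        {"name": "Toyota", "purchase": 100, "sold": 80},
        {"name": "Honda", "purchase": 200, "sold": 150},
    ]},
    {"city": "Delhi", "cars": [
        {"name": "Toyota", "purchase": 50, "sold": 40},
        {"name": "Honda", "purchase": 150, "sold": 120},
    ]},
]


def car_totals(documents, city):
    """Sum purchase and sold per car name over all documents of one city."""
    totals = defaultdict(lambda: {"purchase": 0, "sold": 0})
    for doc in documents:
        if doc["city"] != city:
            continue
        for car in doc["cars"]:  # each entry corresponds to one nested document
            totals[car["name"]]["purchase"] += car["purchase"]
            totals[car["name"]]["sold"] += car["sold"]
    return {name: dict(values) for name, values in totals.items()}


print(car_totals(docs, "Delhi"))
```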

Terms aggregation with nested wildcard path

Given the following nested object of nested objects
{
[...]
"nested_parent":{
"nested_child_1":{
"classifier":"one"
},
"nested_child_2":{
"classifier":"two"
},
"nested_child_3":{
"classifier":"two"
},
"nested_child_4":{
"classifier":"five"
},
"nested_child_5":{
"classifier":"six"
}
[...]
}
I want to aggregate on the wildcard-like field path nested_parent.*.classifier, along the lines of
{
"size": 0,
"aggs": {
"termsAgg": {
"nested": {
"path": "nested_parent.*"
},
"aggs": {
"termsAgg": {
"terms": {
"size": 1000,
"field": "nested_parent.*.classifier"
}
}
}
}
}
}
which does not seem to work -- possibly because the path and field are not defined clearly enough.
How can I aggregate on nested objects with dynamically created nested mappings which share most of their properties, including the classifier on which I intend to terms-aggregate?

TL;DR
A bit late to the party.
I would suggest a different approach, as I don't see a possible solution using wildcards.
My solution uses copy_to to create a single field that the aggregation can target.
Solution
The idea is to create a field that stores the values of all your classifiers, which you can then aggregate on.
PUT /54198251/
{
"mappings": {
"properties": {
"classifiers": {
"type": "keyword"
},
"parent": {
"type": "nested",
"properties": {
"child": {
"type": "nested",
"properties": {
"classifier": {
"type": "keyword",
"copy_to": "classifiers"
}
}
},
"child2": {
"type": "nested",
"properties": {
"classifier": {
"type": "keyword",
"copy_to": "classifiers"
}
}
}
}
}
}
}
}
POST /54198251/_doc
{
"parent": {
"child": {
"classifier": "c1"
},
"child2": {
"classifier": "c2"
}
}
}
GET /54198251/_search
{
"aggs": {
"classifiers": {
"terms": {
"field": "classifiers",
"size": 10
}
}
}
}
Will give you:
"aggregations": {
"classifiers": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "c1",
"doc_count": 1
},
{
"key": "c2",
"doc_count": 1
}
]
}
}
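The effect of copy_to here can be pictured as gathering every child's classifier into one flat list per document. The sketch below emulates that in plain Python; it is an illustration rather than how Elasticsearch implements it, and the helper name collect_classifiers is hypothetical. Values are de-duplicated per document because a terms aggregation on the top-level field counts documents, not occurrences.

```python
from collections import Counter


def collect_classifiers(doc, parent_field="parent"):
    """Gather the classifier of every child object under parent_field."""
    children = doc.get(parent_field, {})
    return [
        child["classifier"]
        for child in children.values()
        if isinstance(child, dict) and "classifier" in child
    ]


docs = [
    {"parent": {"child": {"classifier": "c1"}, "child2": {"classifier": "c2"}}},
]

# Count each classifier once per document, like doc_count in a terms bucket.
counts = Counter(value for doc in docs for value in set(collect_classifiers(doc)))
print(dict(counts))
```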

Elasticsearch: Retrieving filtered and unfiltered count in one request

I am using the following mapping in one of my ElasticSearch indices:
"mappings": {
"my-mapping": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
I now want to count the elements that match a search string in "title", grouped by groupId. I can achieve that using a terms aggregation:
/indexname/_search
{
"query" : {
"term" : {
"title" : "sky"
}
},
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
Additionally, I want to know the count of all elements regardless of the filter. I can get that with a search that has no query:
/indexname/_search
{
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
The question is: is there any way to get aggregation data containing both the filtered count and the unfiltered count, for only those groups that had a filtered hit, in a single request?
For example:
"buckets": [
{
"key": "257786",
"doc_count": 3024,
"filtered_doc_count" : 202
},
{
"key": "254640",
"doc_count": 3010,
"filtered_doc_count" : 1
},
{
"key": "252256",
"doc_count": 2367,
"filtered_doc_count" : 5
},
...
]
One way I see is to split this into two requests: first request all filtered buckets (their IDs), then request the counts of these specific buckets using "terms" : { "id" : ["4", "65", "404"] }. This is not very nice, and I don't want to send two requests (_msearch does not help here).
A second bad solution would be to persist the total counts somewhere in all of my entities.
Is there any way to achieve what I described in a single request?
PS: Please let me know if the question is unclear.

Based on these:
How to filter terms aggregation
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
I made this:
PUT test
{
"mappings": {
"my-mapping": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
}
PUT test/my-mapping/1
{
"id":1,
"groupId": 1,
"title": "asd"
}
PUT test/my-mapping/2
{
"id":2,
"groupId": 1,
"title": "sky"
}
PUT test/my-mapping/3
{
"id":3,
"groupId": 2,
"title": "sky"
}
PUT test/my-mapping/4
{
"id":4,
"groupId": 2,
"title": "sky"
}
PUT test/my-mapping/5
{
"id":5,
"groupId": 2,
"title": "sky"
}
POST test/my-mapping/_search
{
"aggs": {
"categories-filtered": {
"filter": {"term": {"title": "sky"}},
"aggs": {
"names": {
"terms": {"field": "groupId"}
}
}
},
"categories": {
"terms": {"field": "groupId"}
}
}
}
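This request returns the two bucket lists side by side, so the per-group pairing asked for in the question still has to be done by the client. A minimal sketch of that merge follows. The response dictionary is hand-written in the shape such a request would be expected to return for the five test documents; it is not captured output, and the function name merge_counts is hypothetical.

```python
# Hand-written sample shaped like the aggregation part of the response (assumption).
response = {
    "aggregations": {
        "categories-filtered": {
            "doc_count": 4,
            "names": {"buckets": [
                {"key": "2", "doc_count": 3},
                {"key": "1", "doc_count": 1},
            ]},
        },
        "categories": {"buckets": [
            {"key": "2", "doc_count": 3},
            {"key": "1", "doc_count": 2},
        ]},
    }
}


def merge_counts(resp):
    """Pair every filtered bucket with the unfiltered count of the same group."""
    aggs = resp["aggregations"]
    totals = {b["key"]: b["doc_count"] for b in aggs["categories"]["buckets"]}
    merged = []
    for bucket in aggs["categories-filtered"]["names"]["buckets"]:
        merged.append({
            "key": bucket["key"],
            # None means the group was not among the returned unfiltered buckets.
            "doc_count": totals.get(bucket["key"]),
            "filtered_doc_count": bucket["doc_count"],
        })
    return merged


for row in merge_counts(response):
    print(row)
```

Because a terms aggregation returns only its top buckets (limited by size), a group with filtered hits can be missing from the unfiltered list on a large index; the sketch reports None for doc_count in that case rather than guessing.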

Elastic Search query return terms within array of a specific type

The mapping of my index is as follows:
{"tagged_index":{"mappings":{"tagged":{"properties":{"tags":{"properties":{"resources":{"properties":{"tagName":{"type":"string"},"type":{"type":"string"}}}}},"content":{"type":"string"}}}}}}
where resources is an array that can hold multiple tags. For example:
{"_id":"82906194","_source":{"tags":{"resources":[{"type":"Person","tagName":"Kim_Kardashian",},{"type":"Person","tagName":"Kanye_West",},{"type":"City","tagName":"New_York",},...},"content":" Popular NEWS ..."}}
,
{"_id":"82906195","_source":{"tags":{"resources":[{"type":"City","tagName":"London",},{"type":"Country","tagName":"USA",},{"type":"Music","tagName":"Hello",},...},"content":" Adele's Hello..."}},
...
I know how to extract the important terms (tagName) with the query below, but I do not want the tagName terms of all types.
How can I extract only the terms of one type, for example cities (type: City)? I would like to get a list of the tagName values whose type is City, i.e. London, New_York, Berlin, ...
{"size":0,"query":{"filtered":{"query":{"query_string":{"query":"*","analyze_wildcard":true}}}},"aggs":{"Cities":{"terms":{"field":"tags.resources.tagName","size":10,"order":{"_count":"desc"}}}}}
The required output should look like this:
{"took":1200,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":5179261,"max_score":0.0,"hits":[]},"aggregations":{"Cities":{"doc_count_error_upper_bound":46737,"sum_other_doc_count":36037440,"buckets":[{"key":"London","doc_count":332820},{"key":"New_York","doc_count":211274},{"key":"Berlin","doc_count":156954},{"key":"Amsterdam","doc_count":132173},...

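Can you try this:
{
"_source": ["tags.resources.tagName"],
"query": {
"term": {
"tags.resources.type": {
"value": "city"
}
}
}
}
The query above fetches the documents that contain at least one resource of type city, provided resources is mapped as an object type. The term value is lowercase because an analyzed string field indexes City as city.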
After Edit
The problem is to group by tag name for resources of type city. That cannot be achieved with your current mapping; you will have to change the resources field to the nested type.
The mapping would look like this:
"mappings": {
"resource": {
"properties": {
"tags": {
"properties": {
"content": {
"type": "string"
},
"resources": {
"type": "nested",
"properties": {
"tagName": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
}
}
The final query would be:
{
"size": 0,
"query": {
"nested": {
"path": "tags.resources",
"query": {
"match": {
"tags.resources.type": "city"
}
}
}
},
"aggs": {
"resources Nested path": {
"nested": {
"path": "tags.resources"
},
"aggs": {
"city type": {
"filter": {
"term": {
"tags.resources.type": "city"
}
},
"aggs": {
"group By tagName": {
"terms": {
"field": "tags.resources.tagName"
}
}
}
}
}
}
}
}
The output would be:
"aggregations": {
"resources Nested path": {
"doc_count": 6,
"city type": {
"doc_count": 2,
"group By tagName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "london",
"doc_count": 1
},
{
"key": "new_york",
"doc_count": 1
}
]
}
}
}
}
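The counts in this output can be reproduced by hand: the two sample documents from the question contain six resources in total, two of which are cities. The sketch below walks the resources one by one, as the nested aggregation does, keeps those of the wanted type, and counts their tag names. Lowercasing stands in for the analyzer applied to the string fields, which is an assumption about the mapping; the function name tag_names_of_type is hypothetical.

```python
from collections import Counter

docs = [
    {"tags": {"resources": [
        {"type": "Person", "tagName": "Kim_Kardashian"},
        {"type": "Person", "tagName": "Kanye_West"},
        {"type": "City", "tagName": "New_York"},
    ]}},
    {"tags": {"resources": [
        {"type": "City", "tagName": "London"},
        {"type": "Country", "tagName": "USA"},
        {"type": "Music", "tagName": "Hello"},
    ]}},
]


def tag_names_of_type(documents, wanted_type):
    """Count tag names over individual resources whose type matches wanted_type."""
    counts = Counter()
    for doc in documents:
        for resource in doc["tags"]["resources"]:  # one nested document each
            if resource["type"].lower() == wanted_type.lower():
                counts[resource["tagName"].lower()] += 1
    return counts


total_resources = sum(len(d["tags"]["resources"]) for d in docs)
city_counts = tag_names_of_type(docs, "city")
print(total_resources, dict(city_counts))
```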
