Can an aggregation have two keys in elasticsearch - elasticsearch

Suppose I have an index with a nested document that looks like this:
{
"mappings": {
"assignment": {
"properties":{
"id": {
"type": "string"
},
"location": {
"type": "string"
},
"typeOfLoss":{
"type": "string"
},
"lineItems": {
"type": "nested",
"properties": {
"categoryCode":{
"type": "string"
},
"selectorCode":{
"type": "string"
},
"roomType": {
"type": "string"
}
}
}
I now want to get a count aggregation of "lineItems" documents that returns the selectorCode and categoryCode where the roomType matches a search query. I am new to elasticsearch and can write my query in SQL
SELECT COUNT(*) as theCount, ln.category_code, ln.selector_code
FROM line_items as ln, assignment
WHERE assignment.location = "84043"
AND assignment.typeOfLoss = "Fire"
AND ln.roomType = "kitchen"
GROUP BY ln.category_code, ln.selector_code
ORDER BY theCount DESC;
I have started on the NEST query but am having some problems and am hoping someone can point me in the right directions.
var typeOfLossQuery = new TermQuery
{
Field = "typeOfLoss",
Value = typeOfLoss
};
var locationQuery = new TermQuery
{
Field = "location",
Value = location
};
var roomTypeQuery = new TermQuery
{
Field = "roomType",
Value = roomType
};
var result = client.Search<LineItem>(s => s
.From(0)
.Size(numberOfItems)
.Query(q => q.HasParent<Assignment>(a => a
.Query(x =>x
.MatchAll() && typeOfLossQuery && locationQuery
)
) && q.MatchAll() && roomTypeQuery
));

You can indeed do this with ElasticSearch but it's not quite as clean as in SQL. We can accomplish this with Nested Aggregations.
Setup
I'm going to setup the data so that you'd get the following equivalent result in SQL:
categoryCode | selectorCode | Count
c1 | s1 | 1
c1 | s2 | 2
PUT test1
PUT test1/_mapping/type1
{
"properties": {
"id": {
"type": "string"
},
"location": {
"type": "string"
},
"typeOfLoss": {
"type": "string"
},
"lineItems": {
"type": "nested",
"properties": {
"categoryCode": {
"type": "string",
"fielddata": true
},
"selectorCode": {
"type": "string",
"fielddata": true
},
"roomType": {
"type": "string"
}
}
}
}
}
POST test1/type1
{
"location":"l1",
"lineItems":
{
"categoryCode": "c1",
"selectorCode": "s1",
"roomType": "r1"
}
}
POST test1/type1
{
"location":"l1",
"lineItems":
{
"categoryCode": "c1",
"selectorCode": "s2",
"roomType": "r1"
}
}
POST test1/type1
{
"location":"l1",
"lineItems":
{
"categoryCode": "c1",
"selectorCode": "s2",
"roomType": "r1"
}
}
Query
GET test1/type1/_search
{
"size": 0,
"query": {
"nested": {
"path": "lineItems",
"query": {
"term": {
"lineItems.roomType": {
"value": "r1"
}
}
}
}
},
"aggs": {
"nestedAgg": {
"nested": {
"path": "lineItems"
},
"aggs": {
"byCategory": {
"terms": {
"field": "lineItems.categoryCode",
"size": 10
},
"aggs": {
"bySelector": {
"terms": {
"field": "lineItems.selectorCode",
"size": 10
}
}
}
}
}
}
}
}
My query is saying the following:
only show me data where roomType = 'r1'
aggregate (group in SQL) by categoryCode
created a "nested" or "sub" aggregation on "selectorCode"
Result
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"nestedAgg": {
"doc_count": 3,
"byCategory": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "c1",
"doc_count": 3,
"bySelector": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "s2",
"doc_count": 2
},
{
"key": "s1",
"doc_count": 1
}
]
}
}
]
}
}
}
}
So the results returns a list of aggregations. Inside an aggregation is a "bucket". Notice that the outer bucket for byCategory shows a doc_count of 3. That's because there are 3 records that match in the DB.
Then, nested inside that is the bySelector bucket showing s2 and s1 with a doc_count of 2 and 1 respectively.
Hopefully that helps, I'll let you turn all this into a NEST query.

Related

Nested array of objects aggregation in Elasticsearch

Documents in the Elasticsearch are indexed as such
Document 1
{
"task_completed": 10
"tagged_object": [
{
"category": "cat",
"count": 10
},
{
"category": "cars",
"count": 20
}
]
}
Document 2
{
"task_completed": 50
"tagged_object": [
{
"category": "cars",
"count": 100
},
{
"category": "dog",
"count": 5
}
]
}
As you can see that the value of the category key is dynamic in nature. I want to perform a similar aggregation like in SQL with the group by category and return the sum of the count of each category.
In the above example, the aggregation should return
cat: 10,
cars: 120 and
dog: 5
Wanted to know how to write this aggregation query in Elasticsearch if it is possible. Thanks in advance.
You can achieve your required result, using nested, terms, and sum aggregation.
Adding a working example with index mapping, search query and search result
Index Mapping:
{
"mappings": {
"properties": {
"tagged_object": {
"type": "nested"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"resellers": {
"nested": {
"path": "tagged_object"
},
"aggs": {
"books": {
"terms": {
"field": "tagged_object.category.keyword"
},
"aggs":{
"sum_of_count":{
"sum":{
"field":"tagged_object.count"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 4,
"books": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cars",
"doc_count": 2,
"sum_of_count": {
"value": 120.0
}
},
{
"key": "cat",
"doc_count": 1,
"sum_of_count": {
"value": 10.0
}
},
{
"key": "dog",
"doc_count": 1,
"sum_of_count": {
"value": 5.0
}
}
]
}
}
}

ElasticSearch aggregation query with List in documents

I have following records of car sales of different brands in different cities.
Document -1
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":100,
"sold":80
},{
"name":"Honda",
"purchase":200,
"sold":150
}]
}
Document -2
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":50,
"sold":40
},{
"name":"Honda",
"purchase":150,
"sold":120
}]
}
I am trying to come up with query to aggregate car statistics for a given city but not getting the right query.
Required result:
{
"city": "Delhi",
"cars":[{
"name":"Toyota",
"purchase":150,
"sold":120
},{
"name":"Honda",
"purchase":350,
"sold":270
}]
}
First you need to map your array as a nested field (script would be complicated and not performant). Nested field are indexed, aggregation will be pretty fast.
remove your index / or create a new one. Please note i use test as type.
{
"mappings": {
"test": {
"properties": {
"city": {
"type": "keyword"
},
"cars": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"purchase": {
"type": "integer"
},
"sold": {
"type": "integer"
}
}
}
}
}
}
}
Index your document (same way you did)
For the aggregation:
{
"size": 0,
"aggs": {
"avg_grade": {
"terms": {
"field": "city"
},
"aggs": {
"resellers": {
"nested": {
"path": "cars"
},
"aggs": {
"agg_name": {
"terms": {
"field": "cars.name"
},
"aggs": {
"avg_pur": {
"sum": {
"field": "cars.purchase"
}
},
"avg_sold": {
"sum": {
"field": "cars.sold"
}
}
}
}
}
}
}
}
}
}
result:
buckets": [
{
"key": "Honda",
"doc_count": 2,
"avg_pur": {
"value": 350
},
"avg_sold": {
"value": 270
}
}
,
{
"key": "Toyota",
"doc_count": 2,
"avg_pur": {
"value": 150
},
"avg_sold": {
"value": 120
}
}
]
if you have index the name / city field as a text (you have to ask first if this is necessary), use .keyword in the term aggregation ("cars.name.keyword").

Multiple key aggregation in ElasticSearch

I am new to Elastic Search and was exploring aggregation query. The documents I have are in the format -
{"name":"A",
"class":"10th",
"subjects":{
"S1":92,
"S2":92,
"S3":92,
}
}
We have about 40k such documents in our ES with the Subjects varying from student to student. The query to the system can be to aggregate all subject-wise scores for a given class. We tried to create a bucket aggregation query as explained in this guide here, however, this generates a single bucket per document and in our understanding requires an explicit mention of every subject.
We want to system to generate subject wise aggregate for the data by executing a single aggregation query, the problem I face is that in our data the subjects could vary from student to student and we don't have a global list of subject keys.
We wrote the following script but this only works if we know all possible subjects.
GET student_data_v1_1/_search
{ "query" :
{"match" :
{ "class" : "' + query + '" }},
"aggs" : { "my_buckets" : { "terms" :
{ "field" : "subjects", "size":10000 },
"aggregations": {"the_avg":
{"avg": { "field": "subjects.value" }}} }},
"size" : 0 }'
but this query only works for the document structure, but does not work multiple subjects are defined where we may not know the key-pair -
{"name":"A",
"class":"10th",
"subjects":{
"value":93
}
}
An alternate form the document is present is that the subject is a list of dictionaries -
{"name":"A",
"class":"10th",
"subjects":[
{"S1":92},
{"S2":92},
{"S3":92},
]
}
Having an aggregation query to solve either of the 2 document formats would be helpful.
======EDITS======
After updating the document to hold weights for each subject -
{
class": "10th",
"subject": [
{
"name": "s1",
"marks": 90,
"weight":30
},
{
"name": "s2",
"marks": 80,
"weight":70
}
]}
I have updated the query to be -
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "scores"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs" : { "weighted_grade": { "weighted_avg": { "value": { "field": "subjects.score" }, "weight": { "field": "subjects.weight" } } } }
}
}
}
}
},
"size": 0
}
but it throws the error-
{u'error': {u'col': 312,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'root_cause': [{u'col': 312,
u'line': 1,
u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
u'type': u'unknown_named_object_exception'}],
u'type': u'unknown_named_object_exception'},
u'status': 400}
To achieve the required result I would suggest you to keep your index mapping as follows:
{
"properties": {
"class": {
"type": "keyword"
},
"subject": {
"type": "nested",
"properties": {
"marks": {
"type": "integer"
},
"name": {
"type": "keyword"
}
}
}
}
}
In the mapping above I have created subject as nested type with two properties, name to hold subject name and marks to hold marks in the subject.
Sample doc:
{
"class": "10th",
"subject": [
{
"name": "s1",
"marks": 90
},
{
"name": "s2",
"marks": 80
}
]
}
Now you can use nested aggregation and multilevel aggregation (i.e. aggregation inside aggregation). I used nested aggregation with terms aggregation for subject.name to get bucket containing all the available subjects. Then to get avg for each subject we add a child aggregation of avg to the subjects aggregation as below:
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
}
}
}
}
}
},
"size": 0
}
NOTE: I have added "size" : 0 so that elastic doesn't return matching docs in the result. To include or exclude it depends totally on your use case.
Sample result:
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": [
]
},
"aggregations": {
"subjects": {
"doc_count": 6,
"subjects": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "s1",
"doc_count": 3,
"avg_score": {
"value": 80
}
},
{
"key": "s2",
"doc_count": 2,
"avg_score": {
"value": 75
}
},
{
"key": "s3",
"doc_count": 1,
"avg_score": {
"value": 80
}
}
]
}
}
}
}
As you can see the result contains buckets with key as subject name and avg_score.value as the avg of marks.
UPDATE to include weighted_avg:
{
"query": {
"match": {
"class": "10th"
}
},
"aggs": {
"subjects": {
"nested": {
"path": "subject"
},
"aggs": {
"subjects": {
"terms": {
"field": "subject.name"
},
"aggs": {
"avg_score": {
"avg": {
"field": "subject.marks"
}
},
"weighted_grade": {
"weighted_avg": {
"value": {
"field": "subject.marks"
},
"weight": {
"field": "subject.weight"
}
}
}
}
}
}
}
},
"size": 0
}

Elasticsearch aggregation by field name

Imagine two documents:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
}
},
{
"_id": "def",
"categories": {
"category-id-1": 2
}
}
]
As you can see, each document can be associated with a number of categories, by setting a nested field into the categories field.
With this mapping, I should be able to request the documents from a defined category and to order them by the value set as value for this field.
My problem is that I now want to make an aggregation to count for each category the number of documents. That would give the following result for the dataset I provided:
{
"aggregations": {
"categories" : {
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
}
I can't find anything in the documentation to solve this problem. I'm completely new to ElasticSearch so I may be doing something wrong either on my documentation research or on my mapping choice.
Is it possible to make this kind of aggregation with my mapping? I'm using ES 6.x
EDIT: Here is the mapping for the index:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
}
}
}
}
}
}
The most straightforward solution is to use a new field that contains all the distinct categories of a document.
If we call this field categories_list here could be a solution :
Change the mapping to
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
},
"categories_list": {
"type": "keyword"
}
}
}
}
}
}
Then you need to modify your documents like this :
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
},
"categories_list": ["category-id-1", "category-id-2"]
},
{
"_id": "def",
"categories": {
"category-id-1": 2
},
"categories_list": ["category-id-1"]
}
]
then your aggregation request should be
{
"aggs": {
"categories": {
"terms": {
"field": "categories_list",
"size": 10
}
}
}
}
and will return
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}

Aggregate only matched nested object values in ElasticSearch

I need to sum only the values on the nested objects that match the query. It looks like ElasticSearch determines the documents matching the query and then sums across all of the nested objects. From the below outline I want to search on nestedobjects.objtype="A" and get back the sum of objvalue only for matching nestedobjects, I want to get the value 4. is this possible? If so, how?
Here is the mapping
{
"myindex": {
"mappings": {
"mytype": {
"properties": {
"nestedobjects": {
"type": "nested",
"include_in_parent": true,
"properties": {
"objtype": {
"type": "string"
},
"objvalue": {
"type": "integer"
}
}
}
}
}
}
}
}
Here are my documents
PUT /myindex/mytype/1
{
"nestedobjects": [
{ "objtype": "A", "objvalue": 1 },
{ "objtype": "B", "objvalue": 2 }
]
}
PUT /myindex/mytype/2
{
"nestedobjects": [
{ "objtype": "A", "objvalue": 3 },
{ "objtype": "B", "objvalue": 3 }
]
}
Here is my query code.
POST allscriptshl7/_search?search_type=count
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "nestedobjects.objtype:A"
}
}
}
},
"aggregations": {
"my_agg": {
"sum": {
"field": "nestedobjects.objvalue"
}
}
}
}
Since both (outer) documents match the condition that one of their inner documents match the query, both outer documents are returned, and the aggregation is calculated against all of the inner documents belonging to those outer documents. Whew.
Anyway, this seems to do what you're wanting, I think, using filter aggregation:
POST /myindex/_search?search_type=count
{
"aggs": {
"nested_nestedobjects": {
"nested": {
"path": "nestedobjects"
},
"aggs": {
"filtered_nestedobjects": {
"filter": {
"term": {
"nestedobjects.objtype": "a"
}
},
"aggs": {
"my_agg": {
"sum": {
"field": "nestedobjects.objvalue"
}
}
}
}
}
}
}
}
...
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"nested_nestedobjects": {
"doc_count": 4,
"filtered_nestedobjects": {
"doc_count": 2,
"my_agg": {
"value": 4,
"value_as_string": "4.0"
}
}
}
}
}
Here is some code I used to test it:
http://sense.qbox.io/gist/c1494619ff1bd0394d61f3d5a16cb9dfc229113a
Very well-structured question, by the way.

Resources