How do I compute for the fields of matching documents in Elasticsearch?

Here is my sample document:
{
"jobID": "ace4c888-1907-4021-a808-4a816e99aa2e",
"startTime": 1415255164835,
"endTime": 1415255164898,
"moduleCode": "STARTING_MODULE"
}
I have thousands of documents.
Each job has a pair of documents with the same jobID: one with moduleCode STARTING_MODULE and one with ENDING_MODULE.
My formula is: the ENDING_MODULE endTime minus the STARTING_MODULE startTime equals the elapsed time the job took to process.
My question is: how do I get the total of all results whose elapsed time is less than, let's say, 28800000?
Are such results possible with Elasticsearch? I'd like to display my results in Kibana too.
Please let me know if this needs more clarification. Thanks!

Try the following. It might not be ideal, but it returns each jobID with its elapsed time. First, I'm assuming jobID and moduleCode are not_analyzed:
{
"mappings": {
"jobs": {
"properties": {
"jobID":{
"type": "string",
"index": "not_analyzed"
},
"startTime":{
"type": "date"
},
"endTime":{
"type": "date"
},
"moduleCode":{
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
I used the scripted_metric aggregation, available since ES 1.4.0, to compute the difference between those two values. The map_script accumulates into _agg['time'] so that both documents of a job are counted even when they sit on the same shard. I haven't looked into how to add the filtering for "less than 28800000", but I hope the script can be adapted to apply that limit:
{
"query": {
"match_all": {}
},
"aggs": {
"jobIds": {
"terms": {
"field": "jobID"
},
"aggs": {
"executionTimes": {
"scripted_metric": {
"init_script": "_agg['time'] = 0L",
"map_script": "if (doc['moduleCode'].value == \"STARTING_MODULE\") { _agg['time'] -= doc['startTime'].value } else { _agg['time'] += doc['endTime'].value }",
"combine_script": "execution = 0; for (t in _agg.time) { execution += t };return execution",
"reduce_script": "execution = 0; for (a in _aggs) { execution += a }; return execution"
}
}
}
}
}
}
And the result should be something like this:
"aggregations": {
"jobIds": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ace4c888-1907-4021-a808-4a816e99aa1e",
"doc_count": 2,
"executionTimes": {
"value": 1
}
},
{
"key": "ace4c888-1907-4021-a808-4a816e99aa2e",
"doc_count": 2,
"executionTimes": {
"value": 1000201063
}
},
{
"key": "ace4c888-1907-4021-a808-4a816e99aa3e",
"doc_count": 2,
"executionTimes": {
"value": 10000
}
}
]
}
}
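The "less than 28800000" filter left open above could also be applied on the client, after the aggregation returns. The following Python sketch assumes a response shaped like the sample output above; the function name fast_job_ids and the doc_count guard are illustrative assumptions, not part of the original answer.

```python
# Minimal sketch: filter the per-job elapsed times client-side.
# Assumes the response has the same shape as the sample output above.
THRESHOLD_MS = 28_800_000  # threshold taken from the question


def fast_job_ids(response, threshold_ms=THRESHOLD_MS):
    """Return the job IDs whose elapsed time is below the threshold."""
    buckets = response["aggregations"]["jobIds"]["buckets"]
    return [
        bucket["key"]
        for bucket in buckets
        # doc_count == 2 skips jobs that are missing one of their two documents
        if bucket["doc_count"] == 2
        and bucket["executionTimes"]["value"] < threshold_ms
    ]


sample = {
    "aggregations": {"jobIds": {"buckets": [
        {"key": "ace4c888-1907-4021-a808-4a816e99aa1e", "doc_count": 2,
         "executionTimes": {"value": 1}},
        {"key": "ace4c888-1907-4021-a808-4a816e99aa2e", "doc_count": 2,
         "executionTimes": {"value": 1000201063}},
        {"key": "ace4c888-1907-4021-a808-4a816e99aa3e", "doc_count": 2,
         "executionTimes": {"value": 10000}},
    ]}}
}

fast = fast_job_ids(sample)
print(len(fast), fast)
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so with thousands of jobs the size setting on the terms aggregation would need to be raised for this count to be complete.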

Related

Nested array of objects aggregation in Elasticsearch

Documents in the Elasticsearch are indexed as such
Document 1
{
"task_completed": 10,
"tagged_object": [
{
"category": "cat",
"count": 10
},
{
"category": "cars",
"count": 20
}
]
}
Document 2
{
"task_completed": 50,
"tagged_object": [
{
"category": "cars",
"count": 100
},
{
"category": "dog",
"count": 5
}
]
}
As you can see, the value of the category key is dynamic. I want to perform an aggregation similar to a SQL GROUP BY on category, returning the sum of count for each category.
In the above example, the aggregation should return
cat: 10,
cars: 120 and
dog: 5
I'd like to know how to write this aggregation query in Elasticsearch, if it is possible. Thanks in advance.
You can achieve the required result using the nested, terms, and sum aggregations.
Here is a working example with the index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"tagged_object": {
"type": "nested"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"resellers": {
"nested": {
"path": "tagged_object"
},
"aggs": {
"books": {
"terms": {
"field": "tagged_object.category.keyword"
},
"aggs":{
"sum_of_count":{
"sum":{
"field":"tagged_object.count"
}
}
}
}
}
}
}
}
Search Result:
"aggregations": {
"resellers": {
"doc_count": 4,
"books": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cars",
"doc_count": 2,
"sum_of_count": {
"value": 120.0
}
},
{
"key": "cat",
"doc_count": 1,
"sum_of_count": {
"value": 10.0
}
},
{
"key": "dog",
"doc_count": 1,
"sum_of_count": {
"value": 5.0
}
}
]
}
}
}
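To make explicit what the nested, terms, and sum combination computes, here is a small Python sketch that performs the same grouping on the two sample documents in memory. It does not talk to Elasticsearch; the function name sum_count_by_category is made up for illustration.

```python
from collections import defaultdict

# The two sample documents from the question, as plain dictionaries.
docs = [
    {"task_completed": 10, "tagged_object": [
        {"category": "cat", "count": 10},
        {"category": "cars", "count": 20},
    ]},
    {"task_completed": 50, "tagged_object": [
        {"category": "cars", "count": 100},
        {"category": "dog", "count": 5},
    ]},
]


def sum_count_by_category(documents):
    """Group every nested entry by category and sum its count."""
    totals = defaultdict(int)
    for doc in documents:
        # "nested" step: visit each tagged_object entry on its own
        for entry in doc.get("tagged_object", []):
            # "terms" step groups by category, "sum" step adds up count
            totals[entry["category"]] += entry["count"]
    return dict(totals)


print(sum_count_by_category(docs))
```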

Elasticsearch aggregation over children document field values

I'm facing the problem of selecting and sorting parent documents based on a value aggregated over their child documents. The aggregation (e.g. a sum) itself depends on a query string that determines which child documents are relevant for the aggregation.
Example: Given the documents basket A and basket B, for each basket document, I am looking to sum over the number field of its fruit children if the name field matches my query, e.g. apples.
PUT /baskets/_doc/0
{
"name": "basket A",
"fruit": [
{
"name": "apples",
"number": 2
},
{
"name": "oranges",
"number": 3
}
]
}
PUT /baskets/_doc/1
{
"name": "basket B",
"fruit": [
{
"name": "apples",
"number": 3
},
{
"name": "apples",
"number": 3
}
]
}
Mappings:
PUT /baskets
{
"mappings": {
"properties": {
"name": { "type": "text" },
"fruit": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"number": { "type": "long" }
}
}
}
}
}
Use case 1: Which basket has (strictly) more than 5 apples? Would expect only basket B
Use case 2: Sort baskets by number of apples. Would expect basket B with a total of 6 apples, then basket A with a total of 2 apples.
How can one implement this using the Elasticsearch (7.8.0) query DSL?
I have tried so far with nested queries and aggregations without success.
Thanks!
Edit: Added mappings
Edit: Updated the numbers to better reflect the problem
Edit: Added a possible answer to use case 2 (see the comments on the answer from @joe):
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
Use case 1:
GET baskets/_search
{
"query": {
"nested": {
"path": "fruit",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"term": {
"fruit.name": {
"value": "apples"
}
}
},
{
"range": {
"fruit.number": {
"gte": 5
}
}
}
]
}
}
}
}
}
Use gt for strictly more than 5; gte, as in the query above, matches 5 or more.
Also notice the inner_hits part: it returns the actual nested subdocuments that caused a particular basket to match the query. It's not required, but good to know.
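One caveat worth checking before relying on the nested query above: a nested query evaluates each fruit entry separately, so the range condition applies to a single entry's number rather than to the basket's total. The following Python sketch (plain dictionaries standing in for the indexed documents, with hypothetical helper names) contrasts the two interpretations on the sample baskets, using the strict "more than 5" from the question.

```python
# Sample baskets from the question, as plain dictionaries.
baskets = [
    {"name": "basket A", "fruit": [
        {"name": "apples", "number": 2},
        {"name": "oranges", "number": 3},
    ]},
    {"name": "basket B", "fruit": [
        {"name": "apples", "number": 3},
        {"name": "apples", "number": 3},
    ]},
]


def matches_per_entry(basket, fruit, minimum):
    """Nested-query semantics: one single entry must satisfy both conditions."""
    return any(
        f["name"] == fruit and f["number"] > minimum for f in basket["fruit"]
    )


def matches_on_total(basket, fruit, minimum):
    """What the use case asks for: the summed number must exceed the minimum."""
    total = sum(f["number"] for f in basket["fruit"] if f["name"] == fruit)
    return total > minimum


per_entry = [b["name"] for b in baskets if matches_per_entry(b, "apples", 5)]
on_total = [b["name"] for b in baskets if matches_on_total(b, "apples", 5)]
print(per_entry, on_total)
```

With the sample data no single apples entry exceeds 5, so only the total-based check selects basket B. This is why an aggregation-based approach is needed for use case 1.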
Use case 2:
GET baskets/_search
{
"sort": [
{
"fruit.number": {
"nested_path": "fruit",
"order": "desc"
}
}
]
}
Use case 2 Edit:
There are probably cleaner ways of doing this but I'd go with the following:
GET baskets/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"params": {
"only_fruit_name": "apples"
},
"init_script": "state.by_basket_name = [:]",
"map_script": """
def basket_name = params._source['name'];
def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
for (def fruit_group : fruits) {
def number = fruit_group.number;
if (state.by_basket_name.containsKey(basket_name)) {
state.by_basket_name[basket_name] += number;
} else {
state.by_basket_name[basket_name] = number;
}
}
""",
"combine_script": "return state.by_basket_name",
"reduce_script": "return states"
}
}
}
}
yielding a hash map along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"basket A":2,
"basket B":6
}
]
}
}
}
Sorting can be done either in the reduce_script or in your post-processing of the ES response. You could of course choose to go with (sorted) lists and lambdas instead.
In the sort-based query for use case 2 further up, notice the required nested_path.
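As a rough illustration of the post-processing route for the scripted_metric result, the Python sketch below merges the per-shard maps returned by reduce_script and sorts the baskets by their apple total. The function name sort_baskets is an assumption; the response shape is the one shown in the hash map output above.

```python
def sort_baskets(response):
    """Merge the per-shard maps and sort baskets by total, highest first."""
    per_shard_maps = response["aggregations"]["multiply_and_add"]["value"]
    merged = {}
    for shard_map in per_shard_maps:
        for basket_name, number in shard_map.items():
            # the same basket name may appear in several shard states
            merged[basket_name] = merged.get(basket_name, 0) + number
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


response = {
    "aggregations": {
        "multiply_and_add": {"value": [{"basket A": 2, "basket B": 6}]}
    }
}

print(sort_baskets(response))
```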
After some searching and testing, here are possible queries for both use cases (in addition to @joe's answer to use case 2). Note that both queries require changing the mapping of the top-level name field to type keyword.
Use case 1: Which basket has (strictly) more than 5 apples? Would expect only basket B
For more information on filtering results by their aggregation value see Bucket Selectors
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name"
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"match": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
},
"basket_sum_filter":{
"bucket_selector":{
"buckets_path":{
"fruitSum":"nest > fruit_filter > fruit_sum"
},
"script":"params.fruitSum > 5"
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
}
]
Use case 2: Sort baskets by number of apples. Would expect basket B with a total of 6 apples, then basket A with a total of 2 apples.
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
},
{
"key": "basket A",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 1,
"fruit_sum": {
"value": 2
}
}
}
}
]

Elasticsearch aggregation by field name

Imagine two documents:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
}
},
{
"_id": "def",
"categories": {
"category-id-1": 2
}
}
]
As you can see, each document can be associated with a number of categories by setting a sub-field inside the categories field.
With this mapping, I should be able to request the documents of a given category and order them by the value stored in that sub-field.
My problem is that I now want an aggregation that counts the number of documents for each category. That would give the following result for the dataset I provided:
{
"aggregations": {
"categories" : {
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
}
I can't find anything in the documentation to solve this problem. I'm completely new to Elasticsearch, so I may be doing something wrong either in my documentation research or in my mapping choice.
Is it possible to make this kind of aggregation with my mapping? I'm using ES 6.x.
EDIT: Here is the mapping for the index:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
}
}
}
}
}
}
The most straightforward solution is to add a new field that contains all the distinct categories of a document.
If we call this field categories_list, a solution could look like this.
Change the mapping to:
{
"test1234": {
"mappings": {
"_doc": {
"properties": {
"categories": {
"properties": {
"category-id-1": {
"type": "long"
},
"category-id-2": {
"type": "long"
}
}
},
"categories_list": {
"type": "keyword"
}
}
}
}
}
}
Then you need to modify your documents like this:
[
{
"_id": "abc",
"categories": {
"category-id-1": 1,
"category-id-2": 50
},
"categories_list": ["category-id-1", "category-id-2"]
},
{
"_id": "def",
"categories": {
"category-id-1": 2
},
"categories_list": ["category-id-1"]
}
]
then your aggregation request should be
{
"aggs": {
"categories": {
"terms": {
"field": "categories_list",
"size": 10
}
}
}
}
and will return
"aggregations": {
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category-id-1",
"doc_count": 2
},
{
"key": "category-id-2",
"doc_count": 1
}
]
}
}
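Populating categories_list by hand is error-prone, so it may be easier to derive it from the keys of categories just before indexing. A minimal Python sketch of that step follows; the helper name with_categories_list is hypothetical, and the dictionaries stand for document source bodies only.

```python
def with_categories_list(source):
    """Return a copy of the document source with categories_list added."""
    enriched = dict(source)
    # the keys of the categories object are exactly the category IDs
    enriched["categories_list"] = sorted(source.get("categories", {}))
    return enriched


sources = [
    {"categories": {"category-id-1": 1, "category-id-2": 50}},
    {"categories": {"category-id-1": 2}},
]

for source in sources:
    print(with_categories_list(source))
```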

Elastic search aggregation on map - on each key

I have following kind of documents.
document 1
{
"doc": {
"id": 1,
"errors": {
"e1":5,
"e2":20,
"e3":30
},
"warnings": {
"w1":1,
"w2":2
}
}
}
document 2
{
"doc": {
"id": 2,
"errors": {
"e1":10
},
"warnings": {
"w1":1,
"w2":2,
"w3":33
}
}
I would like to get the following sums in one or more calls. Is it possible? I tried various solutions, but they all work only when the keys are known. In my case the map keys (e1, e2, etc.) are not known in advance.
{
"errors": {
"e1": 15,
"e2": 20,
"e3": 30
},
"warnings": {
"w1": 2,
"w2": 4,
"w3": 33
}
}
There are two solutions, and neither of them is pretty. I have to point out that option 2 should be the preferred way to go, since option 1 uses an experimental feature.
1. Dynamic mapping, [experimental] scripted aggregation
Inspired by this answer and the Scripted Metric Aggregation page of the ES docs, I began by simply inserting your documents into a non-existent index (which by default creates a dynamic mapping).
NB: I tested this on ES 5.4, but the documentation suggests that this feature is available from at least 2.0.
The resulting query for aggregation is the following:
POST /my_index/my_type/_search
{
"size": 0,
"query" : {
"match_all" : {}
},
"aggs": {
"errors": {
"scripted_metric": {
"init_script" : "params._agg.errors = [:]",
"map_script" : "for (t in params['_source']['doc']['errors'].entrySet()) { params._agg.errors[t.key] = params._agg.errors.containsKey(t.key) ? params._agg.errors[t.key] + t.value : t.value } ",
"combine_script" : "return params._agg.errors",
"reduce_script": "Map res = [:] ; for (a in params._aggs) { for (t in a.entrySet()) { res[t.key] = res.containsKey(t.key) ? res[t.key] + t.value : t.value } } return res"
}
},
"warnings": {
"scripted_metric": {
"init_script" : "params._agg.warnings = [:]",
"map_script" : "for (t in params['_source']['doc']['warnings'].entrySet()) { params._agg.warnings[t.key] = params._agg.warnings.containsKey(t.key) ? params._agg.warnings[t.key] + t.value : t.value } ",
"combine_script" : "return params._agg.warnings",
"reduce_script": "Map res = [:] ; for (a in params._aggs) { for (t in a.entrySet()) { res[t.key] = res.containsKey(t.key) ? res[t.key] + t.value : t.value } } return res"
}
}
}
}
Which produces this output:
{
...
"aggregations": {
"warnings": {
"value": {
"w1": 2,
"w2": 4,
"w3": 33
}
},
"errors": {
"value": {
"e1": 15,
"e2": 20,
"e3": 30
}
}
}
}
If you are following this path you might be interested in the JavaDoc of what params['_source'] is underneath.
Warning: I believe that scripted aggregation is not efficient; for better performance you should check out option 2 or a different data processing engine.
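To show how the four scripted_metric phases fit together, here is a small in-memory Python simulation: one state per shard (init), one map call per document, one combine per shard, and a final reduce over all shard states. The function names and the explicit shard lists are assumptions made for illustration. Note that the map step here accumulates per key, so two documents on the same shard are both counted.

```python
def init_state():
    """init_script: fresh, empty state for one shard."""
    return {}


def map_document(state, source, field):
    """map_script: add every key/value of the map field to the shard state."""
    for key, value in source["doc"].get(field, {}).items():
        state[key] = state.get(key, 0) + value


def combine(state):
    """combine_script: hand the shard state to the coordinating node."""
    return state


def reduce_states(states):
    """reduce_script: merge all shard states into one map of sums."""
    result = {}
    for state in states:
        for key, value in state.items():
            result[key] = result.get(key, 0) + value
    return result


def scripted_sum(shards, field):
    """Run the four phases over a list of shards (each a list of sources)."""
    states = []
    for shard_docs in shards:
        state = init_state()
        for source in shard_docs:
            map_document(state, source, field)
        states.append(combine(state))
    return reduce_states(states)


doc1 = {"doc": {"id": 1, "errors": {"e1": 5, "e2": 20, "e3": 30},
                "warnings": {"w1": 1, "w2": 2}}}
doc2 = {"doc": {"id": 2, "errors": {"e1": 10},
                "warnings": {"w1": 1, "w2": 2, "w3": 33}}}

print(scripted_sum([[doc1], [doc2]], "errors"))    # documents on two shards
print(scripted_sum([[doc1, doc2]], "warnings"))    # both on one shard
```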
What does experimental mean:
This functionality is experimental and may be changed or removed
completely in a future release. Elastic will take a best effort
approach to fix any issues, but experimental features are not subject
to the support SLA of official GA features.
With this in mind we proceed to option 2.
2. Static nested mapping, nested aggregation
Here the idea is to store your data differently and essentially be able to query and aggregate it differently. Firstly, we need to create a mapping using nested data type.
PUT /my_index_nested/
{
"mappings": {
"my_type": {
"properties": {
"errors": {
"type": "nested",
"properties": {
"name": {"type": "keyword"},
"val": {"type": "integer"}
}
},
"warnings": {
"type": "nested",
"properties": {
"name": {"type": "keyword"},
"val": {"type": "integer"}
}
}
}
}
}
}
A document in such an index will look like this:
{
"_index": "my_index_nested",
"_type": "my_type",
"_id": "1",
"_score": 1,
"_source": {
"errors": [
{
"name": "e1",
"val": 5
},
{
"name": "e2",
"val": 20
},
{
"name": "e3",
"val": 30
}
],
"warnings": [
{
"name": "w1",
"val": 1
},
{
"name": "w2",
"val": 2
}
]
}
}
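Existing documents that use the map layout have to be converted to this name/val layout before they are indexed into my_index_nested. A minimal Python sketch of that conversion is shown below; the function name to_nested_source is hypothetical, and it assumes the original fields live under the doc key as in the question.

```python
def to_nested_source(source, fields=("errors", "warnings")):
    """Turn {"e1": 5, ...} maps into [{"name": "e1", "val": 5}, ...] lists."""
    inner = source["doc"]
    return {
        field: [
            {"name": name, "val": val}
            for name, val in inner.get(field, {}).items()
        ]
        for field in fields
    }


original = {
    "doc": {
        "id": 1,
        "errors": {"e1": 5, "e2": 20, "e3": 30},
        "warnings": {"w1": 1, "w2": 2},
    }
}

print(to_nested_source(original))
```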
Next we need to write the aggregate query. First we need to use nested aggregation, which will allow us to query this special nested data type. But since we actually want to aggregate by name, and sum the values of val, we will need to do a sub-aggregation.
The resulting query is as follows (I am adding comments alongside the query for clarity):
POST /my_index_nested/my_type/_search
{
"size": 0,
"aggs": {
"errors_top": {
"nested": {
// declare which nested objects we want to work with
"path": "errors"
},
"aggs": {
"errors": {
// what we are aggregating - different values of name
"terms": {"field": "errors.name"},
// sub aggregation
"aggs": {
"error_sum": {
// sum all val for same name
"sum": {"field": "errors.val"}
}
}
}
}
},
"warnings_top": {
// analogous to errors
}
}
}
The output of this query will be like:
{
...
"aggregations": {
"errors_top": {
"doc_count": 4,
"errors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "e1",
"doc_count": 2,
"error_sum": {
"value": 15
}
},
{
"key": "e2",
"doc_count": 1,
"error_sum": {
"value": 20
}
},
{
"key": "e3",
"doc_count": 1,
"error_sum": {
"value": 30
}
}
]
}
},
"warnings_top": {
...
}
}
}

Elasticsearch: Is it possible to run a not-analyzed aggregation query on an analyzed field?

I have documents that store brand names in analysed form, for example {"name":"Sam-sung"} and {"name":"Motion:Systems"}. In some cases I want to aggregate these brands within a timestamp range.
My query is as follows:
{
"size": 0,
"aggs": {
"filtered_aggs": {
"filter": {
"range": {
"@timestamp":{
"gte":"2016-07-18T14:23:41.459Z",
"lte":"2016-07-18T14:53:10.017Z"
}
}
},
"aggs": {
"execute_time": {
"terms": {
"field": "brands",
"size": 0
}
}
}
}
}
}
But the returned results are:
{
...
"aggregations": {
"states": {
"buckets": [
{
"key": "Sam",
"doc_count": 5
},
{
"key": "sung",
"doc_count": 5
},
{
"key": "Motion",
"doc_count": 1
},
{
"key": "Systems",
"doc_count": 1
}
]
}
}
}
But I want the results to be:
{
...
"aggregations": {
"states": {
"buckets": [
{
"key": "Sam-sung",
"doc_count": 5
},
{
"key": "Motion:Systems",
"doc_count": 1
}
]
}
}
}
Is there any way to run a not-analysed query on an analysed field in Elasticsearch?
You need to add a not_analyzed sub-field to your brands field and then aggregate on that sub-field.
PUT /index/_mapping/type
{
"properties": {
"brands": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you need to fully reindex your data in order to populate the new sub-fields brands.raw.
Finally, you can change your query to this:
POST index/_search
{
"size": 0,
"aggs": {
"filtered_aggs": {
"filter": {
"range": {
"@timestamp":{
"gte":"2016-07-18T14:23:41.459Z",
"lte":"2016-07-18T14:53:10.017Z"
}
}
},
"aggs": {
"execute_time": {
"terms": {
"field": "brands.raw",
"size": 0
}
}
}
}
}
}
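The difference between aggregating on brands and on brands.raw can be pictured with a short Python sketch. The regex split below is only a crude stand-in for an analyzer (an assumption for illustration; a real analyzer may also lowercase or otherwise normalise tokens), and the helper names are made up.

```python
import re
from collections import Counter


def analyzed_buckets(values):
    """Count documents per token, as a terms aggregation on an analysed field would."""
    counts = Counter()
    for value in values:
        # crude tokenizer: split on anything that is not a letter or digit
        tokens = {tok for tok in re.split(r"[^0-9A-Za-z]+", value) if tok}
        counts.update(tokens)  # each document counts once per distinct token
    return counts


def raw_buckets(values):
    """Count documents per exact value, as on a not_analyzed sub-field."""
    return Counter(values)


brands = ["Sam-sung"] * 5 + ["Motion:Systems"]

print(dict(analyzed_buckets(brands)))
print(dict(raw_buckets(brands)))
```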
