Merge / flatten sub aggs into main agg - elasticsearch

Is there a way in Elasticsearch to get the results back in a sort of flattened form (multiple child/sub aggregations)?
For instance, currently I am trying to get back all product types and their status (online/offline).
This is what I end up with:
aggs
[
  { key: SuperProduct, doc_count: 3, subagg: [
      {status: online, doc_count: 1},
      {status: offline, doc_count: 2}
    ]
  },
  { key: SuperProduct2, doc_count: 10, subagg: [
      {status: online, doc_count: 7},
      {status: offline, doc_count: 3}
    ]
  }
]
Charting libraries tend to like it flattened, so I was wondering if Elasticsearch could provide it in this sort of manner:
[
  { products_key: 'SuperProduct', status_key: 'online', doc_count: 1 },
  { products_key: 'SuperProduct', status_key: 'offline', doc_count: 2 },
  { products_key: 'SuperProduct2', status_key: 'online', doc_count: 7 },
  { products_key: 'SuperProduct2', status_key: 'offline', doc_count: 3 }
]
Thanks

It is possible with a composite aggregation, which you can use to link two terms aggregations:
POST /i/_search
{
  "size": 0,
  "aggregations": {
    "distribution": {
      "composite": {
        "sources": [
          {"product": {"terms": {"field": "product.keyword"}}},
          {"status": {"terms": {"field": "status.keyword"}}}
        ]
      }
    }
  }
}
This results in the following structure:
{
  "aggregations": {
    "distribution": {
      "after_key": {
        "product": "B",
        "status": "online"
      },
      "buckets": [
        {
          "key": {
            "product": "A",
            "status": "offline"
          },
          "doc_count": 3
        },
        {
          "key": {
            "product": "A",
            "status": "online"
          },
          "doc_count": 2
        },
        {
          "key": {
            "product": "B",
            "status": "offline"
          },
          "doc_count": 1
        },
        {
          "key": {
            "product": "B",
            "status": "online"
          },
          "doc_count": 4
        }
      ]
    }
  }
}
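Note that composite aggregations are paginated: each response carries an after_key, which you pass back via the after parameter to fetch the next page of buckets. A minimal follow-up request for the example above would look like this:
POST /i/_search
{
  "size": 0,
  "aggregations": {
    "distribution": {
      "composite": {
        "sources": [
          {"product": {"terms": {"field": "product.keyword"}}},
          {"status": {"terms": {"field": "status.keyword"}}}
        ],
        "after": {"product": "B", "status": "online"}
      }
    }
  }
}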
If for any reason the composite aggregation doesn't fulfill your needs, you can create (via copy_to or by concatenation) or simulate (via scripted fields) a field that uniquely identifies each bucket. In our project we went with concatenation (partly out of the necessity to collapse on this field), e.g. {"bucket": "SuperProductA:online"}, which results in dirtier output (you'll have to decode that field back or use top hits to get the original values) but still does the job.
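For illustration, a sketch of the scripted variant, assuming Elasticsearch 7.11+ runtime fields (on older versions a scripted field would be analogous) and the same product/status keyword fields as above; the bucket field name is made up for this example:
POST /i/_search
{
  "size": 0,
  "runtime_mappings": {
    "bucket": {
      "type": "keyword",
      "script": "emit(doc['product.keyword'].value + ':' + doc['status.keyword'].value)"
    }
  },
  "aggregations": {
    "distribution": {
      "terms": {"field": "bucket"}
    }
  }
}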

Related

Transform array of values to array of key value pair

I have JSON data which is in the form of a key with all its values in an array, but I need to transform it into an array of key/value pairs. Here is the data:
Source data
"2022-08-30T06:58:56.573730Z": [
{ "tag": "AC 3 Phase/7957", "value": 161.37313113545272 },
{ "tag": "AC 3 Phase/7956", "value": 285.46869739695853 }
]
}
Transformation I'm looking for:
[
  {
    "tag": "AC 3 Phase/7957",
    "ts": "2022-08-30T06:58:56.573730Z",
    "value": 161.37313113545272
  },
  {
    "tag": "AC 3 Phase/7956",
    "ts": "2022-08-30T06:58:56.573730Z",
    "value": 285.46869739695853
  }
]
I would do it like this:
$each($$, function($entries, $ts) {
  $entries.{
    "tag": tag,
    "ts": $ts,
    "value": value
  }
}) ~> $reduce($append, [])
Here $each maps each timestamp key to an array of row objects, which yields an array of arrays; the trailing $reduce($append, []) flattens that into the single array you're after.
Feel free to play with this example on the playground: https://stedi.link/g6qJGcP

How to use JSONpath to extract specific values

I'm using JSONPath to try and find data within an array of JSON objects, but I'm struggling to get to the information I want. The array contains many objects similar to the one below, where there are values for RecID throughout. If I use $..RecID I get them all, when I only want the first Key.RecID of each object (with a value of 1338438 in this example). Is there a way to only extract the top-level Key.RecID value?
BTW I'm trying to do this in JMeter and I'm assuming JSONPath is the best way to do what I want, but if there is a better way I'd be happy to hear about it.
Thanks in advance
[{
  "Key": {
    "RecID": 1338438
  },
  "Users": [{
      "FullName": "Miss Burns",
      "Users": {
        "Key": {
          "Name": "Burns",
          "RecID": 1317474
        }
      }
    },
    {
      "FullName": "Mrs Fisher",
      "Users": {
        "Key": {
          "Name": "Fisher",
          "RecID": 1317904
        }
      }
    }
  ],
  "User": {
    "FullName": "Mrs Fisher",
    "Key": {
      "Name": "Fisher",
      "RecID": 1317904
    }
  },
  "Organisation": {
    "Key": {
      "RecID": 1313881
    }
  }
}]
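One approach that should work here (a sketch, not from the original thread): since the objects you care about sit directly in the root array, anchor the path at the top level instead of using the recursive .. descent:
$[*].Key.RecID
In JMeter's JSON Extractor this matches only the first-level Key.RecID of each object (1338438 in this example), skipping the RecIDs nested under Users, User and Organisation.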

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices connected by a particular edge, using the following query (created returns vertices with label person):
software {
  created {
    name
  }
}
This would resolve to the following Gremlin query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
  "result": [
    {
      "#type": "d",
      "#rid": "#24:0",
      "#version": 6,
      "#class": "person",
      "in_knows": [
        "#35:0"
      ],
      "name": "josh",
      "out_created": [
        "#32:0",
        "#33:0"
      ],
      "age": 32,
      "#fieldTypes": "in_knows=g,out_created=g"
    }
  ],
  "dbStats": {
    ...
  }
}
I realize that this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using a DataLoader pattern. (I'm also hoping to do property selection, so I'm not asking the database to return too much info.)
So I'm trying to craft a query like so:
g.V().union(
  __.hasLabel('software').has('name', 'ripple').
    project('parent', 'child').
    by('id').
    by(__.in('created').fold()),
  __.hasLabel('software').has('name', 'lop').
    project('parent', 'child').
    by('id').
    by(__.in('created').fold())
)
But this results in the following, where the props are missing and it just includes the ids of the vertices I want:
{
  "result": [
    {
      "parent": "ripple",
      "child": [
        "#24:0"
      ]
    },
    {
      "parent": "lop",
      "child": [
        "#22:0",
        "#23:0",
        "#24:0"
      ]
    }
  ],
  "dbStats": {
    ...
  }
}
My question is: how can I have the Gremlin query return all of the props for the found vertices and none of the other props? Should I even be doing batching this way?
For anyone else reading: the query I was trying to write wouldn't work because the TraversalSet created in the .by(__.in('created')) can't be cast from a List to an ElementMap, as the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working query duplicates the keys for each row and specifies the props needed. (The query below is OK for Gremlin 3.3 as used in OrientDB; on Gremlin 3.4.4 or later you can instead replace the last by step with by(elementMap('name', 'age')).)
g.V().union(
  __.hasLabel('software').has('name', 'ripple').as('parent').
    in('created').as('child').
    select('parent', 'child').
    by(values('name')).
    by(properties('id', 'name', 'age').
      group().by(__.key()).by(__.value())),
  __.hasLabel('software').has('name', 'lop').as('parent').
    in('created').as('child').
    select('parent', 'child').
    by(values('name')).
    by(properties('id', 'name', 'age').
      group().by(__.key()).by(__.value()))
)
So that you get a result like this:
{"data": [
{
"parent": "ripple",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
},
{
"parent": "lop",
"child": {
"id": 5709,
"name": "peter",
"age": 35
}
},
{
"parent": "lop",
"child": {
"id": 5713,
"name": "marko",
"age": 29
}
},
{
"parent": "lop",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
}
]
}
Which would allow you to create a lookup where you concat all results for "lop" and "ripple" into arrays.
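For illustration, a minimal sketch of that client-side lookup (TypeScript; the Row shape follows the result above, and the function name is made up for this example):
type Child = { id: number; name: string; age: number };
type Row = { parent: string; child: Child };

// Group the flat rows by parent so each software name maps to all of its
// creators, e.g. "lop" -> [peter, marko, josh].
function buildLookup(rows: Row[]): Map<string, Child[]> {
  const lookup = new Map<string, Child[]>();
  for (const row of rows) {
    const children = lookup.get(row.parent) ?? [];
    children.push(row.child);
    lookup.set(row.parent, children);
  }
  return lookup;
}
A DataLoader batch function can then map each requested software name to lookup.get(name) in one round trip.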

Elastic search Average time difference Aggregate Query

I have documents in Elasticsearch in which each document looks something like the following:
{
  "id": "T12890ADSA12",
  "status": "ENDED",
  "type": "SAMPLE",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "events": [
    {
      "event": "STARTED",
      "version": 1,
      "timestamp": "2020-04-30T13:41:25.862Z"
    },
    {
      "event": "INPROGRESS",
      "version": 2,
      "timestamp": "2020-05-14T17:03:09.137Z"
    },
    {
      "event": "INPROGRESS",
      "version": 3,
      "timestamp": "2020-05-17T17:03:09.137Z"
    },
    {
      "event": "ENDED",
      "version": 4,
      "timestamp": "2020-05-29T18:18:08.483Z"
    }
  ],
  "createdAt": "2020-04-30T13:41:25.862Z"
}
Now, I want to write a query in Elasticsearch that gets all the documents of type "SAMPLE" and computes the average time between STARTED and ENDED across all those documents, e.g. the average of (2020-05-29T18:18:08.483Z - 2020-04-30T13:41:25.862Z, ...). Assume that the STARTED and ENDED events are present only once in the events array. Is there any way I can do that?
You can do something like this. The query selects the documents of type SAMPLE and status ENDED (to make sure there is an ENDED event). Then the avg aggregation uses scripting to gather the STARTED and ENDED timestamps and subtract them, returning the number of days:
POST test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status.keyword": "ENDED"
          }
        },
        {
          "term": {
            "type.keyword": "SAMPLE"
          }
        }
      ]
    }
  },
  "aggs": {
    "duration": {
      "avg": {
        "script": "Map findEvent(List events, String type) {return events.find(it -> it.event == type);} def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp); def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); return ChronoUnit.DAYS.between(started, ended);"
      }
    }
  }
}
The script looks like this:
Map findEvent(List events, String type) {
  return events.find(it -> it.event == type);
}
def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);
def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp);
return ChronoUnit.DAYS.between(started, ended);
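ChronoUnit.DAYS.between truncates to whole days; if you need finer granularity, a hedged variant (same script, assuming the same context) is to compute fractional days from milliseconds by replacing the last line:
// hypothetical variant: fractional days instead of truncated whole days
return ChronoUnit.MILLIS.between(started, ended) / 86400000.0;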

Elastic query group by

I've started the process of learning Elasticsearch and I was wondering if somebody could help me shortcut the process by providing some examples of how I would build a couple of queries.
Here's my example schema...
PUT /sales/_mapping
{
  "sale": {
    "properties": {
      "productCode": {"type": "string"},
      "productTitle": {"type": "string"},
      "quantity": {"type": "integer"},
      "unitPrice": {"type": "double"}
    }
  }
}
POST /sales/1
{"productCode": "A", "productTitle": "Widget", "quantity": 5, "unitPrice": 5.50}
POST /sales/2
{"productCode": "B", "productTitle": "Gizmo", "quantity": 10, "unitPrice": 1.10}
POST /sales/3
{"productCode": "C", "productTitle": "Spanner", "quantity": 5, "unitPrice": 9.00}
POST /sales/4
{"productCode": "A", "productTitle": "Widget", "quantity": 15, "unitPrice": 5.40}
POST /sales/5
{"productCode": "B", "productTitle": "Gizmo", "quantity": 20, "unitPrice": 1.00}
POST /sales/6
{"productCode": "B", "productTitle": "Gizmo", "quantity": 30, "unitPrice": 0.90}
POST /sales/7
{"productCode": "B", "productTitle": "Gizmo", "quantity": 40, "unitPrice": 0.80}
POST /sales/8
{"productCode": "C", "productTitle": "Spanner", "quantity": 100, "unitPrice": 7.50}
POST /sales/9
{"productCode": "C", "productTitle": "Spanner", "quantity": 200, "unitPrice": 5.50}
What query would I need to generate the following results?
a) Show the number of documents grouped by product code:
Product code  Title    Count
A             Widget   2
B             Gizmo    4
C             Spanner  3
b) Show the total units sold by product code:
Product code  Title    Total units sold
A             Widget   20
B             Gizmo    100
C             Spanner  305
TIA
You can accomplish that using aggregations, in particular terms aggregations, and it can be done in just one run, by including them within your query structure. In order to instruct ES to generate analytic data based on aggregations, you need to include the aggregations object (or aggs) and specify within it the type of aggregation you would like ES to run over your data:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_product": {
      "terms": {
        "field": "productCode"
      },
      "aggs": {
        "units_sold": {
          "sum": {
            "field": "quantity"
          }
        }
      }
    }
  }
}
By running that query, besides the resulting hits from your search (in this case we are doing a match_all), an additional object will be included within the response object, holding the corresponding resulting aggregations. For example:
{
  ...
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [ ... ]
  },
  "aggregations": {
    "group_by_product": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "b",
          "doc_count": 3,
          "units_sold": {
            "value": 60
          }
        },
        {
          "key": "a",
          "doc_count": 2,
          "units_sold": {
            "value": 20
          }
        },
        {
          "key": "c",
          "doc_count": 1,
          "units_sold": {
            "value": 5
          }
        }
      ]
    }
  }
}
I omitted some details from the response object for brevity, to highlight the important part, which is within the aggregations object. You can see how the aggregated data consists of different buckets, each representing a distinct product type (identified by the key field) found within your documents; doc_count holds the number of occurrences per product type, and the units_sold object holds the total sum of units sold for each product type.
One important thing to take into consideration is that in order to perform aggregations on string or text fields, you need to enable the fielddata setting within your field mapping, as that setting is disabled by default on all text-based fields. To update the mapping, for example for the productCode field, you just need to issue a PUT request to the corresponding mapping type within the index, for example:
PUT http://localhost:9200/sales/sale/_mapping
{
  "properties": {
    "productCode": {
      "type": "string",
      "fielddata": true
    }
  }
}
(See the Elasticsearch documentation on the fielddata setting for more info.)
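As an aside beyond the original answer: on Elasticsearch 5+ a lighter-weight alternative to enabling fielddata is to aggregate on a keyword (multi-)field, which uses doc_values instead of heap-resident fielddata. A sketch of such a mapping, using the same productCode field as above:
PUT /sales/_mapping
{
  "properties": {
    "productCode": {
      "type": "text",
      "fields": {
        "keyword": {"type": "keyword"}
      }
    }
  }
}
You would then aggregate on "productCode.keyword" instead of "productCode".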
