Pipeline aggregations in Elasticsearch

I am working with Elasticsearch aggregations and have a question about how to do a pipeline-style aggregation. My ES documents have three top-level fields:
documentId, list1, list2
Example:
Here are a few of the documents I have:
document 1:
{
"documentId":"1",
"list1":
[
{
"key": "key1",
"value": "value11"
}
],
"list2":
[
{
"key": "key2",
"value": "value21"
}
...
]
}
document 2:
{
"documentId":"2",
"list1":
[
{
"key": "key1",
"value": "value11"
}
],
"list2":
[
{
"key": "key2",
"value": "value21"
}
...
]
}
document 3:
{
"documentId":"3",
"list1":
[
{
"key": "key1",
"value": "value12"
}
],
"list2":
[
{
"key": "key2",
"value": "value21"
}
...
]
}
To summarize:
document1 and document2 have the same values for key1 and key2 (only the id differs, so they are treated as two separate documents).
document3 has the same value for key2 as document1 and document2, but its value for key1 is different.
I want to run a terms aggregation on the keys of the list1 field and use its output as the input to a terms aggregation on list2.
So, for the above example, the overall output I want is -
value21: 2
(one count corresponding to value11 in key1 and second count corresponding to value12 in key1)
and NOT
value21: 3 (two counts corresponding to value11 in key1 and third count corresponding to value12 in key1).
Is there any simple way of doing this?
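One way to read the requirement is as a distinct count: for each list2 value, count how many different list1 values occur alongside it. The following is a minimal plain-Python sketch of that counting rule over the three sample documents, not an Elasticsearch query. It uses only the entries shown above (the elided "..." items are left out), and the function name distinct_list1_per_list2 is hypothetical.

```python
from collections import defaultdict

# The three sample documents, restricted to the entries shown in the question.
docs = [
    {"documentId": "1",
     "list1": [{"key": "key1", "value": "value11"}],
     "list2": [{"key": "key2", "value": "value21"}]},
    {"documentId": "2",
     "list1": [{"key": "key1", "value": "value11"}],
     "list2": [{"key": "key2", "value": "value21"}]},
    {"documentId": "3",
     "list1": [{"key": "key1", "value": "value12"}],
     "list2": [{"key": "key2", "value": "value21"}]},
]

def distinct_list1_per_list2(documents):
    """For each list2 value, count the distinct list1 values seen with it."""
    seen = defaultdict(set)
    for doc in documents:
        for outer in doc["list2"]:
            for inner in doc["list1"]:
                seen[outer["value"]].add(inner["value"])
    return {value: len(values) for value, values in seen.items()}

print(distinct_list1_per_list2(docs))  # {'value21': 2}
```

In Elasticsearch terms this is closer to a distinct (cardinality-style) count under each terms bucket than to a plain document count. Whether that can be expressed directly depends on how list1 and list2 are mapped, which the question does not show.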

Related

Grafana query filter by key and value

In my Grafana ElasticSearch Datasource, I have an attribute like this:
=== object_attributes.variables ====
[
{ "key": "fruit", "value": "apple" },
{ "key": "fruit", "value": "banana" },
{ "key": "game", "value": "cricket" },
{ "key": "game", "value": "football" }
]
=== object_attributes.status =====
["failed","all","xxx"] or ["passed","all","xxx"]
So, when I query like this:
* AND object_attributes.status:"passed" I get the expected results.
* AND object_attributes.status:"passed" AND object_attributes.variable:{ "key": "fruit", "value": "banana" } I get no results.
Basically, I want to filter all attributes by fruit: banana and passed. How do I modify the second query to get results?
I figured it out myself.
Running the query below gave me the expected results:
* AND object_attributes.status:"passed" AND object_attributes.variable.key:"fruit" AND object_attributes.variable.value:"banana"
So, instead of object_attributes.variable:{ "key": "fruit", "value": "banana" }, I query both object_attributes.variable.key:"fruit" AND object_attributes.variable.value:"banana", and it works like a charm.
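One caveat worth checking: if the variables array is mapped as a regular object array rather than as a nested field (an assumption, since the mapping is not shown), Elasticsearch indexes the keys and the values as separate flat lists, so the two conditions can be satisfied by different array elements. The following is a rough plain-Python simulation of that flattening. The helper names and the second sample array are made up for illustration.

```python
def flatten(objects):
    """Mimic a non-nested object array: one flat list of values per sub-field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat

def matches(objects, key, value):
    """True if the key and the value each appear somewhere in the flattened array."""
    flat = flatten(objects)
    return key in flat.get("key", []) and value in flat.get("value", [])

# One element really is fruit/banana.
intended = [{"key": "fruit", "value": "banana"}, {"key": "game", "value": "cricket"}]
# No single element is fruit/banana, but both terms occur in the array.
accidental = [{"key": "fruit", "value": "apple"}, {"key": "game", "value": "banana"}]

print(matches(intended, "fruit", "banana"))    # True
print(matches(accidental, "fruit", "banana"))  # True
```

If that second kind of match would be a problem for your dashboards, the field would need a nested mapping and a nested query instead.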

Transform array of values to array of key value pair

I have JSON data in the form of a key whose value is an array, but I need to transform it into an array of key-value pairs. Here is the data:
Source data
{
"2022-08-30T06:58:56.573730Z": [
{ "tag": "AC 3 Phase/7957", "value": 161.37313113545272 },
{ "tag": "AC 3 Phase/7956", "value": 285.46869739695853 }
]
}
Desired transformation
[
{ "tag": "AC 3 Phase/7957",
"ts": "2022-08-30T06:58:56.573730Z",
"value": 161.37313113545272
},
{ "tag": "AC 3 Phase/7956",
"ts": "2022-08-30T06:58:56.573730Z",
"value": 285.46869739695853
}
]
I would do it like this:
$each($$, function($entries, $ts) {
$entries.{
"tag": tag,
"ts": $ts,
"value": value
}
}) ~> $reduce($append, [])
Feel free to play with this example on the playground: https://stedi.link/g6qJGcP
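For comparison, here is a minimal Python sketch of the same reshaping, assuming the source object has already been parsed into a dict. The function name to_rows is hypothetical. It walks each timestamp key and copies the timestamp into every entry listed under it.

```python
source = {
    "2022-08-30T06:58:56.573730Z": [
        {"tag": "AC 3 Phase/7957", "value": 161.37313113545272},
        {"tag": "AC 3 Phase/7956", "value": 285.46869739695853},
    ]
}

def to_rows(data):
    """Turn {timestamp: [entries]} into a flat list of entries carrying a "ts" field."""
    return [
        {"tag": entry["tag"], "ts": ts, "value": entry["value"]}
        for ts, entries in data.items()
        for entry in entries
    ]

rows = to_rows(source)
for row in rows:
    print(row)
```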

NiFi Jolt Specification for array input

I have the following input in Nifi Jolt Specification processor:
[
{
"values": [
{
"id": "paramA",
"value": 1
}
]
},
{
"values": [
{
"id": "paramB",
"value": 3
}
]
}
]
Expected output:
[
{
"id": "paramA",
"value": 1
},
{
"id": "paramB",
"value": 3
}
]
Can you explain how to do this?
Thanks in advance.
You want to reach the objects inside each values array, which are nested within separate outer objects ({}). Use a "*" wildcard to walk through the elements of the outer array, then another "*" wildcard for the indexes of each values array, with "" as the right-hand side so that nothing but the sub-objects is written to the output, such as
[
{
"operation": "shift",
"spec": {
"*": {
"values": {
"*": ""
}
}
}
}
]
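To make the effect of the two wildcards concrete, here is a small Python sketch of the same flattening, applied to the input from the question. It is only an illustration of what the shift produces, not how Jolt works internally, and the function name flatten_values is hypothetical.

```python
payload = [
    {"values": [{"id": "paramA", "value": 1}]},
    {"values": [{"id": "paramB", "value": 3}]},
]

def flatten_values(items):
    """Collect every object found in each item's "values" array into one list."""
    # Outer loop corresponds to the first "*", inner loop to the second "*".
    return [entry for item in items for entry in item["values"]]

print(flatten_values(payload))
# [{'id': 'paramA', 'value': 1}, {'id': 'paramB', 'value': 3}]
```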

Elasticsearch to return documents based on 2 criteria where one is based on the other

I have documents in the following format:
{
"id": number,
"chefId": number,
"name": String,
"ingredients": List<String>,
"isSpecial": boolean
}
Here is a list of 5 documents:
{
"id": 1,
"chefId": 1,
"name": "Roasted Potatoes",
"ingredients": ["Potato", "Onion", "Oil", "Salt"],
"isSpecial": false
},
{
"id": 2,
"chefId": 1,
"name": "Dauphinoise potatoes",
"ingredients": ["Potato", "Garlic", "Cream", "Salt"],
"isSpecial": true
},
{
"id": 3,
"chefId": 2,
"name": "Boiled Potatoes",
"ingredients": ["Potato", "Salt"],
"isSpecial": true
},
{
"id": 4,
"chefId": 3,
"name": "Mashed Potatoes",
"ingredients": ["Potato", "Butter", "Milk"],
"isSpecial": false
},
{
"id": 5,
"chefId": 4,
"name": "Hash Browns",
"ingredients": ["Potato", "Onion", "Egg"],
"isSpecial": false
}
I will be doing a search where "Potatoes" is contained in the name field. Like this:
{
"query": {
"wildcard": {
"name": {
"value": "*Potatoes*"
}
}
}
}
But I also want to add some extra criteria when returning documents:
If the ingredients contain onion or milk, then return the documents. So documents with the id 1 and 4 will be returned. Note that this means that we have documents returned where chef ids are 1 and 3.
Then, for the documents where we haven't already got another document with the same chef id, return where the isSpecial flag is set to true. So only document 3 will be returned. 2 wouldn't be returned as we already have a document where the chef id is equal to one.
Is it possible to do this kind of chaining in Elasticsearch? I would like to be able to do this in a single query so that I can avoid adding logic to my (Java) code.
You can't express that sort of logic in a single Elasticsearch query. You could write a tricky query using aggregations and post_filter to fetch all the data you need in one request and then transform it in your Java application.
But the best (and most maintainable) approach is to use two queries.
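The two-step selection can be sketched in plain Python over the five sample documents, to show what the application-side logic would need to compute. This is an illustration only: the function name select_recipes is hypothetical, and the name match is done case-insensitively (an assumption, since document 2 is spelled "potatoes" in lowercase but is treated as a candidate in the question).

```python
recipes = [
    {"id": 1, "chefId": 1, "name": "Roasted Potatoes",
     "ingredients": ["Potato", "Onion", "Oil", "Salt"], "isSpecial": False},
    {"id": 2, "chefId": 1, "name": "Dauphinoise potatoes",
     "ingredients": ["Potato", "Garlic", "Cream", "Salt"], "isSpecial": True},
    {"id": 3, "chefId": 2, "name": "Boiled Potatoes",
     "ingredients": ["Potato", "Salt"], "isSpecial": True},
    {"id": 4, "chefId": 3, "name": "Mashed Potatoes",
     "ingredients": ["Potato", "Butter", "Milk"], "isSpecial": False},
    {"id": 5, "chefId": 4, "name": "Hash Browns",
     "ingredients": ["Potato", "Onion", "Egg"], "isSpecial": False},
]

def select_recipes(docs, term="potatoes", wanted=("Onion", "Milk")):
    """Apply the ingredient rule first, then the isSpecial rule for remaining chefs."""
    # Candidates: the name contains the search term (case-insensitive assumption).
    named = [d for d in docs if term in d["name"].lower()]
    # Step 1: documents with at least one wanted ingredient.
    first = [d for d in named if set(wanted) & set(d["ingredients"])]
    chefs_seen = {d["chefId"] for d in first}
    # Step 2: special documents from chefs not already covered by step 1.
    second = [d for d in named if d["isSpecial"] and d["chefId"] not in chefs_seen]
    return first + second

print([d["id"] for d in select_recipes(recipes)])  # [1, 4, 3]
```

With two Elasticsearch queries, the first would return the step 1 documents and the second would exclude the chef ids collected from that result.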

RethinkDB: Equivalent for "select where field not in (items)"

I have a table that looks like this:
[
{ "name": "Alpha", "values": {
"someProperty": 1
}},
{ "name": "Beta", "values": {
"someProperty": 2
}},
{ "name": "Gamma", "values": {
"someProperty": 3
}}
]
I want to select all records where someProperty is not in some array of values (e.g., all records where someProperty not in [1, 2]). I want to get back complete records, not just the values of someProperty.
How should I do this with RethinkDB?
In Python it would be (not is a reserved word in Python, so the driver spells it r.not_; someProperty is nested under values):
table.filter(lambda doc: r.not_(r.expr([1, 2]).contains(doc["values"]["someProperty"])))
If the array comes from a subquery and you don't want to evaluate it multiple times:
subquery.do(lambda array:
table.filter(lambda doc: r.not_(array.contains(doc["values"]["someProperty"]))))
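To check the expected result without a database, the selection rule can be mirrored in plain Python over the sample rows. This is only a sketch of what the filter should return, not RethinkDB code, and the function name not_in is hypothetical.

```python
table = [
    {"name": "Alpha", "values": {"someProperty": 1}},
    {"name": "Beta", "values": {"someProperty": 2}},
    {"name": "Gamma", "values": {"someProperty": 3}},
]

def not_in(rows, excluded):
    """Return complete rows whose values.someProperty is not in the excluded list."""
    return [row for row in rows if row["values"]["someProperty"] not in excluded]

print(not_in(table, [1, 2]))
# [{'name': 'Gamma', 'values': {'someProperty': 3}}]
```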
