Sum field based upon the unique combination of 2 others - elasticsearch

I'm searching for a way to accomplish the following:
I've got a date histogram with a "day" interval, and I'd like to add parent_size to a sum only once, even if the related associations.c occurs many times within that interval. Ideally I'd like to be able to run this query with any date histogram interval, using the same "sum parent_size once for each unique associations.c per day" logic.
Below is an example document in the index I'm querying against:
{
"associations": {
"a": [
"2514519s-f379-11e3-ae2b-3176bd53680f"
],
"b": [
"5e8a07af-d2d3-4a1c-ba43-07f5cfc0eb8d"
],
"c": [
"6bda18ag-f379-11e3-ae2b-3176bd53680f"
]
},
"parent_size": 110,
"id": "d5fe6216-7eb7-4d81-b3b2-eef28850b80d",
"created_at": "2016-05-23T23:51:17.661Z"
}
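To pin down the intended logic, here is a minimal client-side sketch in Python. It is not an Elasticsearch query. It assumes that parent_size is identical for every document sharing an associations.c value, and that all timestamps are UTC ISO strings so a string prefix can serve as the bucket key. The function name and the sample values "c-1" and "c-2" are hypothetical.

```python
from collections import defaultdict

def sum_parent_size_once(docs, bucket_len=10):
    """Sum parent_size once per unique associations.c value in each time bucket.

    bucket_len is the number of leading characters of the ISO timestamp used
    as the bucket key: 10 gives a day ("2016-05-23"), 7 a month, 13 an hour.
    """
    seen = defaultdict(dict)  # bucket -> {c value: parent_size}
    for doc in docs:
        bucket = doc["created_at"][:bucket_len]
        for c in doc["associations"]["c"]:
            # Assumption: parent_size is the same for every document sharing
            # a c value, so keeping the first one seen is enough.
            seen[bucket].setdefault(c, doc["parent_size"])
    return {bucket: sum(sizes.values()) for bucket, sizes in seen.items()}

# Hypothetical sample documents, reduced to the fields that matter here.
docs = [
    {"associations": {"c": ["c-1"]}, "parent_size": 110, "created_at": "2016-05-23T23:51:17.661Z"},
    {"associations": {"c": ["c-1"]}, "parent_size": 110, "created_at": "2016-05-23T08:00:00.000Z"},
    {"associations": {"c": ["c-2"]}, "parent_size": 40, "created_at": "2016-05-23T09:30:00.000Z"},
    {"associations": {"c": ["c-1"]}, "parent_size": 110, "created_at": "2016-05-24T01:00:00.000Z"},
]
daily_totals = sum_parent_size_once(docs)
monthly_totals = sum_parent_size_once(docs, bucket_len=7)
```

On 2016-05-23 the value "c-1" occurs twice but contributes 110 only once, which is the behaviour I'm after for any interval.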

Related

Reorder object hierarchy and group by time in JSONata

Although I'm not a total JSONata noob, I'm having a hard time finding an elegant solution to the following desired transformation. The starting point is a set of time-series data in a format like this:
{
"series1": {
"data": [
{"time": "2022-01-01T00:00:00Z", "value": 22},
{"time": "2022-01-02T00:00:00Z", "value": 23}
]
},
"series2": {
"data": [
{"time": "2022-01-01T00:00:00Z","value": 220},
{"time": "2022-01-02T00:00:00Z","value": 230}
]
}
}
I need to "flip the hierarchy", and group these datapoints by timestamp, into an array of objects, like follows:
[
{
"time": "2022-01-01T00:00:00Z",
"series1": 22,
"series2": 220
},
{
"time": "2022-01-02T00:00:00Z",
"series1": 23,
"series2": 230
}
]
I currently have this working with the expression
$each($, function($v, $s) {
[$v.data.{
'series': $s,
'time':$.time,
'value': $.value
}]
}).*{
`time`: {
`series`: value
}
}
~> $each(function($v, $t) {
$merge([
$v,
{'time': $t}
])
})
(playground link: https://try.jsonata.org/8CaggujJk)
...and...I can't help but feel that there must be a better way!
For reference, my current expression basically does this in three consecutive steps:
The first $each() function, which splits up the original object into an array of datapoints, with a series name, timestamp, and value of each.
A grouping operator which makes time a key, and gathers all values for a given timestamp together.
A second $each() function, which transforms the object into an array of objects where time is a value again, rather than a key - and merges the time key-value alongside the series values.
I've seen some wonderfully elegant solutions to similar problems on here, but am not sure how to approach this in a better way. Any tips appreciated!
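For comparison only, here is the same regrouping sketched in Python rather than JSONata. It is not a JSONata answer, but it shows that the three steps can collapse into a single pass that builds one row per timestamp. The function name flip_by_time is hypothetical.

```python
def flip_by_time(series_map):
    """Turn {series: {"data": [{time, value}, ...]}} into one row per timestamp."""
    rows = {}  # time -> output row; dicts keep insertion order
    for name, series in series_map.items():
        for point in series["data"]:
            # Create the row for this timestamp on first sight, then add the
            # series value under the series name.
            row = rows.setdefault(point["time"], {"time": point["time"]})
            row[name] = point["value"]
    return list(rows.values())

source = {
    "series1": {"data": [
        {"time": "2022-01-01T00:00:00Z", "value": 22},
        {"time": "2022-01-02T00:00:00Z", "value": 23},
    ]},
    "series2": {"data": [
        {"time": "2022-01-01T00:00:00Z", "value": 220},
        {"time": "2022-01-02T00:00:00Z", "value": 230},
    ]},
}
flipped = flip_by_time(source)
```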

How can I show a table with the sum of value x of all children within Kibana

I have an Elasticsearch database with documents stored in the following way (commas separate the documents):
{
"path": "path/to/data",
"kind": "type1"
},
{
"path": "path/to/data/values1",
"kind": "type2",
"x": 2
},
{
"path": "path/to/data/values2",
"kind": "type2",
"x": 2
},
{
"path": "path/to/data/datasub",
"kind": "type1"
},
{
"path": "path/to/data/datasub/values1",
"kind": "type2",
"x": 1
}
Now I want to create a table view/chart that shows every type1 with the sum of x of all its children.
So I expect the total of path/to/data to be 5 and the total of path/to/data/datasub to be 1.
To consider: the depth of this structure could theoretically be unlimited
I'm running Elasticsearch 7 and Kibana 7. I want to use the table visualisation to start with, but I would like to be able to use this kind of aggregation across multiple visualisations. I have Googled a lot and found all kinds of Elasticsearch queries, but nothing on how to achieve this in Kibana.
All help is much appreciated
For those who run into the same question:
The solution I ended up using is to split the path into tokens before importing it into Elasticsearch. Consider a document with a path like "/this/is/a/path". This becomes the following array in the document:
[
"/this",
"/this/is",
"/this/is/a",
"/this/is/a/path"
]
You can then use a terms aggregation on it with various metrics to calculate your desired measurements.
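A minimal Python sketch of that preprocessing step, followed by a client-side stand-in for the terms aggregation with a sum metric. The function names are hypothetical, and the second function only imitates what the aggregation would compute. It does not call Elasticsearch.

```python
from collections import defaultdict

def path_prefixes(path):
    """Expand a path into all of its ancestor prefixes, itself included."""
    leading = "/" if path.startswith("/") else ""
    parts = [p for p in path.split("/") if p]
    return [leading + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def sum_x_by_prefix(docs):
    """Imitate a terms aggregation on the prefix array with a sum of x."""
    totals = defaultdict(int)
    for doc in docs:
        for prefix in path_prefixes(doc["path"]):
            # Documents without x (the type1 ones) contribute 0.
            totals[prefix] += doc.get("x", 0)
    return dict(totals)

# The documents from the question.
docs = [
    {"path": "path/to/data", "kind": "type1"},
    {"path": "path/to/data/values1", "kind": "type2", "x": 2},
    {"path": "path/to/data/values2", "kind": "type2", "x": 2},
    {"path": "path/to/data/datasub", "kind": "type1"},
    {"path": "path/to/data/datasub/values1", "kind": "type2", "x": 1},
]
totals = sum_x_by_prefix(docs)
```

Because every document carries all of its ancestor paths, the depth of the hierarchy does not matter: each ancestor bucket simply collects the x of everything below it.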

Elasticsearch sort by inner field

I have documents in which one of the fields looks like the following:
"ingredients": [{
"unit": "MG",
"value": 123,
"key": "abc"
}]
I would like to sort the records in ascending order of the value of a specific ingredient. That is, if two records both use the ingredient with key "abc", one with value 1 and one with value 2, the one with value 1 should appear first.
Each record may have more than one ingredient.
Thank you in advance!
The search query to sort will be:
{
"sort":{
"ingredients.value":{
"order":"asc"}
}}
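Note that this query sorts on every ingredients.value in a record, not only on the one with key "abc". The ordering the question asks for can be sketched client-side in Python as below. This is an illustration of the desired result rather than an Elasticsearch feature. The function name, the record ids and the choice to put records without the ingredient last are assumptions.

```python
def sort_by_ingredient(records, key):
    """Order records by the value of the ingredient with the given key."""
    def sort_value(record):
        values = [i["value"] for i in record["ingredients"] if i["key"] == key]
        # Records without the ingredient sort last; if the key appears more
        # than once in a record, its lowest value is used.
        return (0, min(values)) if values else (1, 0)
    return sorted(records, key=sort_value)

# Hypothetical records, each with more than one ingredient where possible.
records = [
    {"id": "r1", "ingredients": [
        {"unit": "MG", "value": 2, "key": "abc"},
        {"unit": "MG", "value": 0.5, "key": "xyz"},
    ]},
    {"id": "r2", "ingredients": [
        {"unit": "MG", "value": 1, "key": "abc"},
        {"unit": "MG", "value": 9, "key": "xyz"},
    ]},
    {"id": "r3", "ingredients": [
        {"unit": "MG", "value": 3, "key": "xyz"},
    ]},
]
ordered_ids = [r["id"] for r in sort_by_ingredient(records, "abc")]
```

Here r2 comes before r1 because only the "abc" values (1 and 2) are compared. A sort over all values would instead be driven by r1's 0.5, which belongs to a different ingredient.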

Elasticsearch: how to know which field the results are sorted by?

In Elasticsearch, is there any way to check which field the results are sorted by? I want something like inner-hits for sort clause.
Imagine that your documents have this kind of form:
{"numerals" : [ // nested
{"key": "point", "value": 30},
{"key": "points", "value": 200},
{"key": "score", "value": 20},
{"key": "scores", "value": 40}
]
}
and you sort the results by:
{"numerals.value": {
"nested_path": "numerals",
"nested_filter": {
"match": {
"numerals.key": "score"}}}}
Now I have no way of knowing which field the results were actually sorted by: it's probably scores for this document, but perhaps score for others. There are two problems: (1) you cannot use inner-hits or highlighting on the nested fields, and (2) even if you could, it wouldn't solve the issue when there are multiple matching candidates.
The question is about sorting by fields that are inside nested objects.
So this is what the documentation
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-sorting.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_nested_sorting_example
says:
Elasticsearch first restricts the nested documents with the "nested_filter" query and then sorts in the same way as for multi-valued fields: exactly as if the root document had a multi-valued field containing only the values that belong to the filtered nested objects (in your example only one value remains: 20).
If you want to be sure about the sort order, add a "mode" parameter: "min", "max", "sum", "avg" or "median".
If you do not specify the "mode" parameter, then, according to the corresponding issue, the min value is picked for "asc" order and the max value for "desc" order:
By default when sorting on a multi-valued field the lowest or highest
value will be picked from the field values depending on the sort
order.
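The filter-then-reduce behaviour described above can be imitated in a few lines of Python. This is a client-side sketch only. It uses exact key equality as a stand-in for the match query, which is an assumption, and the function name and the second sample document are hypothetical.

```python
from statistics import median

# The sort modes named above, as plain reductions over a list of numbers.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}

def nested_sort_value(doc, key, order="asc", mode=None):
    """Return the value a document would be sorted by, or None if nothing matches."""
    # Step 1: keep only the nested objects that pass the filter.
    values = [n["value"] for n in doc["numerals"] if n["key"] == key]
    if not values:
        return None
    # Step 2: reduce the remaining values with the mode; without an explicit
    # mode, the lowest value is used for "asc" and the highest for "desc".
    if mode is None:
        mode = "min" if order == "asc" else "max"
    return MODES[mode](values)

doc = {"numerals": [
    {"key": "point", "value": 30},
    {"key": "points", "value": 200},
    {"key": "score", "value": 20},
    {"key": "scores", "value": 40},
]}
# Hypothetical document in which the filter leaves two candidates.
doc_two_scores = {"numerals": [
    {"key": "score", "value": 20},
    {"key": "score", "value": 40},
]}
single = nested_sort_value(doc, "score")
```

With several candidates left after filtering, the mode decides which number is used, so setting it explicitly is the way to know what the sort key was.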

Require a number of matches against text in ElasticSearch

I'm trying to create a filter against ElasticSearch that requires more than one match before the result is returned. For example, in the following text:
If you're uneasy at the idea of riding in a vehicle that drives itself, just wait till you see Google's new car. It has no gas pedal, no brake and no steering wheel. Google has been demonstrating its driverless technology for several years by retrofitting Toyotas, Lexuses and other cars with cameras and sensors. But now, for the first time, the company has unveiled a prototype of its own: a cute little car that looks like a cross between a VW Beetle and a golf cart.
If I set the minimum number of matches to 2 and searched for Google, I would expect this result because Google appears in the text twice. However, searching on Toyota with the same number of expected matches should not result in this article.
How do I construct this filter?
Probably not exactly what you are looking for, but you could add explain to your query and then filter on the client side by the number of term matches. From the docs, the query would look like this:
GET /_search?explain
{
"query" : { "match" : { "tweet" : "honeymoon" }}
}
Results would look like this:
"_explanation": {
"description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
"value": 0.076713204,
"details": [
{
"description": "fieldWeight in 0, product of:",
"value": 0.076713204,
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"value": 1,
"details": [
{
"description": "termFreq=1.0",
"value": 1
}
]
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"value": 0.25
}
]
}
]
}
You could then filter on the description field for term frequency and look for a value > 1.
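A rough Python sketch of that client-side filter. It assumes each hit carries an "_explanation" tree shaped like the one above, with a node whose description starts with "termFreq=". The explanation format differs between Elasticsearch versions, so treat the matching rule as an assumption. The function names and the make_explanation helper are hypothetical, and only single-term queries are handled.

```python
def term_frequency(explanation):
    """Walk the explanation tree and return the first termFreq value found."""
    if explanation.get("description", "").startswith("termFreq="):
        return explanation["value"]
    for detail in explanation.get("details", []):
        found = term_frequency(detail)
        if found is not None:
            return found
    return None

def filter_by_min_matches(hits, min_matches):
    """Keep only the hits whose term frequency reaches min_matches."""
    return [
        hit for hit in hits
        if (term_frequency(hit["_explanation"]) or 0) >= min_matches
    ]

def make_explanation(freq):
    """Hypothetical helper: build a tree shaped like the sample output."""
    return {
        "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
        "details": [{
            "description": "fieldWeight in 0, product of:",
            "details": [{
                "description": "tf(freq=%.1f), with freq of:" % freq,
                "details": [{"description": "termFreq=%.1f" % freq, "value": freq}],
            }],
        }],
    }

hits = [
    {"_id": "1", "_explanation": make_explanation(1)},
    {"_id": "2", "_explanation": make_explanation(2)},
]
kept_ids = [hit["_id"] for hit in filter_by_min_matches(hits, 2)]
```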
I believe you may be able to do this directly (no client side filtering) by using scripting, as you can get reference to term frequency:
Term statistics:
Term statistics for a field can be accessed with a subscript operator like this: _index['FIELD']['TERM']. This will never return null, even if the term or field does not exist. If you do not need the term frequency, call _index['FIELD'].get('TERM', 0) to avoid unnecessary initialization of the frequencies. The flag will only have an effect if you set the index_options to docs (see mapping documentation).
_index['FIELD']['TERM'].df()
df of term TERM in field FIELD. Will be returned, even if the term is not present in the current document.
_index['FIELD']['TERM'].ttf()
The sum of term frequencies of term TERM in field FIELD over all documents. Will be returned, even if the term is not present in the current document.
_index['FIELD']['TERM'].tf()
tf of term TERM in field FIELD. Will be 0 if the term is not present in the current document.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
However, I've not done this myself, and the usual concerns about both security and performance apply when using server-side scripting.
