Conditional sum metric (sub-total column) in Kibana data table - elasticsearch

I need to display subtotal columns in a Kibana data table. Not filtering the entire table, but only certain columns.
I've seen posts about doing conditional counts in a metric's JSON input field:
{
"script":{
"inline": "doc['SomeField'].value == 'SomeValue' ? 1 : 0",
"lang": "painless"
}
}
But no reference to conditional sums of numeric data. My loosely expressed need:
sum(bytes) where category = [write]
Alternatively, the Kibana Enhanced Table plugin was suggested as a way to implement computed columns.
Is it possible to achieve conditional sums using JSON input on a specific data table metric? Is anyone using the plugin? Should it be done upstream in an elasticsearch index? What is best practice?

The solution is a simple change: return the actual value when the condition is true, rather than a 1 for counting:
{
"script" : "doc['category.keyword'].value == 'write' ? doc['bytes'].value : 0"
}
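To make the difference between the conditional count and the conditional sum concrete, here is a minimal Python sketch that emulates what each script contributes per document. The sample documents and the function names are made up for illustration; only the category and bytes fields come from the answer above.

```python
# Hypothetical sample rows standing in for indexed documents.
docs = [
    {"category": "write", "bytes": 100},
    {"category": "read", "bytes": 250},
    {"category": "write", "bytes": 40},
]


def conditional_count(doc, value="write"):
    # Mirrors: doc['category.keyword'].value == 'write' ? 1 : 0
    return 1 if doc["category"] == value else 0


def conditional_sum(doc, value="write"):
    # Mirrors: doc['category.keyword'].value == 'write' ? doc['bytes'].value : 0
    return doc["bytes"] if doc["category"] == value else 0


# A sum over the per-document script results gives the sub-total.
write_count = sum(conditional_count(d) for d in docs)
write_bytes = sum(conditional_sum(d) for d in docs)
print(write_count, write_bytes)
```

This assumes the metric carrying the JSON input is a Sum aggregation, so that the per-document values are added together.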

Related

Represent enum in Elastic Search for sorting

I have a use case to represent an enum for difficulty level (EASY, MEDIUM, DIFFICULT) in elastic search with support of sorting on this field. If this field is indexed as string the sorting will not work as expected.
One way to support this is to index integer values for each enumeration in ES and map it to string values when sorted results are returned by ES.
Are there other alternatives such that ES itself takes care of sorting in the enumeration order while this field is indexed as a string? Can I specify a custom sort function for a field? function_score is an option, but given that I have to sort based on enum ordering, is there a better way than defining a custom function_score?
In my use case there are multiple such enumerations defining a scale across dimensions like difficulty, height (low, medium, high), grades (good, average, poor), etc. Both of the above solutions require custom work whenever a new dimension is introduced. Can either approach be generalized?
You can check the answer to the same question here. You will need to use script_score like below:
GET /my-index-2/_search
{
"query": {
"script_score": {
"query": {
"match_all":{}
},
"script": {
"source": "if (doc['field name'].value == 'EASY') { return 2; } else if (doc['field name'].value == 'MEDIUM') { return 1; } else if (doc['field name'].value == 'DIFFICULT') { return 0; } return 0;"
}
}
}
}
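On the question of generalizing across dimensions, one option is to generate the script from an ordered list of values instead of writing it by hand. The sketch below is only an illustration: build_rank_script and build_query are hypothetical helpers, and difficulty.keyword is an assumed field name. The first value in the list gets the highest score, matching the EASY = 2 convention used above.

```python
import json


def build_rank_script(field, ordered_values):
    """Build a Painless if/else chain that scores values by list position."""
    top = len(ordered_values) - 1
    branches = [
        f"if (doc['{field}'].value == '{value}') {{ return {top - i}; }}"
        for i, value in enumerate(ordered_values)
    ]
    # Trailing fallback so that every code path returns a score.
    return " else ".join(branches) + " return 0;"


def build_query(field, ordered_values):
    """Wrap the generated script in a script_score query body."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"source": build_rank_script(field, ordered_values)},
            }
        }
    }


# Assumed field name; any enum-like keyword field would work the same way.
body = build_query("difficulty.keyword", ["EASY", "MEDIUM", "DIFFICULT"])
print(json.dumps(body, indent=2))
```

A new dimension such as height then only needs another call, for example build_query("height.keyword", ["low", "medium", "high"]).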

Sorting by product price considering special prices (client, group, country)

We have a shop with a few products (~5000).
There are, of course, category overview pages which show all products in the current category. A requirement is that all products can be sorted by price (ASC and DESC).
This already works, but only partially: our Elasticsearch index currently holds only the "original" price, so product discounts are not considered and the sorting is not correct.
My task now is to fix that.
But I am already struggling with how to persist the "special prices" in Elasticsearch.
The problem is that every product can be discounted in general, on a customer level, on a customer group level and on a country level.
So I imagine a structure like this would be a start:
# current
{
"articleNumber": "12345",
...
"price": 9.99,
...
}
# new
{
"articleNumber": "12345",
...
"price": 9.99,
...
"special_prices": [
{
"customer": "123456",
"client_price": 5.99,
"client_group_price": null,
"country_de": null,
"country_es": null,
...
},
...
]
}
Following thoughts:
The special prices could be stored as a nested object inside the product index (but I am not sure how to do the sorting on it later)
Maybe I could create a second index with prices. Then I would have two queries, but I guess that would be OK? I would have to build a whole matrix of every customer we have (also ~5000) with every product and every possible price. But with a second index I would have to join, and the sorting might then be incorrect
If possible, I would like to only persist any prices if a product has a special price and if not, I don't want to blow up the index
I tried something with painless to return the special price if one exists for the product and customer, but this gives me this:
...
"script": "if (doc['special_prices.customer'] != null && doc['special_prices.customer'].value == '123456') { return 12.45; } else { return doc['price']; }",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [special_prices.customer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
...
Maybe something like SQL ORDER BY CASE WHEN would be an option?
Any ideas on how I should model and persist the special prices? And how can I achieve the sorting?
Is joining a second index a good idea?
Best regards
The error you see occurs because special_prices.customer is not indexed as keyword but as text (which allows full-text search). If you didn't specify the mapping explicitly, Elasticsearch most likely created a keyword sub-field for you. Just try replacing special_prices.customer with special_prices.customer.keyword in your script.
The idea of using a script for sorting is good, given that you only have 5000 documents. Scripts do not have good performance, but in your case this might not matter.
In general this looks like a tough case, because you need some kind of joining between products and prices, and Elasticsearch is not good at joins. It has got some joining options: nested datatype, join datatype (a.k.a. parent-child), and denormalization. The last one you have already considered - when you put different prices in the original product document.
Unfortunately I can't recommend one over another, because there is no single recipe. I would try with scripts, and if performance is not good enough consider remodelling the data.
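Whichever modelling option you pick, the sort key boils down to "special price if one exists for this customer, otherwise the regular price". Here is a minimal Python sketch of that fallback logic, run against in-memory data rather than Elasticsearch. The second and third products are invented, and only the customer-level price is handled; group and country prices would need further branches.

```python
def effective_price(product, customer_id):
    """Return the customer's special price if one exists, else the list price."""
    for special in product.get("special_prices", []):
        if (special["customer"] == customer_id
                and special.get("client_price") is not None):
            return special["client_price"]
    return product["price"]


# Hypothetical products; special_prices is only stored where a discount exists.
products = [
    {"articleNumber": "12345", "price": 9.99,
     "special_prices": [{"customer": "123456", "client_price": 5.99}]},
    {"articleNumber": "23456", "price": 7.49},
    {"articleNumber": "34567", "price": 6.50,
     "special_prices": [{"customer": "999999", "client_price": 1.00}]},
]

# Ascending sort as seen by customer 123456.
ranked = sorted(products, key=lambda p: effective_price(p, "123456"))
order = [p["articleNumber"] for p in ranked]
print(order)
```

Products without any special price stay small, because the fallback to price means nothing extra has to be stored for them.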
Hope that helps!

How to combine 2 fields with different names but same value in Kibana

I set up an ELK environment with 2 different index streams. Both streams have a field with the same value, but the field name is different.
Is there a possibility to merge them, or something like that, so that when I use the Kibana filter it shows me the value from both fields?
I can set up visualizations, but when I filter on stream 1, the visualization of stream 2 is empty.
I also tried naming the index the same, but that did not help.
Example:
index1 fieldname information.ID = 123
index2 fieldname ID = 123
I want to use the filter on both streams.
You can create an alias which contains both indexes. Write a query against the alias using script fields.
In script fields you can define the new field and its source logic from the underlying documents.
"script_fields": {
"my_script_field": {
"script": {
"lang": "painless",
"inline": "doc['some_field'].value + doc['another_field'].value"
}
}
}
This way the result set will have a single field, “my_script_field”.
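For the case in the question, the script field would not add two values but pick whichever of the two field names the document actually carries. A minimal Python sketch of that selection logic, using the information.ID and ID names from the question (unified_id is a hypothetical helper, and the sample documents are invented):

```python
def unified_id(source):
    """Return the ID from whichever field this document carries."""
    nested = source.get("information", {}).get("ID")
    return nested if nested is not None else source.get("ID")


# One sample document per index, shaped like the example in the question.
index1_doc = {"information": {"ID": 123}}
index2_doc = {"ID": 123}

ids = [unified_id(index1_doc), unified_id(index2_doc)]
print(ids)
```

The same "first field that is present" rule is what the script behind my_script_field would need to express.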

Kibana scripted field which loops through an array

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the f5 api and bring back json, which is saved to kibana. But the json contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. eg
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? ie is my code or the way I'm indexing the data wrong.
If not is there an alternative approach within metricbeats? I don't want to have to make a whole new api to do the calculation and add a separate field
-- update.
Weirdly it seems that the number values in the array do return the expected results. ie.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
Ok, so if the strings in the field have different values then you get all the values. if they are the same you just get one. wtf?
I'm adding another answer instead of deleting my previous one, which does not answer the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Value Intro, which says that doc values are essentially an "uninverted index" useful for operations like sorting. My hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not answer why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed a compression technique which actually de-duplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior then you can disable doc values.
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So as I discovered arrays are prefiltered to only return distinct values (except in the case of ints apparently?)
The solution is to use params._source instead of doc[]
The answer for why doc doesn't work
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo-points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field') to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
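To illustrate the difference, here is a rough Python model of what the question observed: string doc values come back sorted and de-duplicated, numeric doc values keep their duplicates, and the original JSON (what params._source exposes) keeps every array element. The sample document is invented, and counting members whose monitor is "default" is only an assumed stand-in for "members which are up".

```python
# Hypothetical document shaped like the one in the question.
source = {
    "http": {"f5pools": {"items": [
        {"monitor": "default", "ratio": 1},
        {"monitor": "default", "ratio": 1},
    ]}}
}
items = source["http"]["f5pools"]["items"]

# Rough model of doc values: keyword values are sorted and de-duplicated,
# numeric values are sorted but duplicates are kept.
keyword_doc_values = sorted({item["monitor"] for item in items})
numeric_doc_values = sorted(item["ratio"] for item in items)

# Walking the original JSON sees both array elements.
up_count = sum(1 for item in items if item["monitor"] == "default")

print(keyword_doc_values, numeric_doc_values, up_count)
```

Counting over the doc-values view would give 1 here, while counting over the source gives 2, which is why the loop has to run over _source.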
Responding to your comment with an example:
The keyword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object.
Try running below and see the mapping it creates:
PUT t5/doc/2
{
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
GET t5/_mapping
{
"t5" : {
"mappings" : {
"doc" : {
"properties" : {
"items" : {
"properties" : {
"monitor" : { <-- monitor is a property of items property(Object)
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}

ElasticSearch - Unique Tags for multiple documents (indexing)

We would like a unique tag with multiple values in Elasticsearch. To be clearer: we need to draw a time-series graph, so we get values between 2 dates. But of course we have different kinds of data; that is where our tags come in. We want to search our tags with autocompletion, then choose our values by date.
{tag :["sdfsf", "fddsfsd", "fsdfsf"]
{
values : 145.45
date : "2004-10-23"
},
{
values : 556.09
date : "2010-02-13"
}
}
After a bit of research we found the parent/child technique, but because we want to do completion on the tag (in the parent), we need an aggregation, which is impossible in ES with "has_parent".
Our solution is:
{
{
tag :["sdfsf", "fddsfsd", "fsdfsf"],
values : 145.45,
date : "2004-10-23"
},
{
tag :null,
values : 556.09,
date : "2010-02-13"
}, {etc...}
}
So we only have one tag, which is easy to check with completion. But it's kind of "ugly".
Does anybody have a correct way to do what we want?
thx in advance
