Kibana scripted field which loops through an array - elasticsearch

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the f5 api and bring back json, which is saved to kibana. But the json contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. eg
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? ie is my code or the way I'm indexing the data wrong.
If not is there an alternative approach within metricbeats? I don't want to have to make a whole new api to do the calculation and add a separate field
-- update.
Weirdly it seems that the number values in the array do return the expected results. ie.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
Ok, so if the strings in the field have different values then you get all the values. if they are the same you just get one. wtf?

I'm adding another answer instead of deleting my previous one which is not the actual question but still may be helpful for someone else in future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Value Intro which says that the doc values are essentially "uninverted index" useful for operations like sorting; my hypotheses is while sorting you essentially dont want same values repeated and hence the data structure they use removes those duplicates. That still did not answer as to why it works different for string than number. Numbers are preserved but strings are filters into unique.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-dive into doc values revealed it a compression technique which actually de-deuplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you dont want this behavior then you can disable doc-values

OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So as I discovered arrays are prefiltered to only return distinct values (except in the case of ints apparently?)
The solution is to use params._source instead of doc[]

The answer for why doc doesnt work
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field')* to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
.
Responding to your comment with an example:
The kyeword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object
Try running below and see the mapping it creates:
PUT t5/doc/2
{
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
GET t5/_mapping
{
"t5" : {
"mappings" : {
"doc" : {
"properties" : {
"items" : {
"properties" : {
"monitor" : { <-- monitor is a property of items property(Object)
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}

Related

Sorting Results efficiently by non mapped/searchable Object properties

In my current index called items I have previously sorted the results of my queries by some property values of a specifiy object property, lets call this property just oprop. After I have realized, that with each new key in oprop the total number of used fields on the items index increased, I had to change the mapping on the index. So I set oprop's mapping to dynamic : false, so the keys of oprop are not searchable anymore (not indexed).
Now in some of my queries I need to sort the results on the items index by the values of oprop keys. I don't know how ElasticSearch still can give me the possibility to sort on these key values.
Do I need to use scripts for sorting? Do I have access on non indexed data when using scripts?
Somehow I don't think that this is a good approach and I think that in long term I will run into performance issues.
You could use scripts for sorting, since the data will be stored in the _source field, but that should probably be a last resort. If you know which fields need to be sortable, you could just add those to the mapping, and keep oprop as a non dynamic field otherwise?
"properties": {
"oprop": {
"dynamic": false,
"properties": {
"sortable_key_1": {
"type": "text"
},
"sortable_key_2": {
"type": "text"
}
}
}
}

Protocol buffers Fieldmask on Collections within resource

If I want to update the "amount" field within a particular element inside "f_units" collection in the below resource (protocol buffer), how will the FieldMask look like to update the amount field? Does the field mask operate on array index for collections?
{
"f_sel": {
"f_units": [
{
"id": "1",
"amount": {
"coefficient": 1000,
"exponent": -2
}
},
{
"id": "2",
"amount": {
"coefficient": 2000,
"exponent": -2
}
}
]
}
}
Will it be "f_sel.f_units.0.amount" ? How can I update the amount using FieldMask?
As far as I know, there is no way to replace individual elements of a repeated field with an index in a FieldMask.
Instead, you'd update the amount field for the element within f_units you wish to change and set the FieldMask to
"f_sel.f_units"
It would be slightly more efficient to only have to send a delta to the original list, but it would be hard to prevent bugs. For example, what if the proto was modified in the meantime and the specified index (presuming there was a way to specify one) for the repeated field was no longer in range?
As an aside, Google does propose the concept of MergeOptions which defines semantics for how repeated fields are to be handled when merging. Currently, it appears they intend for you either to replace the repeated field in its entirety or append to the end of the destination field. Both of these merging strategies avoid the aforementioned bug that could be caused by specifying an invalid index.

How to treat certain field values as null in `Elasticsearch`

I'm parsing log files which for simplicity's sake let's say will have the following format :
{"message": "hello world", "size": 100, "forward-to": 127.0.0.1}
I'm indexing these lines into an Elasticsearch index, where I've defined a custom mapping such that message, size, and forward-to are of type text, integer, and ip respectively. However, some log lines will look like this :
{"message": "hello world", "size": "-", "forward-to": ""}
This leads to parsing errors when Elasticsearch tries to index these documents. For technical reasons, it's very much untrivial for me to pre-process these documents and change "-" and "" to null. Is there anyway to define which values my mapping should treat as null ? Is there perhaps an analyzer I can write which works on any field type whatsoever that I can add to all entries in my mapping ?
Basically I'm looking for somewhat of the opposite of the null_value option. Instead of telling Elasticsearch what to turn a null_value into, I'd like to tell it what it should turn into a null_value. Also acceptable would be a way to tell Elasticsearch to simply ignore fields that look a certain way but still parse the other fields in the document.
So this one's easy apparently. Add the following to your mapping settings :
{
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
}
}
}
}
This will still index the field (contrary to what I've understood from the documentation...) but it will be ignored during aggregations (so if you have 3 entries in an integer field that are "1", 3, and "hello world", an averaging aggregation will yield 2).
Keep in mind that because of the way the option was implemented (and I would say this is a bug) this still fails for and object that is entered as a concrete value and vice versa. If you'd like to get around that you can set the field's enabled value to false like this :
{
"mappings": {
"my_mapping_name": {
"properties": {
"my_unpredictable_field": {
"enabled": false
}
}
}
}
}
This comes at a price though, since this means the field won't be indexed, but the values entered will be still be stored so you can still accessing them by searching for that document through another field. This usually shouldn't be an issue as you likely won't be filtering documents based on the value of such an unpredictable field, but that depends on your specific case use. See here for the official discussion of this issue.

How do I make a field have varying type in Elastic Search

I need a field, here score, to be a number, and other times a string. Like:
{
"name": "Joe"
"score": 32.5
}
{
"name": "Sue"
"score": "NOT_AVAILABLE"
}
How can I express this in this in the index settings in Elastic Search?
I basically want "dynamic typing" on the field. The code may not make sense to you (like: why not split it into 2 different fields), but it's necessary to be this way on my end (for consistency reasons).
I don't need/want the property to be indexed/"searchable" btw. I just need the data to be in the json response. I need something like "any object will fit here".
Finally figured it out. All I had to do was to set enabled to false, and elastic search will not attempt to do anything with the data - but it's still present in the json response.
Like so:
"score": {
"enabled": false
}
Just define "score" field to be of type "string" in your mapping. That's it. Make sure you do define the mapping before indexing any document though. Otherwise if you let the mapping be created dynamically and the type of value of "score" field is anything but string in the first document you index, you won't be able to index any document next in which "score" holds a string.

ElasticSearch - Statistical facet on length of string field

I would like to retrieve data about a string field like the min, max and average length (by counting the number of characters inside the string). My issue is that aggregations can only be used for numeric fields. Besides, I tried it using a simple statistical facet,
"query":{
"match_all": {}
},
"facets":{
"stat1":{
"statistical":{
"field":"title"}
}
}
but I get shard failures and SearchPhaseExecutionException. When trying with a script field the error returned is an OutOfMemoryError:
"query":{
"match_all": {}
},
"script_fields":{
"test1":{"script": "doc[\"title\"].value" }
}
Is it possible to retrive such data about a simple "title" string field using CURL? Thank you!
I haven't actually tried the following, but I believe it should work.
First some useful doc-references:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-statistical-facet.html.
In order to implement the statistical facet, the relevant field values
are loaded into memory from the index. This means that per shard,
there should be enough memory to contain them. Since by default,
dynamic introduced types are long and double, one option to reduce the
memory footprint is to explicitly set the types for the relevant
fields to either short, integer, or float when possible.
I'm not sure directly how to set the type of the script-field to 'short' which is probably what you want. to reduce memory. it SHOULD be possible though.
ALSO: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-script-fields.html
It’s important to understand the difference between
doc['my_field'].value and _source.my_field. The first, using the doc
keyword, will cause the terms for that field to be loaded to memory
(cached), which will result in faster execution, but more memory
consumption. Also, the doc[...] notation only allows for simple valued
fields (can’t return a json object from it) and make sense only on
non-analyzed or single term based fields.
So ALTERNATIVE: would be to use _source instead of doc which would not cache the lengths.
Gives:
{
"query" : {
"match_all" : {}
},
"facets" : {
"stat1" : {
"statistical" : {
"script" : "doc['title'].value.length()
//"script" : "_source.title.length() //ALTERNATIVE which isn't cached
}
}
}
}

Resources