Logstash "add_field" saves "%{...}" as value when key value pair missing in JSON - elasticsearch

add_field => {"ExampleFieldName" => "%{[example][jsonNested1][jsonNested2]}"}
My Logstash receives JSON from Filebeat that contains an object example, which itself contains an object jsonNested1, which contains a key-value pair (with the key being jsonNested2).
If jsonNested1 exists and jsonNested2 exists and contains a value, then this value is saved correctly in ExampleFieldName in Elasticsearch.
{
  "example": {
    "jsonNested1": {
      "jsonNested2": "exampleValue"
    }
  }
}
In this case ExampleFieldName would contain exampleValue.
{
  "example": {
    "jsonNested1": {
    }
  }
}
In this case I would like ExampleFieldName to contain an empty string or no value at all (or to be not created in the first place).
But what actually happens is that ExampleFieldName contains the literal string %{[example][jsonNested1][jsonNested2]}.
I already found a workaround by first checking whether the nested key-value pair exists before performing the add_field.
if [example][jsonNested1][jsonNested2] {
  mutate {
    add_field => { "ExampleFieldName" => "%{[example][jsonNested1][jsonNested2]}" }
  }
}
This solution works, but I can't believe it is the best way to do it. I find it very strange that Logstash even saves %{[example][jsonNested1][jsonNested2]} as a string when the key-value pair doesn't exist; I would expect it to recognize this and simply not save any value.
The if statement is an acceptable solution if I have to check one field, but I'm currently working on a Logstash config with around 50 fields. Should I create 50 if statements there?

You may be able to fix this using a prune filter, whose blacklist_names option by default removes unresolved field references.
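Here is a minimal, untested sketch of that idea, assuming the field names from the question; it uses the prune filter's blacklist_values option to drop a field again whenever its value is still an unresolved %{...} placeholder, so no per-field if statement is needed:

filter {
  mutate {
    add_field => { "ExampleFieldName" => "%{[example][jsonNested1][jsonNested2]}" }
  }
  prune {
    # drop any listed field whose value still starts with a literal "%{" placeholder
    blacklist_values => { "ExampleFieldName" => "^%{" }
  }
}

Note that prune only operates on top-level fields, which is fine for fields created this way by add_field.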

Related

Kibana scripted field which loops through an array

I am trying to use the Metricbeat http module to monitor F5 pools.
I make a request to the F5 API and bring back JSON, which is saved to Kibana. But the JSON contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array, e.g.
doc['http.f5pools.items.monitor'].value.length()
returns the following in the preview results (with the same 'Additional Field' added for comparison):
[
  {
    "_id": "rT7wdGsBXQSGm_pQoH6Y",
    "http": {
      "f5pools": {
        "items": [
          {
            "monitor": "default"
          },
          {
            "monitor": "default"
          }
        ]
      }
    },
    "pool.MemberCount": [
      7
    ]
  },
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field that can access the set of values? I.e., is my code or the way I'm indexing the data wrong?
If not, is there an alternative approach within Metricbeat? I don't want to have to build a whole new API to do the calculation and add a separate field.
-- update.
Weirdly, it seems that the number values in the array do return the expected results, i.e.
doc['http.f5pools.items.ratio']
returns
{
  "_id": "BT6WdWsBXQSGm_pQBbCa",
  "pool.MemberCount": [
    1,
    1
  ]
},
-- update 2
OK, so if the strings in the field have different values then you get all the values; if they are the same you just get one. wtf?
I'm adding another answer instead of deleting my previous one, which does not answer the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Value Intro, which says that doc values are essentially an "uninverted index" useful for operations like sorting; my hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not answer why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed it is a compression technique which actually de-duplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior, you can disable doc values.
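As an illustration only (the index and field names here are made up, and the single-type mapping style matches the example further down), doc values can be switched off per field in the mapping, at the cost of no longer being able to sort or aggregate on that field:

PUT t6
{
  "mappings": {
    "doc": {
      "properties": {
        "monitor": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}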
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So, as I discovered, arrays are pre-filtered to only return distinct values (except in the case of ints, apparently?).
The solution is to use params._source instead of doc[].
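As a rough Painless sketch of what the final scripted field can look like with params._source (the field names come from the document above; the status value "up" is an assumption, since the sample data only shows "default"):

// hypothetical scripted field: count pool members whose monitor status is "up"
int up = 0;
def http = params._source.http;
if (http != null && http.f5pools != null && http.f5pools.items != null) {
  for (def member : http.f5pools.items) {
    if (member.monitor == "up") {  // "up" is an assumed status value
      up++;
    }
  }
}
return up;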
The answer for why doc doesn't work:
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, it's important to add a null check, as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field') to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
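A small Painless sketch of that guard for the field in question (the .keyword sub-field is used because monitor is mapped as text, as the mapping shown further down indicates):

// guard the doc-values lookup so a missing field doesn't throw
if (doc.containsKey('http.f5pools.items.monitor.keyword') && doc['http.f5pools.items.monitor.keyword'].size() > 0) {
  return doc['http.f5pools.items.monitor.keyword'].value;
}
return "";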
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
Responding to your comment with an example:
The keyword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object.
Try running the following and see the mapping it creates:
PUT t5/doc/2
{
  "items": [
    {
      "monitor": "default"
    },
    {
      "monitor": "default"
    }
  ]
}
GET t5/_mapping
{
  "t5" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "items" : {
            "properties" : {
              "monitor" : {            <-- monitor is a property of the items property (Object)
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Unique count, array to string

Here is my input:
{"Names":"Name1, Name2","Country":"TheCountry"}
What I have been trying to do is count how many times a certain name appears, not only in one input but also across all previous events. For that I have looked into Metrics, but I cannot figure out how I might be able to do that. The first problem I have met is that Names is a string and not an array.
I do not see how I might convert Names into an array and give it to the metric. Is there any other solution?
First of all, add the following split filter to your Logstash pipeline configuration. Your comma-separated names will be split while ingesting the data:
filter {
  split {
    field => "Names"
    terminator => ","
    target => "NamesArray"
  }
}
You can also change your mapping to add the new field to your type mapping, like below:
{
  "properties": {
    ...
    "NamesArray": {
      "type": "keyword"
    }
    ...
  }
}
You should use the keyword type for NamesArray so the separated values (which may keep a leading blank character after the comma) are aggregated exactly as they are and your metrics come out correctly.
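For reference, once the names are indexed as keyword values, a terms aggregation along these lines (the index name logstash-* is an assumption) returns per-name counts across all events:

GET logstash-*/_search
{
  "size": 0,
  "aggs": {
    "name_counts": {
      "terms": {
        "field": "NamesArray",
        "size": 100
      }
    }
  }
}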

Elastic Stack: I need to set the Time Filter field name with another field

I need to read messages (whose content is logs) from RabbitMQ with Logstash and then send them to Elasticsearch to build monitoring visualizations in Kibana. So I wrote an input to read from RabbitMQ like this:
input {
  rabbitmq {
    queue => "testLogstash"
    host => "localhost"
  }
}
and I wrote an output configuration to store the events in Elasticsearch like this:
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "d13-%{+YYYY.MM.dd}"
  }
}
Both of them are placed in myConf.conf
In the content of each message there is a JSON document that contains fields like this:
{
  "mDate": "MMMM dd YYYY, HH:mm:ss.SSS",
  "name": "test name"
}
But there are two problems. First, when creating a new index pattern there is no date field to choose as the Time Filter field name. Second, when I use mDate instead of the default @timestamp, the field is not available for building time-based graphs. I think the reason for this is the data type of the field: it should be of type date, but it is treated as a string.
I tried to convert the value of the field to a date with mutate in the Logstash config like this:
filter {
  mutate {
    convert => { "mdate" => "date" }
  }
}
Now, two questions arise:
1. Is this the problem? If yes, what is the right solution to fix it?
2. My main need is to use the time when the messages were put into the queue, not when Logstash picks them up. What is the best solution?
If you don't specify a value for @timestamp, you should get the current system time when Elasticsearch indexes the document. With that, you should be able to see items in Kibana.
If I understand you correctly, you'd rather use your mDate field for @timestamp. For this, use the date{} filter in Logstash.
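A minimal sketch of that date{} filter, assuming mDate really arrives in the "MMMM dd YYYY, HH:mm:ss.SSS" format shown in the sample message:

filter {
  date {
    # parse mDate and write the result into @timestamp (the filter's default target)
    match => [ "mDate", "MMMM dd YYYY, HH:mm:ss.SSS" ]
  }
}

If parsing fails, the event is tagged with _dateparsefailure, which makes mismatched formats easy to spot.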

Elasticsearch / Logstash define time or date when importing old log files

I have some old log files (one file per day).
log-2017.09.01.json
log-2017.09.02.json
etc
There is no date information in the json file.
By default, the timestamp of the index is the date of the creation of the index.
I am trying to create an index for each of these log files, and I want the timestamp of the index corresponding to each log file to be the same as the one defined by the name of the file.
I.e., I want an index "log-2017.09.01" for which the timestamp would be 2017.09.01 and another index "log-2017.09.02" for which the timestamp would be 2017.09.02.
Does anyone know a simple way to do it?
There isn't a simple way here, but it can be done. It takes a few steps.
The first step: get the date out of the file path.
filter {
  grok {
    match => { "path" => "^log-%{DATA:date_partial}\.json$" }
  }
}
The second step is to pull your timestamp data out of the log-lines. I'm assuming you know how to do that.
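For example (purely illustrative, assuming each log line starts with a time such as "14:05:59"; the field names feed the next step):

filter {
  grok {
    # hypothetical: pull hour and minute from the start of the log line
    match => { "message" => "^%{HOUR:date_hour}:%{MINUTE:date_minute}" }
  }
}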
The third step is to assemble a date field out of parts.
filter {
  mutate {
    add_field => { "full_timestamp" => "%{date_partial} %{date_hour}:%{date_minute}" }
  }
}
The last step is to use the date{} filter on that constructed field.
filter {
  date {
    match => [ "full_timestamp", "yyyy.MM.dd HH:mm" ]
  }
}
This should give you an idea as to the technique needed.

Differentiating fields that didn't exist before schema change with Elastic Search

I'm trying to add a field to an Elasticsearch schema. I already have about a million records in the index which don't have the field, and I need to be able to differentiate those from the ones that are added after the field is introduced. Using the modified date is the absolute last resort because I don't know when this will be turned on in production.
What I considered trying was having the old records return something like
{
  myField: null
}
and the new ones would return
{
  myField: { }
}
But I can't find a way to set the field on insert.
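For what it's worth, here is a sketch of how documents that lack the field can be told apart at query time (the index name is made up; myField comes from the question). Note that both null and an empty object index no values, so for this to distinguish old from new records the new documents would need the field to hold an actual value:

GET my-index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "myField" }
      }
    }
  }
}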
