Indexing Errors when using quarkus logging-gelf extension and ELK stack - elasticsearch

I have set up logging as described in https://quarkus.io/guides/centralized-log-management with an ELK stack using version 7.7.
My logstash pipeline looks like the proposed example:
input {
  gelf {
    port => 12201
  }
}
output {
  stdout {}
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
  }
}
Most messages show up in Kibana using logstash.* as the index pattern, but some messages are dropped, such as:
2020-05-28 15:30:36,565 INFO [io.quarkus] (Quarkus Main Thread) Quarkus 1.4.2.Final started in 38.335s. Listening on: http://0.0.0.0:8085
The problem seems to be that the fields MessageParam0, MessageParam1, MessageParam2, etc. are mapped to the type that first appeared in the logs but actually contain multiple datatypes. The Elasticsearch log shows errors like: org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [MessageParam1].
Is there any way in the Quarkus logging-gelf extension to correctly map the values?

Elasticsearch can auto-create your index mapping by looking at the first indexed document. This is a very convenient feature, but it comes with some drawbacks.
For example, if you have a field that can contain numbers or strings, and the first document contains a number for this field, the mapping will be created with a number type, so you will not be able to index a document containing a string in this field.
The only workaround is to create the mapping upfront (you only need to define the fields that are causing the issue; the other fields will still be created automatically), as sketched below.
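A minimal sketch of such an upfront mapping, assuming the default logstash-* index pattern and the Elasticsearch 7.x legacy template API (the template name and the choice of keyword are illustrative, not something the extension prescribes):

PUT _template/quarkus-messageparams
{
  "index_patterns": ["logstash-*"],
  "mappings": {
    "properties": {
      "MessageParam0": { "type": "keyword" },
      "MessageParam1": { "type": "keyword" },
      "MessageParam2": { "type": "keyword" }
    }
  }
}

Mapping the MessageParam* fields as keyword means numeric and string parameter values are both indexed as plain strings, so the type of the first indexed document no longer matters.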
This is an Elasticsearch issue; there is nothing we can do on the Quarkus side.

Related

Cannot open data from 5.x in 6.x : Mappings

I've created a routine that updates ES clients from 5.x to 6.x and finally 7.x.
Somehow some clients cannot be updated.
Loading existing data in 6.8 fails.
Apparently some mappings are causing this.
But there are no templates applied, and I cannot see any difference from the other clients, where everything works just fine.
I know that ES has dropped the string type and uses text now, but where does this string type come from? Why doesn't it occur on the other clients then? And finally, how would I solve this? I cannot change the type from string to text in 5.x, and I cannot apply templates in 6.x because it's not starting up.
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Failed to parse mapping [datapoint]: No handler for type [string] declared on field [batchId]
UPDATE:
This is my current mapping for batchId, taken from http://localhost:9200/_mapping:
"batchId":{"type":"keyword"}
It seems that you forgot to change the datatype from string to text in your mapping, which caused the MapperParsingException. It's really good that the exception tells you that the problematic field is batchId; just change it to the text datatype and it should work.
Please refer to this Elastic blog post that talks about the string to text change and provides some tips on how to handle it while upgrading.
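For illustration, this is roughly what the change looks like in a mapping (a hypothetical pre-5.x declaration next to its modern equivalent; whether text or keyword is the right target depends on how the field is searched):

Pre-5.x:
"batchId": { "type": "string" }

6.x and later:
"batchId": { "type": "text" }

or, for exact-match filtering and aggregations:
"batchId": { "type": "keyword" }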
The problem was something else:
Some clients had unexpected indices which caused the problem.
After deleting them, ES 6.x started fine.
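For anyone tracking down similar leftovers, the cat indices API lists every index in the cluster, and an unexpected one can then be deleted by name (the index name below is only a placeholder):

GET _cat/indices?v
DELETE /my-unexpected-index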

Elasticsearch Dynamic Field Mapping and JSON Dot Notation

I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is being used to read stdout and it enriches the logs with metadata including pod labels. A simplified example log object is
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
The problem is that a few other applications deployed to the cluster have labels of the following format:
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
These applications are installed via Helm charts and the newer ones are following the label and selector conventions as laid out here. The naming convention for labels and selectors was updated in Dec 2018, seen here, and not all charts have been updated to reflect this.
The end result of this is that depending on which type of label format makes it into an Elastic index first, trying to send the other type in will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label will throw this exception:
object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value
The opposite situation, posting the namespaced label second, results in this exception:
Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].
What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and is trying to flesh it out as an object. I was able to find this PR from 2015 which explicitly disallows periods in field names however it seems to have been reversed in 2016 with this PR. There is also this multi-year thread from 2015-2017 discussing this issue but I was unable to find anything recent involving the latest versions.
My current thought on moving forward is to standardize the Helm charts we are using so that all of the labels follow the same convention. This seems like a band-aid on the underlying issue, though, which is that I feel like I'm missing something obvious in the configuration of Elasticsearch and dynamic field mappings.
Any help here would be appreciated.
I opted to use the Logstash mutate filter with the rename option as described here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
The end result looked something like this:
filter {
  mutate {
    rename => {
      '[kubernetes][labels][app]' => '[kubernetes][labels][app.kubernetes.io/name]'
      '[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
    }
  }
}
Although personally I've never encountered the exact same issue, I had similar problems when I indexed some test data and afterwards changed the structure of the document that should have been indexed (especially when "unflattening" data structures).
Your interpretation of the error message is correct. When you first index the document
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
Elasticsearch will recognize app as an object/structure due to dynamic mapping, because the dots in app.kubernetes.io/name are interpreted as a path of nested objects.
When you then try to index the document
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
the previously, dynamically created mapping has defined the field app as an object with sub-fields, but Elasticsearch now encounters a concrete value, namely "application-1".
I suggest that you set up an index template to define the correct mappings. For the 'outdated' logging versions, I suggest pre-processing the particular documents, either through an Elasticsearch ingest pipeline or with e.g. Logstash, to get the documents into the correct format.
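One possible sketch of such a template, assuming Elasticsearch 7.3 or later where the flattened field type is available; it maps kubernetes.labels as a single flattened field, so no per-label sub-mappings are created and app and app.kubernetes.io/name no longer conflict (the template name and index pattern are illustrative):

PUT _template/k8s-logs
{
  "index_patterns": ["fluentbit-*"],
  "mappings": {
    "properties": {
      "kubernetes": {
        "properties": {
          "labels": {
            "type": "flattened"
          }
        }
      }
    }
  }
}

The trade-off is that every label value is indexed as a keyword inside that single field, so you give up per-label mapping control.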
Hope that helps.

One elasticsearch/logstash instance with multiple applications with a unique index

We use Swisscom's Application Cloud and are currently evaluating the new Elasticsearch service. We set it up including Logstash and Kibana.
We now added a user-provided service to each of our apps that should use the common Elasticsearch/Logstash/Kibana instance. When we first logged in to Kibana, we saw there was an index called logstash-, where all the logs of all applications go.
Now what we want is to have an index for each of the apps that writes to our ELK instance. Let's say we have three apps (app1, app2, app3). We'd like to have three indices (app1-..., app2-..., and app3-...). Any ideas on how we can achieve that?
Is that a configuration that has to be done using ENV variables on Cloud Foundry, or is it something we have to configure within our Java and NodeJS apps?
Thanks in advance for your help.
You can use the Elasticsearch output plugin for Logstash, which is the recommended method of storing logs in Elasticsearch. This plugin has a configuration option called index which defines the name of the index to write events to. The default index name is logstash-%{+YYYY.MM.dd}.
Use it along with an if conditional to assign an index name for each app based on its type, like this:
output {
  if [type] == "apache" {
    elasticsearch {
      index => "apache-website-index"
    }
  } else if [type] == "nginx" {
    elasticsearch {
      index => "nginx-website-index"
    }
  }
}
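Alternatively, if each event already carries a field identifying the app, the index name can interpolate that field directly instead of chaining conditionals; here it is assumed that type holds the app name (app1, app2, app3):

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}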
Please have a look at this answer as well.
Please comment if you have any questions.

logstash not mapping to values in indices

I have a sample log
2016-12-28 16:40:53.290 [debug] <0.545.0> <<"{\"user_id\":\"79\",\"timestamp\":\"2016-12-28T11:10:26Z\",\"operation\":\"ver3 - Requested for recommended,verified handle information\",\"data\":\"\",\"content_id\":\"\",\"channel_id\":\"\"}">>
for which I have written a Logstash grok filter:
filter {
  grok {
    match => { "message" => "%{URIHOST} %{TIME} %{SYSLOG5424SD} <%{BASE16FLOAT}.0> <<%{QS}>>" }
  }
}
In http://grokdebug.herokuapp.com/ everything is working fine and the values are getting mapped by the filter.
But when I push values with this filter into Elasticsearch, they are not getting mapped, and I just get the whole log as-is in the message field.
Please let me know if I am doing something wrong.
Your kibana screen shot isn't loading, but I'll take a guess: you're capturing patterns, but not naming the data into fields. Here's the difference:
%{TIME}
will look for that pattern in your data. The debugger will show "TIME" as having been parsed, but logstash won't create a field without being asked.
%{TIME:myTime}
will create the field (and you can see it working in the debugger).
You would need to do this for any matched pattern that you would like to save.
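Applied to the grok filter from the question, that would look something like this (the field names after the colons are illustrative; name them whatever you want the fields to be called):

filter {
  grok {
    match => { "message" => "%{URIHOST:logdate} %{TIME:logtime} %{SYSLOG5424SD:loglevel} <%{BASE16FLOAT:pid}.0> <<%{QS:payload}>>" }
  }
}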

How to remove the json. prefix on my elasticsearch field

I use ELK to get some info on my rabbitmq stuff.
Here is my conf on the Logstash side:
json {
  source => "message"
}
But in Kibana I have to prefix all my fields with json.xxx:
json.sender, json.sender.raw, json.programId, json.programId.raw ...
How can I avoid this json. prefix in my field names, so that I only have: sender, programId, etc.?
Best regards and thanks for your help!
Bonus question: what are all these .raw fields I must use in Kibana?
According to the doc:
By default it will place the parsed JSON in the root (top level) of the Logstash event, but this filter can be configured to place the JSON into any arbitrary event field, using the target configuration.
So it feels like your JSON is wrapped in a container named "json", or you're setting the target option in Logstash without showing us.
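For reference, a target configured like this on the json filter would produce exactly the prefix you are seeing (the target name here simply mirrors your json. prefix):

json {
  source => "message"
  target => "json"   # parsed fields end up under json.*, e.g. json.sender, json.programId
}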
As for ".raw", the default elasticsearch mapping will analyze the data you put in a field, so changing "/var/log/messages" into three words: [var, log, messages]" which can make it hard to search. To keep you from having to worry about this at the beginning, logstash creates a ".raw" version of each string, which is not analyzed.
You'll eventually make your own mappings, and you can make the original field not_analyzed, so you won't need the .raw versions anymore.
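A rough sketch of such a mapping, assuming a pre-5.x Elasticsearch where the string / not_analyzed syntax applies (the template name and field list are illustrative):

PUT _template/rabbitmq-logs
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "sender": { "type": "string", "index": "not_analyzed" },
        "programId": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

With the original fields left unanalyzed, Kibana can use sender and programId directly and the .raw copies become unnecessary.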
