Preprocessing a message containing multiple log records - heroku

TL;DR: Is it possible to preprocess a message by splitting it on newlines, and then have each resulting message go through the fluentd pipeline as usual?
I'm receiving these log messages in fluentd:
2018-09-13 13:00:41.251048191 +0000 : {"message":"146 <190>1 2018-09-13T13:00:40.685591+00:00 host app web.1 - 13:00:40.685 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Received GET /alerts\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"199 <190>1 2018-09-13T13:00:40.872670+00:00 host app web.1 - 13:00:40.871 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Processing with Api.AlertController.index/2 Pipelines: [:api]\n156 <190>1 2018-09-13T13:00:40.898316+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Rendered \"index.json\" in 1.0ms\n155 <190>1 2018-09-13T13:00:40.898415+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Sent 200 response in 209.70ms\n"}
The problem with these logs is that second message: it contains multiple application log lines.
This is, unfortunately, what I have to deal with: the system I'm working with (hello, Heroku logs!) buffers logs and then spits them out as a single chunk, making it impossible to know the number of records in the chunk upfront.
This is a known property of Heroku log draining.
Is there a way to preprocess the log message, so that I get a flat stream of messages to be processed normally by subsequent fluentd facilities?
This is what the post-processed stream of messages should look like:
2018-09-13 13:00:41.251048191 +0000 : {"message":"146 <190>1 2018-09-13T13:00:40.685591+00:00 host app web.1 - 13:00:40.685 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Received GET /alerts\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"199 <190>1 2018-09-13T13:00:40.872670+00:00 host app web.1 - 13:00:40.871 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Processing with Api.AlertController.index/2 Pipelines: [:api]\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"156 <190>1 2018-09-13T13:00:40.898316+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Rendered \"index.json\" in 1.0ms\n"}
2018-09-13 13:00:41.337628343 +0000 : {"message":"155 <190>1 2018-09-13T13:00:40.898415+00:00 host app web.1 - 13:00:40.894 request_id=40932fe8-cd7e-42e9-af24-13350159376d [info] Sent 200 response in 209.70ms\n"}
P.S. My current config is super basic, but I'm posting it just in case. All I'm trying to do is understand whether it is possible, in principle, to preprocess the message.
<source>
  @type http
  port 5140
  bind 0.0.0.0
  <parse>
    @type none
  </parse>
</source>
<filter **>
  @type stdout
</filter>

How about https://github.com/hakobera/fluent-plugin-heroku-syslog ?
fluent-plugin-heroku-syslog has been unmaintained for about four years, but it should still work with Fluentd v1 via the compatibility layer.
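If you go that route, the wiring could look roughly like the sketch below. This is only an illustration based on the plugin's README, not a tested setup: it assumes the plugin registers a heroku_syslog_http input that accepts Heroku's HTTPS drain POSTs and splits the octet-framed records for you, so verify the actual source type and parameter names against the README before relying on it.

<source>
  # assumption: source type and parameters as described in the
  # fluent-plugin-heroku-syslog README; verify before use
  @type heroku_syslog_http
  port 5140
  bind 0.0.0.0
  tag heroku
</source>

Each syslog frame inside a drained chunk should then arrive as its own event, which is the flat stream described above.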

Related

SonarQube - Failed to get CE Task status - HTTP code 502

I am trying to run SonarQube (hosted remotely) via the SonarScanner command from my local machine for a Magento (PHP) application, but I get the error below every time. I tried to find a solution for this, but didn't find much related to my issue.
Does anyone have any idea about this error?
13:22:38.820 INFO: ------------- Check Quality Gate status
13:22:38.820 INFO: Waiting for the analysis report to be processed (max 600s)
13:22:38.827 DEBUG: GET 200 https://example.com/api/ce/task?id=AYE-URS3o9NiO9ce0vrw | time=7ms
13:22:43.845 DEBUG: GET 200 https://example.com/api/ce/task?id=AYE-URS3o9NiO9ce0vrw | time=11ms
13:22:48.854 DEBUG: GET 200 https://example.com/api/ce/task?id=AYE-URS3o9NiO9ce0vrw | time=9ms
13:22:53.866 DEBUG: GET 200 https://example.com/api/ce/task?id=AYE-URS3o9NiO9ce0vrw | time=12ms
13:22:58.871 DEBUG: GET 502 https://example.com/api/ce/task?id=AYE-URS3o9NiO9ce0vrw | time=5ms
13:22:58.899 DEBUG: eslint-bridge server will shutdown
13:23:04.549 DEBUG: stylelint-bridge server will shutdown
13:23:09.571 INFO: ------------------------------------------------------------------------
13:23:09.571 INFO: EXECUTION FAILURE
13:23:09.571 INFO: ------------------------------------------------------------------------
13:23:09.571 INFO: Total time: 12:09.000s
13:23:09.688 INFO: Final Memory: 14M/50M
13:23:09.688 INFO: ------------------------------------------------------------------------
13:23:09.689 ERROR: Error during SonarScanner execution
Failed to get CE Task status - HTTP code 502: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
</body>
</html>

Fluentd forwarder DaemonSet has wrong log format

I use the Bitnami Fluentd chart for Kubernetes and my setup is almost stock apart from a few changes.
My source section looks like this:
<source>
  @type tail
  path /var/log/containers/*my-app*.log
  pos_file /opt/bitnami/fluentd/logs/buffers/fluentd-docker.pos
  tag kubernetes.*
  read_from_head true
</source>
and my application writes richer, multi-line log entries to stdout, like:
2021-07-13 11:33:49.060 +0000 - [ERROR] - fatal error - play.api.http.DefaultHttpErrorHandler in postman-akka.actor.default-dispatcher-6 play.api.UnexpectedException: Unexpected exception[RuntimeException: java.net.ConnectException: Connection refused (Connection refused)]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:328)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler
The problem is that on the Fluentd forwarder I can see (in /var/log/containers/*) that all records are stored in the following format:
{"log":"2021-07-13 19:54:48.523 +0000 - [ERROR] - from akka.io.TcpListener in postman-akka.actor.default-dispatcher-6 New connection accepted \n","stream":"stdout","time":"2021-07-13T19:54:48.523724149Z"}
{"log":"2021-07-13 19:54:48.523 +0000 - [ERROR] -- play.api.http.DefaultHttpErrorHandler in postman-akka.actor.default-dispatcher-6 \n","stream":"stdout","time":"2021-07-13T19:55:10.479279395Z"}
{"log":"2021-07-13 19:54:48.523 +0000 - [ERROR] - play.api.UnexpectedException: Unexpected exception[RuntimeException: }
{"log":"2021-07-13 19:54:48.523 +0000 - [ERROR] - java.net.ConnectException: Connection refused (Connection refused)] }
and the problem, as you can see, is that each of those lines ends up as a separate log record.
I would like to extract the entire log message with the full stack trace, so I wrote this configuration for the Fluentd parse section:
<parse>
  @type regexp
  expression /^(?<time>^(.*?:.*?)):\d\d.\d+\s\+0000 - (?<type>(\[\w+\])).- (?<text>(.*))/m
  time_key time
  time_format %Y-%m-%d %H:%M:%S
</parse>
but I am pretty sure the parse section is not the problem, because for some reason the files in /var/log/containers/*.log already store the records in this broken-up format. How can I configure the Fluentd forwarder to pick up the logs from the containers and store them as complete (non-JSON) messages?
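For what it's worth (this is not from the original thread), the usual way to stitch those container log lines back together on the forwarder is a concatenation filter keyed on the log field, for example with fluent-plugin-concat. A minimal sketch, assuming that plugin is installed and that every new application record starts with a timestamp like 2021-07-13 11:33:49.060 +0000:

<filter kubernetes.**>
  @type concat
  key log
  # a new record begins with a timestamp; lines that don't match
  # (e.g. "at play.api..." stack frames) are appended to the previous record
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \+\d{4}/
  flush_interval 5
</filter>

The regexp parse section above would then run after this filter, on the reassembled multi-line message.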

Fluentd is not filtering as intended before writing to Elasticsearch

Using:
Elasticsearch 7.5.1.
Fluentd 1.11.2
Fluent-plugin-elasticsearch 4.1.3
Springboot 2.3.3
I have a Spring Boot artifact with Logback configured with an appender that, in addition to the app's STDOUT, sends logs to Fluentd:
<appender name="FLUENT_TEXT" class="ch.qos.logback.more.appenders.DataFluentAppender">
  <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
    <level>INFO</level>
  </filter>
  <tag>myapp</tag>
  <label>myservicename</label>
  <remoteHost>fluentdservicename</remoteHost>
  <port>24224</port>
  <useEventTime>false</useEventTime>
</appender>
Fluentd config file looks like this:
<ROOT>
  <source>
    @type forward
    port 24224
    bind "0.0.0.0"
  </source>
  <filter myapp.**>
    @type parser
    key_name "message"
    reserve_data true
    remove_key_name_field false
    <parse>
      @type "json"
    </parse>
  </filter>
  <match myapp.**>
    @type copy
    <store>
      @type "elasticsearch"
      host "elasticdb"
      port 9200
      logstash_format true
      logstash_prefix "applogs"
      logstash_dateformat "%Y%m%d"
      include_tag_key true
      type_name "app_log"
      tag_key "@log_name"
      flush_interval 1s
      user "elastic"
      password xxxxxx
      <buffer>
        flush_interval 1s
      </buffer>
    </store>
    <store>
      @type "stdout"
    </store>
  </match>
</ROOT>
So it just adds a filter to parse the information (a JSON string) into a structured record and then writes it to Elasticsearch (as well as to Fluentd's STDOUT). Note how I use the myapp.** pattern to make it match in both the filter and the match blocks.
Everything is up and running properly in OpenShift. Spring Boot sends the logs to Fluentd correctly, and Fluentd writes them to Elasticsearch.
But the problem is that every log generated by the app is also written. This means that every INFO log with, for example, the initial Spring configuration, or any other information the app sends through Logback, is also written.
Example of "wanted" log:
2020-11-04 06:33:42.312840352 +0000 myapp.myservice: {"traceId":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","spanId":"f597f7ffbe722fa7","spanExportable":"false","X-Span-Export":"false","level":"INFO","X-B3-SpanId":"f597f7ffbe722fa7","idOrq":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","logger":"es.organization.project.myapp.commons.services.impl.LoggerServiceImpl","X-B3-TraceId":"f597f7ffbe722fa7","thread":"http-nio-8085-exec-1","message":"{\"traceId\":\"bf8195d9-16dd-4e58-a0aa-413d89a1eca9\",\"inout\":\"IN\",\"startTime\":1604471622281,\"finishTime\":null,\"executionTime\":null,\"entrySize\":5494.0,\"exitSize\":null,\"differenceSize\":null,\"user\":\"pmmartin\",\"methodPath\":\"Method Path\",\"errorMessage\":null,\"className\":\"CamelOrchestrator\",\"methodName\":\"preauthorization_validate\"}","idOp":"","inout":"IN","startTime":1604471622281,"finishTime":null,"executionTime":null,"entrySize":5494.0,"exitSize":null,"differenceSize":null,"user":"pmmartin","methodPath":"Method Path","errorMessage":null,"className":"CamelOrchestrator","methodName":"preauthorization_validate"}
Example of "unwanted" logs (check how there is a Fluentd warning per each unexpected log message):
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.InternalRouteStartupManager","thread":"restartedMain","message":"Route: route6 started and consuming from: servlet:/preAuth"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Total 20 routes, of which 20 are started'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Started MyServiceApplication in 15.446 seconds (JVM running for 346.061)'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"es.organization.project.myapp.MyService", "thread"=>"restartedMain", "message"=>"Started MyService in 15.446 seconds (JVM running for 346.061)"}
The question is: how do I tell Fluentd to really filter the information that reaches it, so that the unwanted records get discarded?
Thanks to @Azeem, and according to the grep filter's regexp features documentation, I got it :).
I just added this to my Fluentd config file:
<filter onpay.**>
  @type grep
  <regexp>
    key message
    pattern /^.*inout.*$/
  </regexp>
</filter>
Any line that does not contain the word "inout" is now excluded.
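One additional detail, not from the original answer: because filters for the same tag run in the order they appear in the config, placing the grep filter before the parser filter also makes the "pattern not matched" warnings disappear, since the non-JSON messages are dropped before they ever reach the parser. A condensed sketch of that ordering, reusing the blocks above with the myapp.** tag from the question:

<filter myapp.**>
  @type grep
  <regexp>
    # keep only the application's audit messages, which contain "inout"
    key message
    pattern /^.*inout.*$/
  </regexp>
</filter>

<filter myapp.**>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>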

Filebeat connect Logstash always i/o timeout

Filebeat worked well until I changed the password of Elasticsearch. By the way, I use docker-compose to start the service; here is some information about my Filebeat setup.
Console log:
filebeat | 2017/05/11 05:21:33.020851 beat.go:285: INFO Home path: [/] Config path: [/] Data path: [//data] Logs path: [//logs]
filebeat | 2017/05/11 05:21:33.020903 beat.go:186: INFO Setup Beat:
filebeat; Version: 5.3.0
filebeat | 2017/05/11 05:21:33.021019 logstash.go:90: INFO Max Retries set to: 3
filebeat | 2017/05/11 05:21:33.021097 outputs.go:108: INFO Activated
logstash as output plugin.
filebeat | 2017/05/11 05:21:33.021908 publish.go:295: INFO Publisher name: fd2f326e51d9
filebeat | 2017/05/11 05:21:33.022092 async.go:63: INFO Flush Interval set to: 1s
filebeat | 2017/05/11 05:21:33.022104 async.go:64: INFO Max Bulk Size set to: 2048
filebeat | 2017/05/11 05:21:33.022220 modules.go:93: ERR Not loading modules. Module directory not found: /module
filebeat | 2017/05/11 05:21:33.022291 beat.go:221: INFO filebeat start running.
filebeat | 2017/05/11 05:21:33.022334 registrar.go:68: INFO No registry file found under: /data/registry. Creating a new registry file.
filebeat | 2017/05/11 05:21:33.022570 metrics.go:23: INFO Metrics logging every 30s
filebeat | 2017/05/11 05:21:33.025878 registrar.go:106: INFO Loading registrar data from /data/registry
filebeat | 2017/05/11 05:21:33.025918 registrar.go:123: INFO States Loaded from registrar: 0
filebeat | 2017/05/11 05:21:33.025970 crawler.go:38: INFO Loading Prospectors: 1
filebeat | 2017/05/11 05:21:33.026119 prospector_log.go:61: INFO Prospector with previous states loaded: 0
filebeat | 2017/05/11 05:21:33.026278 prospector.go:124: INFO Starting prospector of type: log; id: 5816422928785612348
filebeat | 2017/05/11 05:21:33.026299 crawler.go:58: INFO Loading and starting Prospectors completed. Enabled prospectors: 1
filebeat | 2017/05/11 05:21:33.026323 registrar.go:236: INFO Starting Registrar
filebeat | 2017/05/11 05:21:33.026364 sync.go:41: INFO Start sending events to output
filebeat | 2017/05/11 05:21:33.026394 spooler.go:63: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
filebeat | 2017/05/11 05:21:33.026731 log.go:91: INFO Harvester started for file: /data/logs/biz.log
filebeat | 2017/05/11 05:22:03.023313 metrics.go:39: INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=1
filebeat.harvester.running=1 filebeat.harvester.started=1 libbeat.publisher.published_events=98 registrar.writes=1
filebeat | 2017/05/11 05:22:08.028292 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:22:33.023370 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:22:39.028840 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:23:03.022906 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:23:11.029517 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:23:33.023450 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:23:45.030202 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:24:03.022864 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:24:23.030749 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:24:33.024029 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:25:03.023338 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:25:09.031348 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:25:33.023976 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:26:03.022900 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat | 2017/05/11 05:26:11.032346 single.go:140: ERR Connecting error publishing events (retrying): dial tcp 47.93.121.126:5044: i/o timeout
filebeat | 2017/05/11 05:26:33.022870 metrics.go:34: INFO No non-zero metrics in the last 30s
filebeat.yml:
filebeat:
  prospectors:
    -
      paths:
        - /data/logs/*.log
      input_type: log
      document_type: biz-log
  registry_file: /etc/registry/mark
output:
  logstash:
    enabled: true
    hosts: ["logstash:5044"]
docker-compose.yml:
version: '2'
services:
  filebeat:
    build: ./
    container_name: filebeat
    restart: always
    network_mode: "bridge"
    extra_hosts:
      - "logstash:47.93.121.126"
    volumes:
      - ./conf/filebeat.yml:/filebeat.yml
      - /mnt/logs/appserver/app/biz:/data/logs
      - ./registry:/data
Having had a similar issue, I eventually realised the culprit was not Filebeat but Logstash.
Logstash's SSL configuration didn't contain all required attributes. Setting it up using the following declaration solved the issue:
input {
  beats {
    port => "{{ logstash_port }}"
    ssl => true
    ssl_certificate_authorities => [ "{{ tls_certificate_authority_file }}" ]
    ssl_certificate => "{{ tls_certificate_file }}"
    ssl_key => "{{ tls_certificate_key_file }}"
    ssl_verify_mode => "force_peer"
  }
}
The above example is templated for Ansible; remember to replace the placeholders between {{ and }} with the correct values.
The registry file stores the state and location information that Filebeat uses to track where it was last reading, so you can try resetting or deleting the registry file:
cd /var/lib/filebeat
sudo mv registry registry.bak
sudo service filebeat restart

SpringXD -> twitterstream --follow: ending with Http error

I'm trying to create a stream that should follow @BBCBreaking (which should have Twitter ID 5402612), but I keep getting the following HTTP error:
2016-03-28T02:13:12+0200 1.3.1.RELEASE INFO DeploymentSupervisor-0 zk.ZKStreamDeploymentHandler - Deployment status for stream 'mystream': DeploymentStatus{state=deployed}
2016-03-28T02:13:13+0200 1.3.1.RELEASE WARN twitterSource-1-1 twitter.TwitterStreamChannelAdapter - Http error, waiting for 5 seconds before restarting
2016-03-28T02:13:19+0200 1.3.1.RELEASE WARN twitterSource-1-1 twitter.TwitterStreamChannelAdapter - Http error, waiting for 10 seconds before restarting
2016-03-28T02:13:30+0200 1.3.1.RELEASE WARN twitterSource-1-1 twitter.TwitterStreamChannelAdapter - Http error, waiting for 20 seconds before restarting
my stream command is:
stream create --name mystream --definition "twitterstream --follow='5402612' | log" --deploy
running on SpringXD: 1.3.1.RELEASE
Any idea why I'm getting this error?
You can debug such situations by enabling DEBUG logging - log config is in the xd/config folder in .groovy files; e.g. xd-singlenode-logback.groovy.
Set the loggers for org.springframework.integration, org.springframework.xd, and org.springframework.xd.dirt.server to DEBUG, and add a logger for org.springframework.social.twitter at DEBUG as well.
Or you can set all of org.springframework and comment out the more specific ones.
