We have an Elasticsearch + Fluentd + Kibana (EFK) stack in our Kubernetes cluster. We use different sources to collect logs and match each one to a different Elasticsearch host so that the logs are split across clusters.
Everything was working fine until one of our Elasticsearch clusters (elastic-audit) went down; now none of the logs for the tags defined in the Fluentd config are being pushed.
I don't understand how one Elasticsearch cluster being down can affect the other clusters configured in the same Fluentd configuration file. We tried removing the affected tag (in our case "AUDIT_LOG") from the filter and match sections, after which logs for the other tags started flowing again.
We need to get Fluentd working with all the tags configured in it.
NAMESPACE NAME HEALTH NODES VERSION PHASE AGE
dm-elastic dm-eck green 5 7.2.0 Ready 252d
elastic-audit da-audit red 7.4.0 Ready 263d
monitoring da-eck green 3 7.2.0 Ready 264d
moss-elastic moss-eck green 7 7.2.0 Ready 252d
Error Logs
2021-06-30 08:55:41 +0000 [warn]: [elasticsearch-auditlog] Remaining retry: 5. Retry to communicate after 1024 second(s).
2021-06-30 09:29:49 +0000 [warn]: [elasticsearch-auditlog] Could not communicate to Elasticsearch, resetting connection and trying again. SSL_connect SYSCALL returned=5 errno=0 state=unknown state (OpenSSL::SSL::SSLError)
2021-06-30 09:29:49 +0000 [warn]: [elasticsearch-auditlog] Remaining retry: 4. Retry to communicate after 2048 second(s).
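The retry intervals in the warnings above keep doubling because the audit <buffer> sections in the config below set neither retry_max_times nor retry_timeout, so chunks destined for the unreachable cluster are retried indefinitely; once that buffer fills up, its overflow_action throw_exception propagates errors back to the shared in_tail source, which is one plausible way a single dead cluster can stall the other tags. A minimal sketch of bounding the retries and diverting exhausted chunks to a local file (the values and the secondary path are illustrative assumptions, not taken from our setup) could look like this:

<match auditlog.kubernetes.**>
  @type elasticsearch
  # ... existing connection settings unchanged ...
  <buffer>
    @type file
    path /var/log/fluentd-buffers/auditlog.system.buffer
    flush_mode interval
    flush_interval 5s
    retry_type exponential_backoff
    retry_max_interval 30
    retry_max_times 10                  # stop retrying a chunk after 10 attempts (assumed value)
    retry_timeout 1h                    # or after one hour, whichever comes first (assumed value)
    overflow_action drop_oldest_chunk   # shed old chunks instead of raising back to the input
    chunk_limit_size 10M
    total_limit_size 10G
  </buffer>
  # chunks whose retries are exhausted are written here instead of blocking the pipeline
  <secondary>
    @type secondary_file
    directory /var/log/fluentd-buffers/auditlog-failed   # assumed path
    basename auditlog
  </secondary>
</match>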
Fluentd config
containers.input.conf: |-
<source>
@id fluentd-containers.log
@type tail
path /var/log/containers/*.log
exclude_path /var/log/containers/fluentd*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag kubernetes.*
format json
read_from_head true
</source>
system.input.conf: |-
# Example:
# 2015-12-21 23:17:22,066 [salt.state ][INFO ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
<source>
@id startupscript.log
@type tail
format syslog
path /var/log/startupscript.log
pos_file /var/log/es-startupscript.log.pos
tag startupscript
</source>
<source>
@id kubelet.log
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path /var/log/kubelet.log
pos_file /var/log/es-kubelet.log.pos
tag kubelet
</source>
# Logs from systemd-journal for interesting services.
# TODO(random-liu): Remove this after cri container runtime rolls out.
<source>
@id journald-docker
@type systemd
matches [{ "_SYSTEMD_UNIT": "docker.service" }]
<storage>
@type local
persistent true
</storage>
read_from_head true
tag docker
</source>
<source>
@id haproxy-ingress
@type syslog
port 5140
bind 0.0.0.0
tag haproxy
<parse>
@type syslog
message_format rfc5424
rfc5424_time_format %Y-%m-%dT%H:%M:%S%z
</parse>
</source>
<source>
@id journald-kubelet
@type systemd
matches [{ "_SYSTEMD_UNIT": "kubelet.service" }]
<storage>
@type local
persistent true
</storage>
read_from_head true
tag kubelet
</source>
forward.input.conf: |-
# Takes the messages sent over TCP
<source>
@type forward
</source>
monitoring.conf: |-
# Prometheus Exporter Plugin
# input plugin that exports metrics
<source>
@type prometheus
</source>
<source>
@type monitor_agent
</source>
# input plugin that collects metrics from MonitorAgent
<source>
@type prometheus_monitor
<labels>
host ${hostname}
</labels>
</source>
# input plugin that collects metrics for output plugin
<source>
@type prometheus_output_monitor
<labels>
host ${hostname}
</labels>
</source>
# input plugin that collects metrics for in_tail plugin
<source>
@type prometheus_tail_monitor
<labels>
host ${hostname}
</labels>
</source>
output.conf: |-
<filter kubernetes.**>
@type concat
key log
separator ""
stream_identity_key tag
multiline_start_regexp /^time=/
flush_interval 5
timeout_label @NORMAL
</filter>
<match **>
@type relabel
@label @NORMAL
</match>
<label @NORMAL>
<match kubernetes.**>
@type rewrite_tag_filter
<rule>
key log
pattern (tag=(AUDIT_LOG|CUSTOMER_AUDIT_LOG)|"log_type":"AUDIT_LOG")
tag auditlog.${tag}
</rule>
<rule>
key log
pattern tag=PARTNER_AUDIT_LOG
tag partner-auditlog.${tag}
</rule>
<rule>
key log
pattern tag=MAINTENANCE_AUDIT_LOG
tag maintenance-auditlog.${tag}
</rule>
<rule>
key log
pattern ^time=".*?".*
tag daas_service.${tag}
</rule>
<rule>
key log
pattern ^time=".*?".*
tag other_service.${tag}
invert true
</rule>
</match>
# Enriches records with Kubernetes metadata
<filter {daas_service,other_service}.kubernetes.**>
@type kubernetes_metadata
</filter>
<filter {daas_service,other_service}.kubernetes.**>
@type throttle
group_key kubernetes.pod_name
group_bucket_period_s 60
group_bucket_limit 6000
</filter>
<filter daas_service.kubernetes.**>
@type kvp
parse_key log
fields_key log_field
pattern "([a-zA-Z_-]\\w*)=((['\"])(?:^(?:\\3)|[^\\\\]|\\\\.)*?(\\3)|[\\w.#$%/+-]*)"
</filter>
<filter daas_service.kubernetes.**>
@type record_modifier
<record>
dummy ${if record.has_key?('log_field') and record['log_field'].has_key?('time'); record['#timestamp']=record['log_field']['time']; record['log_field'].delete('time'); end; nil}
dummy2 ${begin; t = Time.parse record['#timestamp']; record['#timestamp'] = t.utc.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'); rescue; record.delete('#timestamp'); end; nil}
</record>
remove_keys dummy,dummy2
</filter>
<filter auditlog.kubernetes.**>
@type kvp
parse_key log
pattern "([a-zA-Z_-]\\w*)=((['\"])(?:^(?:\\3)|[^\\\\]|\\\\.)*?(\\3)|[\\w.#$%/+-]*)"
</filter>
<filter auditlog.kubernetes.**>
@type record_modifier
<record>
dummy ${if record.has_key?('time'); record['#timestamp']=record['time']; record.delete('time'); end; nil}
dummy2 ${begin; t = Time.parse record['#timestamp']; record['#timestamp'] = t.utc.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'); rescue; record.delete('#timestamp'); end; nil}
levelinfo ${if record.has_key?('level'); record['level']='info'; end; nil}
</record>
remove_keys dummy,dummy2,levelinfo
</filter>
<filter partner-auditlog.kubernetes.**>
@type kvp
parse_key log
pattern "([a-zA-Z_-]\\w*)=((['\"])(?:^(?:\\3)|[^\\\\]|\\\\.)*?(\\3)|[\\w.#$%/+-]*)"
</filter>
<filter partner-auditlog.kubernetes.**>
@type record_modifier
<record>
dummy ${if record.has_key?('time'); record['#timestamp']=record['time']; record.delete('time'); end; nil}
dummy2 ${begin; t = Time.parse record['#timestamp']; record['#timestamp'] = t.utc.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'); rescue; record.delete('#timestamp'); end; nil}
levelinfo ${if record.has_key?('level'); record['level']='info'; end; nil}
</record>
remove_keys dummy,dummy2,levelinfo
</filter>
<filter maintenance-auditlog.kubernetes.**>
@type kvp
parse_key log
pattern "([a-zA-Z_-]\\w*)=((['\"])(?:^(?:\\3)|[^\\\\]|\\\\.)*?(\\3)|[\\w.#$%/+-]*)"
</filter>
<filter maintenance-auditlog.kubernetes.**>
@type record_modifier
<record>
dummy ${if record.has_key?('time'); record['#timestamp']=record['time']; record.delete('time'); end; nil}
dummy2 ${begin; t = Time.parse record['#timestamp']; record['#timestamp'] = t.utc.strftime('%Y-%m-%dT%H:%M:%S.%3NZ'); rescue; record.delete('#timestamp'); end; nil}
levelinfo ${if record.has_key?('level'); record['level']='info'; end; nil}
</record>
remove_keys dummy,dummy2,levelinfo
</filter>
<filter haproxy.**>
@type parser
key_name message
reserve_data true
reserve_time true
emit_invalid_record_to_error false
<parse>
@type multi_format
<pattern>
format regexp
# Examples
# 10.2.1.0:31654 [06/Nov/2019:13:21:05.569] httpsfront default-paas-secure-443/10.20.48.136:443 1/0/642 3670 SD 3/2/0/0/0 0/0
expression /^(?<remoteAddress>[\w\.]+:\d+) \[(?<requestDate>[^\]]*)\] httpsfront (?<namespace>[\w]+)-(?<service>[\w-]+)\/(?<backendAddress>[\w\.]+:\d+) (?<waitTime>\d+)\/(?<backendConnectTime>\d+)\/(?<responseTime>\d+) (?<responseBytes>\d+) (?<terminationState>[\w-]+) (?<actconn>\d+)\/(?<feconn>\d+)\/(?<beconn>\d+)\/(?<srvconn>\d+)\/(?<retries>\d+) (?<srvqueue>\d+)\/(?<backendQueue>\d+)$/
</pattern>
<pattern>
format regexp
expression /^(?<remoteAddress>[\w\.]+:\d+) \[(?<requestDate>[^\]]*)\] httpfront-(?<domain>[\w-.]+)~ (?<namespace>kube-system|[\w]+)-(?<service>[\w-]+)(-[\d]+)?\/[\w-]+ (?<requestReadTime>\d+)\/(?<waitTime>\d+)\/(?<backendConnectTime>\d+)\/(?<backendResponseTime>\d+)\/(?<responseTime>\d+) (?<statusCode>\d+) (?<responseBytes>\d+) (?<reqCookie>[\w-]+) (?<resCookie>[\w-]+) (?<terminationState>[\w-]+) (?<actconn>\d+)\/(?<feconn>\d+)\/(?<beconn>\d+)\/(?<srvconn>\d+)\/(?<retries>\d+) (?<srvqueue>\d+)\/(?<backendQueue>\d+) "(?<method>[A-Z]+) (?<url>[^ ]+) (?<httpVersion>[^ ]+)"$/
</pattern>
<pattern>
format regexp
# Examples:
# Connect from 172.20.59.142:13201 to 172.20.59.142:31916 (httpfront/HTTP)
# Connect from 10.0.1.2:33312 to 10.0.3.31:8012 (www/HTTP)
expression /^Connect from (?<remoteAddress>[\w\.]+:\d+) to (?<backendAddress>[\w\.]+:\d+) \((?<frontend>[\w]+)\/(?<mode>[\w]+)\)$/
</pattern>
<pattern>
format regexp
# Examples:
# Server kube-system-fluentd-http-http-input/server0002 is going DOWN for maintenance. 3 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Server kube-system-fluentd-http-http-input/server0001 is going DOWN for maintenance. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
expression /^Server (?<namespace>kube-system|[\w]+)-(?<service>[\w-]+)\/[\w-]+ is going DOWN for maintenance. (?<remainingActive>\d+) active and (?<remainingBackup>\d+) backup servers left. (?<activeSessions>\d+) sessions active, (?<requeued>\d+) requeued, (?<remainingInQueue>\d+) remaining in queue.$/
</pattern>
<pattern>
format regexp
# Examples:
# "10.2.2.0:60889 [06/Nov/2019:13:54:54.904] httpfront-shared-frontend/3: SSL handshake failure"
expression /^(?<remoteAddress>[\w\.]+:\d+) \[(?<requestDate>[^\]]*)\] (?<frontend>[\w-]+\/\d+): (?<msg>[\w].*)$/
</pattern>
<pattern>
format regexp
# Examples:
# Server kube-system-fluentd-http-http-input/server0003 is DOWN, reason: Layer4 connection problem, info: \"Connection refused\", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Server kube-system-fluentd-http-http-input/server0003 is UP, reason: Layer4 check passed, check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
expression /^Server (?<namespace>kube-system|[\w]+)-(?<service>[\w-]+)\/[\w-]+ is (?<status>[\w]+), reason: (?<reason>[^,]+), (info: "(?<info>[^"]+)", )?check duration: (?<checkDuration>[^.]+). (?<remainingActive>\d+) active and (?<remainingBackup>\d+) backup servers (left|online). ((?<activeSessions>\d+) sessions active, )?(?<requeued>\d+) (sessions )?requeued, (?<remainingInQueue>\d+) (remaining|total) in queue.$/
</pattern>
</parse>
</filter>
<match auditlog.kubernetes.**>
@id elasticsearch-auditlog
@type elasticsearch
@log_level info
include_tag_key true
host elastic.audit.{{ required "management Domain" .Values.mgmt_domain }}
port 443
user "#{ENV['ELASTIC_USER']}"
password "#{ENV['ELASTIC_PASSWORD']}"
scheme "https"
ssl_verify false
type_name _doc
ssl_version TLSv1_2
time_precision 3
logstash_format true
logstash_prefix auditlog
reconnect_on_error true
request_timeout 30s
bulk_message_request_threshold -1
<buffer>
@type file
path /var/log/fluentd-buffers/auditlog.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_interval 30
chunk_limit_size 10M
total_limit_size 10G
queued_chunks_limit_size 100
overflow_action throw_exception
</buffer>
</match>
<match partner-auditlog.kubernetes.**>
@id elasticsearch-partner-auditlog
@type elasticsearch
@log_level info
include_tag_key true
host elastic.audit.{{ required "management Domain" .Values.mgmt_domain }}
port 443
user "#{ENV['ELASTIC_USER']}"
password "#{ENV['ELASTIC_PASSWORD']}"
scheme "https"
ssl_verify false
ssl_version TLSv1_2
type_name _doc
time_precision 3
logstash_format true
logstash_prefix partner-auditlog
reconnect_on_error true
request_timeout 30s
bulk_message_request_threshold -1
<buffer>
@type file
path /var/log/fluentd-buffers/partner-auditlog.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_interval 30
chunk_limit_size 10M
total_limit_size 10G
queued_chunks_limit_size 100
overflow_action throw_exception
</buffer>
</match>
<match maintenance-auditlog.kubernetes.**>
@id elasticsearch-maintenance-auditlog
@type elasticsearch
@log_level info
include_tag_key true
user "#{ENV['ELASTIC_USER']}"
password "#{ENV['ELASTIC_PASSWORD']}"
scheme "https"
ssl_verify false
ssl_version TLSv1_2
type_name _doc
host elastic.audit.{{ required "management Domain" .Values.mgmt_domain }}
port 443
time_precision 3
logstash_format true
logstash_prefix maintenance-auditlog
reconnect_on_error true
request_timeout 30s
bulk_message_request_threshold -1
<buffer>
@type file
path /var/log/fluentd-buffers/maintenance-auditlog.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_interval 30
chunk_limit_size 10M
total_limit_size 1G
queued_chunks_limit_size 100
overflow_action throw_exception
</buffer>
</match>
<match haproxy.**>
@id elasticsearch-haproxy
@type elasticsearch
@log_level info
include_tag_key true
user "#{ENV['ELASTIC_USER']}"
password "#{ENV['ELASTIC_PASSWORD']}"
scheme "https"
type_name _doc
ssl_verify false
ssl_version TLSv1_2
host da-eck-es-http.monitoring
port 9200
time_precision 3
logstash_format true
logstash_prefix haproxy
reconnect_on_error true
request_timeout 30s
bulk_message_request_threshold -1
<buffer>
@type memory
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_interval 20
retry_max_times 10
chunk_limit_size 10M
total_limit_size 100M
queued_chunks_limit_size 100
overflow_action throw_exception
</buffer>
</match>
<match **>
@id elasticsearch
@type elasticsearch
@log_level info
include_tag_key true
type_name _doc
host da-eck-es-http.monitoring
port 9200
user "#{ENV['ELASTIC_USER']}"
password "#{ENV['ELASTIC_PASSWORD']}"
scheme "https"
ssl_verify false
ssl_version TLSv1_2
time_precision 3
logstash_format true
reconnect_on_error true
request_timeout 30s
bulk_message_request_threshold -1
<buffer>
@type memory
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_max_interval 30
chunk_limit_size 10M
retry_max_times 10
total_limit_size 1G
queued_chunks_limit_size 100
overflow_action throw_exception
</buffer>
</match>
</label>
Related
I'm having trouble with the Fluentd parser not working, and I can't figure out where the problem is.
fluentd.conf
<source>
@type tail
@id in_tail_app_logs
path /tmp/log/test/*.log
pos_file /var/log/app.log.pos
tag logging.app
refresh_interval 1
read_from_head true
<parse>
@type regexp
expression /^(?<time>.+?)\t(?<logname>.+?)\t(?<log>.+?)$/
time_key time
time_format %Y-%m-%dT%H:%M:%S%:z
</parse>
</source>
<match logging.app>
#type stdout
@type copy
<store ignore_error>
@type file
path /tmp/log/influx
</store>
<store>
@type influxdb
host 20.10.222.22
port 8086
user test
password test123
use_ssl false
dbname test
measurement test_measurement
time_precision s
auto_tags true
flush_interval 10
verify_ssl false
sequence_tag _seq
</store>
</match>
/tmp/log/test/buffer.*.log
...
2023-02-06T08:09:46+00:00 kubernetes.var.log.containers.app-test-app-test-space-fb5ffca1ec-0_cf-workloads_opi-05942675f9f0dd17039304733f228e7abd6ebdfd91609baf1a7afefaeb33ced8.log {"stream":"stdout","log":"Console output from test-node-app","docker":{"container_id":"05942675f9f0dd17039304733f228e7abd6ebdfd91609baf1a7afefaeb33ced8"},"kubernetes":{"container_name":"opi","namespace_name":"cf-workloads","pod_name":"app-test-app-test-space-fb5ffca1ec-0","pod_id":"f4f2592e-ed57-4375-aff7-40c7b214abe0","host":"ap-joy-sidecar-5","labels":{"controller-revision-hash":"app-test-app-test-space-fb5ffca1ec-5bb7dc5769","cloudfoundry_org/app_guid":"8fc07280-506c-49b2-ab00-a97222fcf0a5","cloudfoundry_org/guid":"8fc07280-506c-49b2-ab00-a97222fcf0a5","cloudfoundry_org/org_guid":"fdf0a222-33a4-46fd-a7f9-7955b9ea862c","cloudfoundry_org/org_name":"system","cloudfoundry_org/process_type":"web","cloudfoundry_org/source_type":"APP","cloudfoundry_org/space_guid":"f1b70a4b-7581-4214-b128-2f1597f7789d","cloudfoundry_org/space_name":"app-test-space","cloudfoundry_org/version":"ebfde654-e73c-48da-b55b-42a37a6ba139","security_istio_io/tlsMode":"istio","service_istio_io/canonical-name":"app-test-app-test-space-fb5ffca1ec","service_istio_io/canonical-revision":"latest","statefulset_kubernetes_io/pod-name":"app-test-app-test-space-fb5ffca1ec-0"}},"app_id":"8fc07280-506c-49b2-ab00-a97222fcf0a5","instance_id":"f4f2592e-ed57-4375-aff7-40c7b214abe0","structured_data":"[tags#47450 source_type=\"APP/PROC/WEB\"]"}
...
I believe the time, logname, and log fields are separated by \t.
What I tried
1. fluentd.conf
json
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S%:z
</parse>
tsv
<parse>
@type tsv
keys time,logname,logs
time_key time
time_format %Y-%m-%dT%H:%M:%S%:z
</parse>
none
<parse>
@type none
</parse>
2. Result
No output log file was created in any of these cases.
With the stdout output there is nothing printed for any parser except the json parser.
What I want
I want the parser to work correctly and to send the data to InfluxDB.
Please help me...
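For reference, given the sample buffer line above (an ISO-8601 timestamp, a tag, and a JSON payload separated by tabs), one hedged approach, reusing the paths and field names from the config above, would be to parse the line as TSV and then expand the JSON column with a separate parser filter. This is an untested sketch:

<source>
  @type tail
  @id in_tail_app_logs
  path /tmp/log/test/*.log
  pos_file /var/log/app.log.pos
  tag logging.app
  read_from_head true
  <parse>
    @type tsv
    keys time,logname,log            # the third column holds the JSON payload
    time_key time
    time_format %Y-%m-%dT%H:%M:%S%:z
  </parse>
</source>

# expand the JSON in the "log" column into top-level fields
<filter logging.app>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>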
I'm trying to push my Elasticsearch server logs to rsyslog and then to Fluentd. For this, the stack-trace error logs need to be on a single line.
They were multiline before:
443 [2022-08-05T07:45:38,068][ERROR][o.e.i.g.GeoIpDownloader ] [techsrv01] exception during geoip databases update
444 org.elasticsearch.ElasticsearchException: not all primary shards of [.geoip_databases] index are active
445 at org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:137) ~[ingest-geoip-7.17.5.jar:7.17.5]
446 at org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:284) [ingest-geoip-7.17.5.jar:7.17.5]
447 at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:100) [ingest-geoip-7.17.5.jar:7.17.5]
448 at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:46) [ingest-geoip-7.17.5.jar:7.17.5]
449 at org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:42) [elasticsearch-7.17.5.jar:7.17.5]
450 at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) [elasticsearch-7.17.5.jar:7.17.5]
451 at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.17.5.jar:7.17.5]
452 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
453 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
454 at java.lang.Thread.run(Thread.java:833) [?:?]
After changing the pattern layout in log4j2.properties to the format below, I'm able to get it down to two lines, but I'm not able to reduce it any further to a single line.
appender.rolling_old.layout.pattern =
[%d{ISO8601}][%-5p][%-25c{1.}][%node_name] %marker %m %n
%throwable{separator(|)}
2028 [2022-08-05T11:04:40,810][ERROR][o.e.i.g.GeoIpDownloader ][techsrv01] exception during geoip databases update
2029 ElasticsearchException[not all primary shards of [.geoip_databases] index are active]| at org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:137)| at org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:284)| at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:100)| at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:46)| at org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:42)| at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)| at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)| at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)| at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)| at java.base/java.lang.Thread.run(Thread.java:833)[2022-08-05T11:04:41,171][INFO ][o.e.c.r.a.AllocationService][techsrv01] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-ilm-history-5-2022.07.18-000001][0], [.kibana-event-log-7.17.5-000001][0], [.geoip_databases][0], [.ds-.logs-deprecation.elasticsearch-default-2022.07.18-000001][0]]]).
How can we achieve this using a Log4j2 layout pattern?
Instead of converting the logs to a single line with Log4j2, I kept the default log pattern, ditched rsyslog, and used Fluentd directly to parse the logs. The configuration below keeps only WARN and ERROR entries and filters out INFO:
td-agent.conf
<source>
@type tail
path /var/log/elasticsearch/elasticdemo.log
pos_file /var/log/elasticsearch/elasticdemo.log.pos
tag elastic_error_self
<parse>
@type multiline
format_firstline /(\d{4})-(\d\d)-(\d\d)/
format1 /^(?<timestamp>\[.*?\])(?<logLevel>\[.*?\])(?<service>\[.*?\]) (?<node_name>\[.*?\])(?<message>.*)/
</parse>
</source>
<filter **>
@type grep
<exclude>
key logLevel
pattern /INFO/
# or, to exclude all messages that are empty or include only white-space
</exclude>
</filter>
<match elastic**>
@type elasticsearch
host elasticIP/lbip/vmip #where elastic is installed
port 9200
index_name elastic_error_self
include_timestamp true
# connection configs
reconnect_on_error true
reload_on_failure true
slow_flush_log_threshold 90
# buffer configs
<buffer>
@type file
path /data/opt/fluentd/buffer/elastic_error_self
chunk_limit_size 32MB
total_limit_size 20GB
flush_thread_count 8
flush_mode interval
retry_type exponential_backoff
retry_timeout 10s
retry_max_interval 30
overflow_action drop_oldest_chunk
flush_interval 5s
</buffer>
</match>
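One note on the grep filter above: the exclude pattern is matched against the captured logLevel field, so the regex can simply be widened if more levels should be dropped. A hedged variant, assuming DEBUG and TRACE lines should also be excluded:

<filter **>
  @type grep
  <exclude>
    key logLevel
    pattern /INFO|DEBUG|TRACE/   # drop INFO, DEBUG and TRACE records; WARN and ERROR pass through
  </exclude>
</filter>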
Unable to monitor Elasticsearch server logs in the Kibana dashboard.
I have two RHEL VMs for testing; I'm using this approach because production has a different architecture:
VM1 - Elasticsearch, Kibana, rsyslog
VM2 - Fluentd
I want to push the Elasticsearch logs from VM1 using rsyslog to VM2, where Fluentd is installed, and have Fluentd send them back to Elasticsearch on VM1. The configurations are below.
I've tried installing Fluentd on the Elasticsearch VM and was able to see the Elasticsearch logs in Kibana, but my requirement is to use rsyslog and send the logs to Fluentd, since Fluentd is not installed on the Elasticsearch VMs.
td-agent.conf
<system>
log_level info
workers 2
</system>
<source>
@type tcp
port 5142
bind 0.0.0.0
<parse>
@type multiline
format_firstline /^(?<date>\[.*?\])/
format1 /(?<date>\[.*?\])(?<logLevel>\[.*?\])(?<service>\[.*?\]) (?<node_name>\[.*?\]) (?<LogMessage>.*)/
</parse>
tag es_logs
</source>
<source>
@type syslog
port 5145
<transport tcp>
</transport>
bind 0.0.0.0
tag syslog
</source>
<filter es_logs**>
@type parser
format json
time_key time_msec
key_name message
reserve_data true # tells Fluentd to keep the encompassing JSON - off by default
remove_key_name_field true # removes the key of the parsed JSON: message - off by default
</filter>
<match es**>
@type elasticsearch
host vm1ip
port 9200
index_name es_logs_write
include_timestamp true
type_name fluentd
# connection configs
reconnect_on_error true
reload_on_failure true
slow_flush_log_threshold 90
# buffer configs
<buffer>
@type file
path /data/opt/fluentd/buffer/elaticsearch_logs
chunk_limit_size 2MB
total_limit_size 1GB
flush_thread_count 8
flush_mode interval
retry_type exponential_backoff
retry_timeout 10s
retry_max_interval 30
overflow_action drop_oldest_chunk
flush_interval 5s
</buffer>
</match>
rsyslog.conf
# Sample rsyslog configuration file
#
$ModLoad imfile
$ModLoad immark
$ModLoad imtcp
$ModLoad imudp
#$ModLoad imsolaris
$ModLoad imuxsock
module(load="omelasticsearch")
template(name="es_logs" type="list" option.json="on") {
constant(value="{")
constant(value="\"#timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"host\":\"") property(name="hostname")
constant(value="\",\"severity-num\":") property(name="syslogseverity")
constant(value=",\"facility-num\":") property(name="syslogfacility")
constant(value=",\"severity\":\"") property(name="syslogseverity-text")
constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
constant(value="\",\"syslogtag\":\"") property(name="syslogtag")
constant(value="\",\"message\":\"") property(name="msg")
constant(value="\"}")
}
$UDPServerRun 514
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# Where to place auxiliary files
$WorkDirectory /var/lib/rsyslog
#### RULES ####
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.none;mail.none;authpriv.none;cron.none;local6.none /var/log/messages
# Log auth.info separate
auth.info /var/log/authlog
# The authpriv file has restricted access.
authpriv.* /var/log/secure
# Log all the mail messages in one place.
mail.* -/var/log/maillog
# Log cron stuff
cron.* /var/log/cron
# Everybody gets emergency messages
*.emerg :omusrmsg:*
# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler
# Save boot messages also to boot.log
local7.* /var/log/boot.log
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList # run asynchronously
$ActionResumeRetryCount -1 # infinite retries if host is down
$MaxMessageSize 64k
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
# Forward output to Fluentd
#local8.* /data/elastic_logs/elasticdemo.log
*.* @Vm1Ip:5142;es_logs
I used the configuration below, creating a new file /etc/rsyslog.d/11-elastic.conf.
For rsyslog:
$ModLoad imfile
$InputFilePollInterval 1
$InputFileName /var/log/elasticsearch/elasticdemo.log
$InputFileTag eslogs:
$InputFileStateFile eslogs
$InputFileFacility local0
$InputRunFileMonitor
:syslogtag, isequal, "eslogs:" {
:msg, contains, "ERROR" {
local0.* /var/log/eslog_error.log
local0.* @fluentdVMip:5141
}
stop
}
For Fluentd:
td-agent.conf
<system>
workers 2
</system>
<source>
@type udp
port 5141
tag eslogs
<parse>
@type multiline
format_firstline /^\[(?<date>.*?)\]/
format1 /\[(?<date>.*?)\]\[(?<logLevel>.*?)\]\[(?<service>.*?)\] \[(?<node_name>.*?)\](?<LogMessage>.*)/
</parse>
</source>
<match system.**>
@type stdout
</match>
<match eslogs.**>
@type elasticsearch
host ipoftheelasticserver or domain name
port 9200
index_name es_logs_write
include_timestamp true
type_name fluentd
# connection configs
reconnect_on_error true
reload_on_failure true
slow_flush_log_threshold 90
# buffer configs
<buffer>
@type file
path /data/opt/fluentd/buffer/elaticsearch_logs
chunk_limit_size 2MB
total_limit_size 1GB
flush_thread_count 8
flush_mode interval
retry_type exponential_backoff
retry_timeout 10s
retry_max_interval 30
overflow_action drop_oldest_chunk
flush_interval 5s
</buffer>
</match>
I'm using Fluentd in my Kubernetes cluster to collect logs from the pods and send them to Elasticsearch.
Once every day or two, Fluentd gets this error:
[warn]: #0 emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.7.4/lib/fluent/plugin/buffer.rb:265:in `write'"
Then Fluentd stops sending logs until I restart the Fluentd pod.
How can I avoid getting this error?
Maybe I need to change something in my configuration?
<match filter.Logs.**.System**>
@type elasticsearch
host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME']}"
user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
logstash_format true
logstash_prefix system
type_name systemlog
time_key_format %Y-%m-%dT%H:%M:%S.%NZ
time_key time
log_es_400_reason true
<buffer>
flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"
chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '8M'}"
queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
retry_forever true
</buffer>
</match>
The default buffer type is memory; see https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/out_elasticsearch.rb#L63
There are two disadvantages to this buffer type:
- if the pod or container is restarted, the logs that are still in the buffer are lost;
- if all the RAM allocated to Fluentd is consumed, logs will no longer be sent.
Try using a file-based buffer with a configuration like the one below:
<buffer>
@type file
path /fluentd/log/elastic-buffer
flush_thread_count 8
flush_interval 1s
chunk_limit_size 32M
queue_limit_length 4
flush_mode interval
retry_max_interval 30
retry_forever true
</buffer>
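Two follow-up assumptions worth checking (they are not part of the answer above): a file buffer only survives pod restarts if its path sits on a volume that outlives the container, and it helps to set an explicit size cap plus an overflow behaviour so a full buffer degrades predictably instead of raising BufferOverflowError again. A sketch of such a buffer section:

<buffer>
  @type file
  # assumed to be backed by a persistent volume or hostPath mount
  path /fluentd/log/elastic-buffer
  flush_mode interval
  flush_interval 1s
  flush_thread_count 8
  chunk_limit_size 32M
  total_limit_size 8G                 # assumed cap on on-disk buffer size
  overflow_action drop_oldest_chunk   # shed the oldest chunks instead of raising to the input
  retry_max_interval 30
  retry_forever true
</buffer>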
I am trying to send events from td-agent to Elasticsearch and Kibana, but it is not working.
My td-agent conf:
<source>
type tail
path /var/log/abc.log
pos_file /etc/td-agent/def.pos
refresh_interval 10s
tag "abc.def"
format /^(?<Time>[^ ]* [^ ]*) (?<Logging_Level>\[(.*)\]) (?<PID>\ [(.*)\]) \[\-\:\-\] (?<Message>(.*))$/
time_format %Y-%m-%d %H:%M:%S
</source>
<filter "abc.def">
type record_transformer
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
<match "abc.def">
type elasticsearch
logstash_format true
host xyz.def.domain
port 9200 #(optional; default=9200)
flush_interval 10s
index_name logstash #(optional; default=fluentd)
</match>
I am not sure why the hostname is not being sent from td-agent to Elasticsearch and Kibana.
You should enable Ruby in record_transformer, since this is a Ruby expression:
"#{Socket.gethostname}"
So it should look like this:
<filter "abc.def">
type record_transformer
enable_ruby true
<record>
hostname "#{Socket.gethostname}"
</record>
</filter>
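With this filter in place, every event tagged abc.def should reach Elasticsearch with an extra hostname field alongside the fields captured by the tail regex (Logging_Level, PID, Message), for example a record roughly like {"Logging_Level":"[INFO ]","PID":"[1234]","Message":"service started","hostname":"web01"}, where the values are purely illustrative.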