Multi-line pattern in FileBeat - elasticsearch

I am using Filebeat for log aggregation, which ships the logs to Kibana. Below is the error message that needs to be directed to Kibana:
2017-04-17 15:45:47,154 [JCO.ServerThread-8] ERROR com.webservice.AxisWebServiceClient - Client error
2017-04-17 15:45:47,154 [JCO.ServerThread-8] ERROR com.webservice.AxisWebServiceClient - The XML request is invalid. Fix the request and resend.
310,273,990
310,292,500
360,616,489
2017-04-04 12:47:09,362 [JCO.ServerThread-3] INFO com.app.Listener - End RFC_CALCULATE_TAXES_DOC
2017-04-04 12:47:09,362 [JCO.ServerThread-3] DEBUG com.Time - RFC_CALCULATE_TAXES_DOC,DEC[2],Total Time,39
I want only the 2017-04-17 15:45:47,154 [JCO.ServerThread-8] ERROR line and the lines below the error to be sent to Kibana, but I also get the INFO part.
Below is my filebeat.yml file:
filebeat:
  prospectors:
    -
      paths:
        - /apps/global/vertex/SIC_HOME_XEC/logs/sic.log
      input_type: log
      exclude_lines: ['^INFO']
      #include_lines: 'ERROR'
      multiline:
        pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[[A-Za-z0-9.-]*\]\s[E]RROR'
        negate: true
        match: after
Could someone more experienced help me select only the ERROR message pattern using a regex?

In order to extract the error messages as a group, you'll need to modify your regex as follows:
^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[[A-Za-z0-9.-]*\]\sERROR (\w.+)
Explanation:
(\w.+)
This adds a capturing group that starts with a word character and matches the rest of the line, which captures the error message text.
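As a sketch only, this is how the modified expression could be dropped into the multiline section of the prospector from the question (everything except the pattern is copied from the question's config, and the negate/match behaviour stays the same):

  multiline:
    pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[[A-Za-z0-9.-]*\]\sERROR (\w.+)'
    negate: true
    match: after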


Transform String into JSON so that it's searchable in Kibana/Elasticsearch

I have Elasticsearch, Filebeat and Kibana running on a Windows machine. Filebeat is picking up a proper log file and is listening to the path. When I look at the data in Kibana it looks fine.
My issue is that the message field is a string.
Example of one log line:
12:58:09.9608 Trace {"message":"No more Excel rows found","level":"Trace","logType":"User","timeStamp":"2020-08-14T12:58:09.9608349+02:00","fingerprint":"226fdd2-e56a-4af4-a7ff-724a1a0fea24","windowsIdentity":"mine","machineName":"NAME-PC","processName":"name","processVersion":"1.0.0.1","jobId":"957ef018-0a14-49d2-8c95-2754479bb8dd","robotName":"NAME-PC","machineId":6,"organizationUnitId":1,"fileName":"GetTransactionData"}
So what I would like to have now is that String converted to a JSON so that it is possible to search in Kibana for example for the level field.
I already had a look at Filebeat. There I tried to enable the Logstash output, but then the data no longer arrives in Elasticsearch, and no log file is generated in the Logstash folder.
Then I downloaded Logstash via the install guide, but unfortunately I got this message:
C:\Users\name\Desktop\logstash-7.8.1\bin>logstash.bat
Sending Logstash logs to C:/Users/mine/Desktop/logstash-7.8.1/logs which is now configured via log4j2.properties
ERROR: Pipelines YAML file is empty. Location: C:/Users/mine/Desktop/logstash-7.8.1/config/pipelines.yml
usage:
  bin/logstash -f CONFIG_PATH [-t] [-r] [] [-w COUNT] [-l LOG]
  bin/logstash --modules MODULE_NAME [-M "MODULE_NAME.var.PLUGIN_TYPE.PLUGIN_NAME.VARIABLE_NAME=VALUE"] [-t] [-w COUNT] [-l LOG]
  bin/logstash -e CONFIG_STR [-t] [--log.level fatal|error|warn|info|debug|trace] [-w COUNT] [-l LOG]
  bin/logstash -i SHELL [--log.level fatal|error|warn|info|debug|trace]
  bin/logstash -V [--log.level fatal|error|warn|info|debug|trace]
  bin/logstash --help
[2020-08-14T15:07:51,696][ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
Edit:
I tried to use Filebeat only. Here I set:
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - dissect:
      tokenizer: '"%{event_time} %{loglevel} %{json_message}"'
      field: "message"
      target_prefix: "dissect"
  - decode_json_fields:
      fields: ["json_message"]
but that gave me:
dissect_parsing_error
The tip about removing the "" around the tokenizer helped; after that I simply refreshed the index and the message was gone. Nice.
But the question now is: how do I filter for something in the new field?
The message says your pipeline config is empty; it seems you have not configured any pipeline yet. Logstash can do the trick (JSON filter plugin), but Filebeat is sufficient here. If you don't want to introduce another service, this is the better option.
It has the decode_json_fields option to transform specific fields containing JSON in your event into a JSON structure. Here is the documentation.
For the future case where your whole event is JSON, there is the possibility of parsing it in Filebeat by configuring json.message_key and the related json.* options.
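As an illustration only, a whole-event-is-JSON input could look roughly like this (a sketch assuming the filebeat.inputs syntax of a recent Filebeat; json.keys_under_root and json.add_error_key are optional extras, and the path is a placeholder):

filebeat.inputs:
  - type: log
    paths:
      - path to your logfile
    json.message_key: message
    json.keys_under_root: true
    json.add_error_key: true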
EDIT - Added a Filebeat snippet as a processors example of dissecting the log line into three fields (event_time, loglevel, json_message). Afterwards the freshly extracted field json_message, whose value is a JSON object encoded as a string, will be decoded into a JSON structure:
...
filebeat.inputs:
  - type: log
    paths:
      - path to your logfile
    processors:
      - dissect:
          tokenizer: '%{event_time} %{loglevel} %{json_message}'
          field: "message"
          target_prefix: "dissect"
      - decode_json_fields:
          fields: ["dissect.json_message"]
          target: ""
      - drop_fields:
          fields: ["dissect.json_message"]
...
If you want to practice the Filebeat processors, try to set the correct event timestamp, taken from the encoded JSON and written into @timestamp using the timestamp processor.
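A minimal sketch of what that could look like, assuming the timeStamp value decoded out of the JSON ends up at the event root (the field name and the Go-style layout are assumptions to adapt to your data):

processors:
  - timestamp:
      field: timeStamp
      layouts:
        - '2006-01-02T15:04:05.999999999Z07:00'   # RFC3339 with fractional seconds
      test:
        - '2020-08-14T12:58:09.9608349+02:00'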

Can filebeat dissect a log line with spaces?

So I have a log line formatted as such:
2020-04-15 12:16:44,936 WARN c.e.d.c.p.p.BasePooledObjectFactory [main] Caution - XML schema validation has been disabled! Validation is only available when using XML.
I am using Filebeat to send this directly to Elasticsearch, which works, but log.level is not set and the whole line becomes the message.
Reading up on dissection, I had intended to use:
processors:
  - add_host_metadata: ~
  - dissect:
      tokenizer: "%{} %{} %{log.level} %{} [%{}] %{message}"
      field: "message"
      target_prefix: ""
which I expected to split into:
{
log.level: WARN
message: Caution - XML schema validation has been disabled! Validation is only available when using XML.
}
instead I get the same output as without the dissect:
{
message: 2020-04-15 12:16:44,936 WARN c.e.d.c.p.p.BasePooledObjectFactory [main] Caution - XML schema validation has been disabled! Validation is only available when using XML.
}
I'm just getting to grips with Filebeat, and I've tried looking through the documentation, which made it look simple enough. However, my dissect is currently not doing anything. Host metadata is being added, so I believe the processors are being called.
How can I get the log level out of the log line? (preferably without changing the format of the log itself)
You need to pick another field name than message in the dissect tokenization since this is the name of the field that contains the original log message:
processors:
  - add_host_metadata: ~
  - dissect:
      tokenizer: "%{} %{} %{log.level} %{} [%{}] %{msg}"
      field: "message"
      target_prefix: ""
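With that change, the dissected event should end up roughly like the output the question expected (a sketch; the exact shape depends on the Filebeat version and the empty target_prefix):
{
  log.level: WARN
  msg: Caution - XML schema validation has been disabled! Validation is only available when using XML.
}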

Unable to access data inside alert section of elastalert

I have been trying to set up elastalert monitoring on my ELK stack. To begin with, I want to set up a simple rule which generates a notification if any disk on the file system has reached 80% usage. The rule seems to be working correctly, but in the alert section I am not able to pass the data to the Python script. The uncommented command in the alert section gives the following error:
ERROR:root:Error while running alert command: Error formatting command: 'system.filesystem.mount_point' error.
Here is my rule file. Please excuse the formatting of the yaml.
name: Metricbeat high FS percentage
type: metric_aggregation
es_host: localhost
es_port: 9200
index: metricbeat-*
buffer_time:
  minutes: 1
metric_agg_key: system.filesystem.used.pct
metric_agg_type: max
query_key: beat.name.keyword
doc_type: metricsets
bucket_interval:
  minutes: 1
realert:
  minutes: 2
sync_bucket_interval: true
#allow_buffer_time_overlap: true
#use_run_every_query_size: true
max_threshold: 0.8
filter:
- query:
    query_string:
      query: "system.filesystem.device_name: dev"
      analyze_wildcard: true
- term:
    metricset.name: filesystem

# (Required)
# The alert is use when a match is found
alert:
- debug
- command
command: ["/home/ubuntu/sendToSlack.py","beat-name","%(beat.name.keyword)s","used_pc","%(system.filesystem.used.pct_max)s","mount_point","%(system.filesystem.mount_point)s"]
# command: ["/home/ubuntu/sendToSlack.py","--beat-name","{match[beat.name.keyword]}","--mount_point","{match[system.filesystem.mount_point]}"]
# command: ["/home/ubuntu/sendToSlack.py","--beat-name","{match[beat][name]}","--mount_point","{match[system][filesystem][mount_point]}"]
#pipe_match_json: true
#- command:
#    command: ["/home/ubuntu/sendToSlack.py","%(system.filesystem.used.bytes)s"]
Some observations:
On testing the rule file using the command python -m elastalert.test_rule rules/high_fs.yaml I get the output
Successfully loaded Metricbeat high FS percentage
Got 149161 hits from the last 1 day
Available terms in first hit:
tags
beat.hostname
beat.name
beat.version
type
@timestamp
system.filesystem.available
system.filesystem.files
system.filesystem.mount_point
system.filesystem.free_files
system.filesystem.free
system.filesystem.device_name
system.filesystem.used.bytes
system.filesystem.used.pct
system.filesystem.total
host
@version
metricset.rtt
metricset.name
metricset.module
I should be able to access any of the fields mentioned above. When I run this rule using python -m elastalert.elastalert --verbose --rule rules/high_fs.yaml, a list is printed on the screen:
@timestamp: 2017-10-18T17:15:00Z
beat.name.keyword: my_server_name
num_hits: 98
num_matches: 5
system.filesystem.used.pct_max: 0.823400020599
I am able to access all the key-value pairs in this list. Anything that's outside the list fails with the formatting error. I've been stuck on this for a long time. Any help is appreciated.
UPDATE: A reply for the same problem on elastalert's github repo says that certain query types do not contain the full field data.
I am not sure whether this is the correct way to achieve what I was looking for, but I was able to get the desired output using the rule type any and writing my own filters. Here is how one of my rule files currently looks:
name: High CPU percentage
type: any
es_host: localhost
es_port: 9200
index: consumer-*
query_key:
- beat.name
filter:
- range:
    system.cpu.total_norm_pct:
      from: 0.95
      to: 10.0
realert:
  minutes: 60
alert:
- command:
    command: ["/home/ubuntu/slackScripts/sendCPUDetails.py","{match[beat][name]}","{match[system][cpu][total_norm_pct]}"]
    new_style_string_format: true
Hope it helps someone.

FileBeat Service is not starting due to yml configuration

This is my filebeat.yml file …
I am getting error 1053 whenever I start the Filebeat service.
Maybe I am making some mistake in this file; please correct me where I am wrong.
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.full.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
#=========================== Filebeat prospectors =============================
filebeat.prospectors:
# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.
# Paths that should be crawled and fetched. Glob based paths.
paths:
- E:\ELK-STACK\logstash-tutorial-dataset.log
input_type: log
document_type: apachelogs
# document_type: apachelogs
#paths:
# - E:\ELK-STACK\mylogs.log
#fields: {log_type: mypersonal-logs}
#- C:\Logs\GatewayService\GatewayService-Processor.Transactions-20170810
# - C:\ECLIPSE WORKSPACE\jcgA1\jcgA1\logs-logstash.*
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ["^DBG"]
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ["^ERR", "^WARN"]
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: [".gz$"]
# Optional additional fields. These field can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Mutiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
# that was (not) matched before or after or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
#multiline.match: after
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#================================ Outputs =====================================
# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.
#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
# Array of hosts to connect to.
# hosts: ["localhost:9200"]
# Optional protocol and basic auth credentials.
#protocol: "https"
#username: "elastic"
#password: "changeme"
#----------------------------- Logstash output --------------------------------
output.logstash:
# The Logstash hosts
hosts: ["localhost:5043"]
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
Actually what I am trying to do is use multiple logs by specifying "document_type". If I remove "document_type" it works, but why is "document_type" (which I see is deprecated in Filebeat 5.5) or "fields" not working here?
Please help.
You have a syntax error in your config file.
The filebeat.prospectors key wants an array value, but you are passing it a hash instead.
Plus, you have indentation problems.
This is a corrected version of your config file (without comments, for brevity):
filebeat.prospectors:
-
  paths:
    - E:\ELK-STACK\logstash-tutorial-dataset.log
  input_type: log
  document_type: apachelogs

output.logstash:
  hosts: ["localhost:5043"]
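Because filebeat.prospectors is an array, a second log file simply becomes another - entry. As a sketch, reusing the path and fields values that are commented out in the question:

filebeat.prospectors:
-
  paths:
    - E:\ELK-STACK\logstash-tutorial-dataset.log
  input_type: log
  document_type: apachelogs
-
  paths:
    - E:\ELK-STACK\mylogs.log
  input_type: log
  fields: {log_type: mypersonal-logs}

output.logstash:
  hosts: ["localhost:5043"]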

sematext logagent debugging patterns

I have installed Sematext Logagent (https://sematext.github.io/logagent-js/installation/) and configured it to output to Elasticsearch. All is good except for one thing, which I have spent all day trying to do.
There is zero, null, none information on how to debug parsers. I start Logagent with "logagent --config logagent.yml -v -j"; the yml file is below:
options:
  printStats: 30
  # don't write parsed logs to stdout
  suppress: false
  # Enable/disable GeoIP lookups
  # Startup of logagent might be slower, when downloading the GeoIP database
  geoipEnabled: false
  # Directory to store Logagent status and temporary files
  diskBufferDir: ./tmp

input:
  files:
    - '/var/log/messages'
    - '/var/log/test'

patterns:
  sourceName: !!js/regexp /test/
  match:
    - type: mysyslog
      regex: !!js/regexp /([a-z]){2}(.*)/
      fields: [message,severity]
      dateFormat: MMM DD HH:mm:ss

output:
  elasticsearch:
    module: elasticsearch
    url: http://host:9200
    index: mysyslog
  stdout: yaml # use 'pretty' for pretty json and 'ldjson' for line delimited json (default)
I would expect (based on the scarce documentation) that this would split each line of the test file in two; for example with 'ggff', 'gg' would be the message and 'ff' would be the severity. But all I can see in Kibana is that 'ggff' is the message and severity is defaulted (?) to info. The problem is, I don't know where the problem is. Does it skip my pattern? Does the match in my pattern fail? Any help would be VERY appreciated.
Setting 'debug: true' in patterns.yml prints detailed info about matched patterns.
https://github.com/sematext/logagent-js/blob/master/patterns.yml#L36
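As a sketch only (the layout is modelled on the default pattern library linked above, so treat the exact nesting as an assumption), the flag sits at the top level of patterns.yml next to your own pattern definitions:

# patterns.yml
debug: true            # print detailed info about every pattern match attempt
patterns:
  - sourceName: !!js/regexp /test/
    match:
      - type: mysyslog
        regex: !!js/regexp /([a-z]){2}(.*)/
        fields: [message, severity]
        dateFormat: MMM DD HH:mm:ss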
Watch Logagent issue #69 (https://github.com/sematext/logagent-js/issues/69) for additional improvements.
The docs moved to http://sematext.com/docs/logagent/. I recommend www.regex101.com to test regular expressions (please use JavaScript regex syntax).
Examples of Syslog messages in /var/log are in the default pattern library:
https://github.com/sematext/logagent-js/blob/master/patterns.yml#L498
