Logstash - Can I choose not to forward a log based on data? - elasticsearch

Just learning how to use Logstash - goddamn there's a lot to learn on this :D
In my setup, I have CEF data being sent to my Logstash instance.
Some CEF events are just "statistic" information about the tool that is sending them.
I want logstash to NOT send on these events. Is that possible?
Here is some pseudocode of what I think it would look like.
input {
  udp {
    port => 9001
    codec => cef
  }
}
filter {
  # pseudocode: if 'stat_heading' contains "Statistic Information", do not forward to elasticsearch
}
output {
  elasticsearch {
    hosts => ["192.168.0.20:9200"]
  }
}
Could someone point me in the correct direction?
Edit
Okay - so I see the filter block does support optional if conditions. I'm going to read into this more, and when I get a working solution I'll post it.
Edit
Got it working. Added the solution below.

I think you can try the drop filter plugin to skip some events once they reach the filter stage:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html

Okay, I've found my own answer to this.
You need to add a conditional if statement in the filter block, and if an event field matches some value, drop the event.
input {
  udp {
    port => 9001
    codec => cef
  }
}
filter {
  if "Some string here" in [myheader] {
    drop {}
  }
}
output {
  elasticsearch {
    hosts => ["192.168.0.20:9200"]
  }
}
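If you're not sure which field name the CEF codec puts the statistic heading into, a quick way to check (a sketch, not part of the original solution) is to temporarily add a stdout output with the rubydebug codec so every parsed field is printed before you write the conditional:
output {
  # temporary debug output - prints each event with all of its parsed CEF fields
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["192.168.0.20:9200"]
  }
}
Once the right field name is confirmed, the stdout output can be removed again.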

Related

Collecting logs from different remote servers using just Logstash

Is it possible to send logs from different remote machines to Elasticsearch using just Logstash (no Filebeat)? If so, do I define the same index in the conf.d files on all the machines? I want all the logs to be in the same index.
Would I use logs-%{+YYYY.MM.dd} as the index in all the config files to have them indexed into the same index?
input {
  file {
    path => "/home/ubuntu/logs/data.log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
What you're doing is OK and it will work. The one thing I would change is to simply write to a data stream so you don't have to care about the index name and ILM matters (rollover, retention, etc.), like this:
input {
  file {
    path => "/home/ubuntu/logs/data.log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "ubuntu"
    data_stream_namespace => "prod"
  }
}
The data stream name will be logs-ubuntu-prod; you can change the latter two parts to your liking.
Make sure to properly set up your data stream first, with an adequate Index Lifecycle Management policy, though.
On a different note, it's a waste of resources to install Logstash on all your remote machines, since it is meant to work as a centralized streaming engine. You should definitely either use Filebeat, or even better the Elastic Agent, which is fully manageable through Fleet in Kibana. It's worth a look.
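If you go that route, the central Logstash pipeline would listen for Beats/Agent traffic instead of tailing files locally. A minimal sketch, reusing the data stream settings from above (5044 is the conventional Beats port; adjust as needed):
input {
  beats {
    # Filebeat / Elastic Agent on the remote machines point their output here
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "ubuntu"
    data_stream_namespace => "prod"
  }
}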

ELK - Filtering data with Logstash

I am experimenting with the ELK stack, and so far so good. I have a small issue that I am trying to resolve.
I have a field named 'message' coming from Filebeat. Inside that field is a string with data for logging.
Sometimes that message field might contain this line:
successfully saved with IP address: [142.93.111.8] user: [testuser#some.com]
I would like to apply a filter so that Logstash sends this to Elasticsearch:
successfully saved with IP address: [] user: [testuser#some.com]
This is what I currently have in Logstash configuration:
input {
  beats {
    port => "5043"
    codec => json
  }
}
filter {
  if [message] =~ /IP address:/ {
    mutate { add_tag => "whats happening" }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}
Something else caught my attention: ELK is able to do text filtering at the Filebeat level and also at the Logstash level. Which one is the more usual scenario? Is Filebeat filtering more suitable?
I have found the correct solution for my case:
mutate {
  gsub => ["message", "address: \[(.*?)]", "address:[not indexable]"]
}
Hopefully someone will find it useful.
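Putting it together with the conditional already in the question, the full filter block would look something like this (a sketch based on the two snippets above):
filter {
  if [message] =~ /IP address:/ {
    mutate {
      # mask the bracketed IP address before the event is indexed
      gsub => ["message", "address: \[(.*?)]", "address:[not indexable]"]
    }
  }
}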

Logstash data showing up as "message" field in elasticsearch

I am trying to send some raw data to Elasticsearch through Logstash. I am doing this through the udp input plugin, but I don't think that is relevant for now.
Basically, I wish to send key/value pairs, and I want them to show up as:
{
"key_1": "value_1"
....
}
instead of:
{
"message": "{\"key1\": \"value1\"}"
}
Is there any way for Logstash to somehow "decode" the message as JSON and insert the keys as top-level fields?
Thanks
I just needed to use a "json" codec on the input like so:
input {
  udp {
    port => 3425
    codec => "json"
  }
}
Thanks to Val for pointing this out
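For what it's worth, if the input codec can't be changed for some reason, the same result can usually be achieved with the json filter instead (a sketch, assuming the JSON string arrives in the message field):
filter {
  json {
    # parse the JSON string in "message" and add its keys as top-level fields
    source => "message"
  }
}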

How to generate reports on existing dump of logs using ELK?

Using the ELK stack, is it possible to generate reports on an existing dump of logs?
For example:
I have some 2 GB of Apache access logs and I want to have the dashboard reports showing:
All requests, with status code 400
All requests, with pattern like "GET http://example.com/abc/.*"
I'd appreciate any example links.
Yes, it is possible. You should:
Install and set up the ELK stack.
Install Filebeat and configure it to harvest your logs and forward the data to Logstash.
In Logstash, listen for Filebeat input, use the grok filter to process/break up your data, and forward it to Elasticsearch, something like:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "filebeat-logstash-%{+YYYY.MM.dd}"
  }
}
In Kibana, set up your index patterns and query for the data, e.g.
response: 400
verb: GET AND message: "http://example.com/abc/"

data from rabbitmq not being read into kibana dashboard

I just altered my logstash-elasticsearch setup to include rabbitmq, since I wasn't able to get messages into Logstash fast enough over a TCP connection. Now it is blazing fast as Logstash reads from the queue, but I do not see the messages coming through into Kibana. One error shows the timestamp field missing. I used the head plugin to view the data and it is odd:
_index     _type   _id                      _score   @version   @timestamp
pt-index   logs    Bv4Kp7tbSuy8YyNi7NEEdg   1        1          2014-03-27T12:37:29.641Z
This is what my conf file looks like now, and below is what it looked like before:
input {
  rabbitmq {
    queue => "logstash_queueII"
    host => "xxx.xxx.x.xxx"
    exchange => "logstash.dataII"
    vhost => "/myhost"
  }
}
output {
  elasticsearch {
    host => "xxx.xxx.xx.xxx"
    index => "pt-index"
    codec => "json_lines"
  }
}
This is what it was before rabbitmq:
input {
  tcp {
    codec => "json_lines"
    port => "1516"
  }
}
output {
  elasticsearch {
    embedded => "true"
  }
}
The only change I made was to create a specific index in Elasticsearch and have the data indexed there, but now it seems the format of the message has changed. It is still JSON messages with 2-3 fields, but I'm not sure what Logstash is reading or changing from rabbitmq. I can see data flowing into the histogram, but the fields are gone.
"2014-03-18T14:32:02" "2014-03-18T14:36:24" "166" "google"
These are the fields I would expect. Like I said, all of this worked before I made the change.
I have seen examples of similar configurations, but they do not use the "json_lines" output codec going into Elasticsearch. The output codec adjusts the formatting of the data as it leaves Logstash, which I do not believe is necessary here. Try deleting the codec, then see what Logstash is outputting by adding a file output; make sure to capture only a short sample...
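For example, a temporary debug output alongside the elasticsearch output could look like this (a sketch; the file path is just an illustration):
output {
  elasticsearch {
    host => "xxx.xxx.xx.xxx"
    index => "pt-index"
  }
  # temporary debug output - writes each raw event to disk so the fields can be inspected
  file {
    path => "/tmp/logstash-debug.log"
  }
}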
