Logstash Filter on Specific Values - elasticsearch

I'm relatively new to Logstash but have been successful up to this point. I'm parsing through a log and viewing the output in Kibana.
What I'd like to do is output only the data that I'm interested in. This includes data where the source = linux and the number = 78 or 80.
I'm trying to use the drop{} filter to do this by removing anything that does not meet these conditions: if the source is not equal to linux and the number is not 78 or 80, then drop it. Logic tells me this should only send what I want to the output, but I'm not having any luck. It works great for one condition or the other (filtering just on the source or just on the numbers), but when I try to do both, it only applies the first condition. I've tried a few different ways: nested if statements, separate if statements, using !=, not in, etc.
Below is my code (notice the conditional in the filter):
input {
  file {
    path => "/home/user/logs/os_log.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\,\"?%{NUMBER:number}\s*<%{NUMBER:carrotnumber}>%{SYSLOGTIMESTAMP:syslogtimestamp}\s*%{WORD:object}\s*%{USERNAME:source}\s*%{GREEDYDATA:event}" }
  }
  if [source] != "linux" and [number] not in ["78","80"] {
    drop {}
  }
}
output {
  elasticsearch { host => "localhost" }
}
Is there a better way to do this? Thanks!

Feels like you meant or rather than and. You want to keep only events where the source is linux and the number is 78 or 80, so the drop condition is the negation of that, which (by De Morgan's law) turns the and into an or:
if [source] != "linux" or [number] not in ["78","80"] {
  drop {}
}
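Equivalently, you can negate the positive condition directly, which some find easier to read. A minimal sketch using Logstash's conditional negation, with the same field names as above (note the grok-captured number is a string, so the array values stay quoted):
filter {
  # Keep only linux events whose number is 78 or 80; drop everything else.
  if !([source] == "linux" and [number] in ["78", "80"]) {
    drop {}
  }
}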

Related

Generate multiple types with multiple csv

I am trying to generate various types in the same index based on various CSV files. As I don't know how many CSV files there will be, making an input for each one would not be viable.
So does anyone know how to generate types named after the files and load the respective CSV data into them?
input {
  file {
    path => "/home/user/Documents/data/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    skip_header => "true"
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "final_index"
  }
  stdout {}
}
Thank you so much
Support for multiple document structures (mapping types) in the same index was removed from Elasticsearch in version 6. If a document does not look the way the index template expects, the data cannot be sent to that index. What you can do is make sure that all fields are known and keep one general template containing all possible fields.
Is there a reason why you want all of it in one index?
If it is for querying purposes or for Kibana, know that you can use wildcards when searching and index patterns in Kibana.
Update after your comment:
Use a filter to extract the filename using grok
filter {
  grok {
    match => ["path", "%{GREEDYDATA}/%{GREEDYDATA:filename}\.csv"]
  }
}
And use the filename in your output like this:
elasticsearch {
  hosts => "http://localhost:9200"
  index => "final_index-%{[filename]}"
}
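One caveat worth noting (not from the original answer): Elasticsearch index names must be lowercase, so if the CSV filenames can contain uppercase characters you may want to normalise the extracted field before using it in the index name, for example with a mutate filter:
filter {
  # Index names must be lowercase; lowercase the extracted filename first.
  mutate {
    lowercase => [ "filename" ]
  }
}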

Drop filter not working logstash

I have multiple log messages in a file that I am processing using Logstash filter plugins. The filtered logs are then sent to Elasticsearch.
There is one field called addID in a log message. I want to drop all the log messages that have a particular addID present. These particular addIDs are listed in an ID.yml file.
Scenario: If the addID of a log message matches with any of the addIDs present in the ID.yml file, that log message should be dropped.
Could anyone help me in achieving this?
Below is my config file.
input {
  file {
    path => "/Users/jshaw/logs/access_logs.logs"
    ignore_older => 0
  }
}
filter {
  grok {
    patterns_dir => ["/Users/jshaw/patterns"]
    match => ["message", "%{TIMESTAMP:Timestamp}+{IP:ClientIP}+{URI:Uri}"]
  }
  kv {
    field_split => "&?"
    include_keys => [ "addID" ]
    allow_duplicate_values => "false"
  }
  if [addID] in "/Users/jshaw/addID.yml" {
    drop {}
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
You are using the in operator incorrectly. It checks whether a value is in an array (or a string), not in a file; reading from a file is a bit more involved.
A solution would be to use the ruby filter to open the file each time.
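A rough sketch of that ruby-filter approach (assumptions: a Logstash 5+ event API, and that ID.yml contains a plain YAML list of the addIDs to drop; the file is loaded once at pipeline startup rather than per event):
filter {
  ruby {
    # Load the YAML list of blocked addIDs once, when the pipeline starts.
    init => "require 'yaml'; @blocked = YAML.load_file('/Users/jshaw/ID.yml')"
    # Flag the event if its addID appears in the blocked list.
    code => "event.set('[@metadata][blocked]', true) if @blocked.include?(event.get('addID'))"
  }
  if [@metadata][blocked] {
    drop {}
  }
}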
Or put the addId value in your configuration file, like this:
if [addID] == "addID" {
  drop {}
}
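Another option (not mentioned in the original answer) is the logstash-filter-translate plugin, which can read a dictionary from a YAML file on disk. This assumes ID.yml is reshaped into a YAML dictionary whose keys are the addIDs to drop; note the option names differ between plugin versions (field/destination in older releases, source/target in newer ones):
filter {
  translate {
    # Only events whose addID is a key in ID.yml get the destination field set.
    field           => "addID"
    dictionary_path => "/Users/jshaw/ID.yml"
    destination     => "[@metadata][addid_blocked]"
  }
  if [@metadata][addid_blocked] {
    drop {}
  }
}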

Can we extract numeric values from string through script in kibana?

I need to extract numeric values from a string and store them in a new field. Can we do this through a scripted field?
Ex: 1 hello 3 test
I need to extract 1 and 3.
You can do this through Logstash if you are using Elasticsearch.
Run a Logstash process with a config like:
input {
  elasticsearch {
    hosts => "your_host"
    index => "your_index"
    query => '{ "query": { "match_all": {} } }'
    # docinfo exposes the original document metadata (including _id) under
    # [@metadata], so the same document can be overwritten on output.
    docinfo => true
  }
}
filter {
  grok {
    match => { "your_string_field" => "%{NUMBER:num1} %{GREEDYDATA:middle_stuff} %{NUMBER:num2} %{GREEDYDATA:other_stuff}" }
  }
  mutate {
    remove_field => ["middle_stuff", "other_stuff"]
  }
}
output {
  elasticsearch {
    hosts => "your_host"
    index => "your_index"
    document_id => "%{[@metadata][_id]}"
  }
}
This would essentially overwrite each document in your index with two more fields, num1 and num2, that correspond to the numbers you are looking for. It is a quick-and-dirty approach that uses more memory, but it lets you do all of the break-up once instead of at visualization time.
I am sure there is a way to do this with scripting; look into Groovy regex matching where you return a specific group.
Also, no guarantee my config representation is correct, as I don't have time to test it at the moment.
Have a good day!

Make logstash add different inputs to different indices

I have set up Logstash to use an embedded Elasticsearch.
I can log events.
My Logstash conf looks like this:
https://gist.github.com/khebbie/42d72d212cf3727a03a0
Now I would like to add another udp input and have that input be indexed in another index.
Is that somehow possible?
I would do it to make reporting easier, so I could have system log events in one index, and business log events in another index.
Use an if conditional in your output section, based on e.g. the message type or whatever message field is significant to the choice of index.
input {
  udp {
    ...
    type => "foo"
  }
  file {
    ...
    type => "bar"
  }
}
output {
  if [type] == "foo" {
    elasticsearch {
      ...
      index => "foo-index"
    }
  } else {
    elasticsearch {
      ...
      index => "bar-index"
    }
  }
}
Or, if the message type can go straight into the index name you can have a single output declaration:
elasticsearch {
  ...
  index => "%{type}-index"
}

Way to populate Logstash output variable without getting it from an Input?

Is there a way to supply a value to a Logstash output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd do from an external script) and then have Logstash send to that index. For now I was thinking of creating a tcp input just for receiving perf run info and then having a filter match on the run ID. That seems like a convoluted way to do it, though. For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    # do some matching to extract the id
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure if setting manage_template to false would actually be necessary. I've read that it is.
Update
Thanks Nirdesh for that. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
  match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
Which produced this stdout during debugging:
{
       "message" => "awperf-14",
      "@version" => "1",
    "@timestamp" => "2014-10-17T20:01:19.758Z",
          "host" => "0:0:0:0:0:0:0:1:33361",
          "type" => "perfinfo",
      "perftype" => "awperf",
        "perfid" => "14"
}
I then tried creating an index based on this, like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indexes
%{perftype}-%{perfid}
awperf-14
This is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that gets populated, not awperf-14, the one I actually wanted.
Yes.
You can add any number of your own fields, either for intermediate results or permanently, using a property called add_field. Almost all filters in Logstash support this property.
So, for your solution, you can use a ruby script to find the id dynamically and store it in a new field called id, which you can then use in the output.
For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    ruby {
      # do some processing here to work out the id
      code => "# your Ruby code"
      add_field => { "id" => "Some value" }
    }
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that would carry this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted this externally. The script uses the Elasticsearch API to create a new index and then does a string replace for the index in the Logstash config file. It then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier to do and seems cleaner.
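As a side note (not part of the original discussion): recent Logstash versions support environment-variable substitution in pipeline configs with ${VAR} syntax, so the external script could export the run ID instead of doing a string replace on the config file. A minimal sketch, assuming a hypothetical PERF_RUN_ID variable set before Logstash starts:
output {
  elasticsearch {
    # PERF_RUN_ID is exported by the external script; "unknown" is the
    # fallback used if the variable is not set.
    index => "${PERF_RUN_ID:unknown}-perftest"
  }
}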
