Logstash gets only one result from input - elasticsearch

I have a problem with Logstash. It only matches the first occurrence. For example, I'm receiving data like this:
2015.01.01
2017.05.02
2015.08.03
2011.10.24
2010.02.25
And I have this filter:
filter {
grok {
match => { "message" => "(?<started>%{YEAR}.%{MONTHNUM}.%{MONTHDAY})" }
}
}
I want to grab all the dates and save them somewhere, but in the output I only get one result:
{
"message":"2015.01.01\n2017.05.02\n2015.08.03\n2011.10.24\n2010.02.25",
"host":"127.0.0.1",
"started":"2015.01.01",
}
How can I tell logstash/grok to get all the dates?
Thanks!

What is your input configuration, and how are you sending the data over http?
When you set the input as a file it works, because Logstash reads the file line by line and each line only has one date.
When you send it over http it seems to be sending the whole file as one line, with the line breaks as \n, so everything is just one event to Logstash; that's why it only matches the first date it finds.
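If you do want to keep sending the whole file in one request, a possible workaround (a sketch, not part of the original answer) is to split the event on newlines first, so that each date ends up in its own event and the grok from the question matches once per event:
filter {
split {
# the default terminator is a newline, so each date line becomes its own event
field => "message"
}
grok {
match => { "message" => "(?<started>%{YEAR}.%{MONTHNUM}.%{MONTHDAY})" }
}
}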

Related

How to remove first few lines of CSV in Logstash

This is the input I am using for logstash.
ItemId,AssetId,ItemName,Comment
11111,07,ABCDa,XYZa
11112,07,ABCDb,XYZb
11113,07,ABCDc,XYZc
11114,07,ABCDd,XYZd
11115,07,ABCDe,XYZe
11116,07,ABCDf,XYZf
11117,07,ABCDg,XYZg
Date,Time,Mill Sec,rows,columns
19-05-2020,13:03:46,534,2,2
19-05-2020,13:03:46,539,2,2
19-05-2020,13:03:46,544,2,2
19-05-2020,13:03:46,549,2,2
19-05-2020,13:03:46,554,2,2
I need to remove the first 8 lines from the CSV, make the next line the column header, and parse the rest of the lines as usual. Is there a way to do that in Logstash?
You could do this using the file input, reading the file line by line, and then use grok to make sure each line has the right number of comma-separated fields, ignoring the header line.
Your input will look like this:
input {
file {
path => "/path/to/my.csv"
start_position => beginning
}
}
This will read each line into an event with the data in the field named message and then send it to your filters.
In your filter you'll use grok with a pattern like this:
filter {
grok {
match => { "message" => [
"^%{DATE:Date},%{TIME:Time},%{NUMBER:Mill_Sec},%{NUMBER:rows},%{NUMBER:colums}$"
]
}
}
}
This will present each line as an event looking like this:
{
"colums": "2",
"Time": "13:03:46",
"Mill_Sec": "554",
"rows": "2",
"Date": "19-05-2020"
}
You can use mutate to remove unwanted fields (like message) before the event goes to your output. If there is no match with the pattern defined, you'll get a _grokparsefailure value in the tags field; you can use that to decide whether to send the event to your output or not. Since you defined the fields as numbers, the pattern will also fail on the header line and thus leave you with only 'real' events.
This can be done by having your output defined like this:
output {
if "_grokparsefailure" not in [tags] {
elasticsearch {
...
}
}
}
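For the mutate step mentioned above, a minimal sketch (placed after the grok filter, with the field name taken from the example event) could look like this:
filter {
mutate {
# drop the raw line once the fields have been extracted
remove_field => [ "message" ]
}
}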
You should do this before the file gets to Logstash. There are ways to do it within Logstash, for example by using a multiline codec and then doing exotic grok matches to remove the first N lines (or removing lines until a particular regex), then doing a split followed by a plain ol' csv filter. You need to be even more careful than usual with header rows. It's a big mess.
Much better to put something in front of Logstash to handle this issue.
If the files are local to your logstash instance, you could use the Exec input plugin to deal with the irregularities.
input {
exec {
command => "/path/to/command_or_script" # sh or py or js etc
interval => 60
}
}
On Linux, this command will print the file starting from the 9th line, dropping the first 8 lines so that the Date header comes first:
command => "tail -n +9 /path/to/file"
This one (again for Linux) will drop everything before the first line that starts with Date, and print everything from there on:
command => "sed -n -e '/^Date/,$p' /path/to/file"
You can avoid reading the same file over and over again by deleting or archiving it in a script (rather than a one-liner as used in these examples).
After trimming the unwanted leading lines, you should be able to use the csv filter in a normal way.
Note that if you want to use autodetect_column_names, pipeline workers must be set to 1.
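Assuming the leading lines have already been trimmed (for example with one of the commands above), a sketch of the csv part of the filter could be:
filter {
csv {
separator => ","
# requires pipeline workers set to 1, as noted above
autodetect_column_names => true
}
}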
Your content is not in CSV format. Your task is to convert it to true CSV format first.

How to get logs and their data containing the word "error" in them, and how to configure the logstashPipeLine.conf file for the same?

Currently I am working on an application where I need to create documents from particular data in a file at a specific location. I have set up a Logstash pipeline configuration.
Here is what it looks like currently:
input{
file{
path => "D:\ELK_Info\logstashInput.log"
start_position => "beginning"
}
}
#Possible IF condition here in the filter
output {
#Possible IF condition here
http {
url => "http://localhost:9200/<index_name>/<type_name>"
http_method => "post"
format => "json"
}
}
I want to add an IF condition in the output before calling the API.
The condition should be: "If the data from the input contains the word 'Error', only then proceed to call the http API mentioned."
Any idea how I might do the same?
Please look at this link: Ignore and move to next pattern if log contains a specific word
The first step is to check whether the input has "error" as a keyword; if so, continue the parsing with a second grok. If not, just drop the input.
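Building on the output from the question, a sketch of such a condition (assuming the raw line is in the message field and the match should be case-sensitive on 'Error') could look like this:
output {
if "Error" in [message] {
http {
url => "http://localhost:9200/<index_name>/<type_name>"
http_method => "post"
format => "json"
}
}
}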

Not able to parse string to date in logstash/elasticSearch

I created a Logstash script to read a log file that has various timestamps in the format "2018-05-08T12:18:53.506+0530". I am trying to parse them to dates using the date filter in Logstash:
date{
match => ["edrTimestamp","yyyy-MM-dd'T'HH:mm:ss.SSS'Z'","ISO8601"]
target => "edrTimestamp"
}
Running the above Logstash script creates an Elasticsearch index, but the string is still not parsed to a date. It also shows a date parse exception in the index.
It creates output like this.
{
"tags": [
"_dateparsefailure"
],
"statusCode": "805",
"campaignRedemptionLimitTotal": 1000,
"edrTimestamp": "2018-05-22T16:41:25.162+0530 ",
"msisdn": "+919066231327",
"timestamp": "2018-05-22T16:41:25.122+0530",
"redempKeyword": "print1",
"campaignId": "C910101-1527004962-1582",
"category": "RedeemRequestReceived"
}
Please tell me what's wrong in the above code. I have tried many other alternatives but it is still not working.
Your issue is that your timestamp has a space at the end of it ("edrTimestamp": "2018-05-22T16:41:25.162+0530 "), which is causing the date parsing to fail. You need to add a:
mutate {
strip => "edrTimestamp"
}
before your date filter.
I don't think you should be escaping the Z in quotes: your time is not Zulu (zero offset), so a literal 'Z' never appears in the string. Instead you want to include the offset as part of the pattern, so you probably want something like:
yyyy-MM-dd'T'HH:mm:ss.SSSZ
The Heroku grok debug app is useful for this.
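Putting the strip and the corrected pattern together, a sketch of the filter section (my reading of the combined fix rather than code from either answer; the unquoted Z is meant to match the +0530 offset):
filter {
mutate {
# remove the trailing space from the timestamp string
strip => ["edrTimestamp"]
}
date {
# unquoted Z matches a numeric offset such as +0530
match => ["edrTimestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ", "ISO8601"]
target => "edrTimestamp"
}
}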
If I pass your string
2018-05-08T12:18:53.506+0530
and use the pattern %{TIMESTAMP_ISO8601}, then it matches; this pattern is made up of the following sub-patterns:
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?

Logstash - Parsing the JSON

I have a JSON log that comes to Logstash.
It looks like this:
[{"orderNumber":"xxxxxxxx","externalOrderNumber":"07efc4c7d3113453427f7fe525e22a61e","operation":{"name":"CAPTURE","amount":990,"status":"PENDING","createdAt":"2011-05-11T04:58:21.187Z","updatedAt":{}}}]
[{"paymentMethod":"Card","transactionId":"331d83fd-2456-48320-842a-f4122aa311e1","orderStatus":"SUCCESS","statuses":[{"amount":990,"operation":"CAPTURE","status":"SUCCESS","createdAt":"2012-05-11T04:58:26.252Z"},{"amount":990,"operation":"CAPTURE","status":"PENDING","createdAt":"2012-05-11T04:58:26.224Z"},{"amount":990,"operation":"AUTHORISE","status":"SUCCESS","createdAt":"2012-05-11T04:58:26.198Z"},{"amount":990,"operation":"AUTHORISE","status":"PENDING","createdAt":"2012-05-11T04:58:16.304Z"}]}]
Which is basically [{SOMEJSON}][{MOREJSON}]
I was thinking of writing a pattern where I grab the first "[", then use %{GREEDYDATA:firstjson} until the "]", and then simply repeat the procedure.
I was not successful in making this happen. I am stuck at the start with the '['.
I have tried this in Logstash:
grok {
match => { "message" => "\[%{GREEDYDATA:firstjson}\]%{SPACE} \[%{GREEDYDATA:second}\]"}
}
But it only grabs the first JSON.
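A possible sketch (not an accepted answer; it assumes both bracketed arrays arrive in a single message field, as the pattern above suggests, and that the first array contains no nested arrays): capture each array including its brackets, then hand each capture to the json filter. The field names firstjson, secondjson, first and second are just placeholders.
filter {
grok {
# lazy match up to the first closing bracket, greedy match for the rest
match => { "message" => "(?<firstjson>\[.*?\])(?<secondjson>\[.*\])" }
}
json {
source => "firstjson"
# a target is needed because the parsed JSON root is an array
target => "first"
}
json {
source => "secondjson"
target => "second"
}
}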

How to extract CPU Usage details from the log file in logstash

I am trying to extract the CPU usage and timestamp from the message:
2015-04-27T11:54:45.036Z| vmx| HIST ide1 IRQ 4414 42902 [ 250 - 375 ) count: 2 (0.00%) min/avg/max: 250/278.50/307
I am using logstash and here is my logstash.config file:
input {
file {
path => "/home/xyz/Downloads/vmware.log"
start_position => beginning
}
}
filter {
grok{
match => ["message", "%{#timestamp}"]
}
}
output{
stdout {
codec => rubydebug
}
}
But it's giving me a grok parse error. Any help would really be appreciated. Thanks.
As per the message from Magnus, you're using the grok match option incorrectly: @timestamp (which you wrote as #timestamp) is the name of a system field that Logstash uses for the time the message was received, not the name of a grok pattern.
First I recommend you have a look at some of the default grok patterns you can use, which can be found here. Then I also recommend you use the grok debugger. Finally, if all else fails, get yourself into the #logstash IRC channel (on freenode); we're pretty active in there, so I'm sure someone will help you out.
Just to help you out a bit further, this is a quick grok pattern I have created which should match your example (I only used the grok debugger to test this, so results in production might not be perfect - so test it!)
filter {
grok {
match => [ "message", "%{TIMESTAMP_ISO8601}\|\ %{WORD}\|\ %{GREEDYDATA}\ min/avg/max:\ %{NUMBER:minimum}/%{NUMBER:average}/%{NUMBER:maximum}" ]
}
}
To explain slightly, %{TIMESTAMP_ISO8601} is a default grok pattern which matches the timestamp in your example.
You will notice the use of \ quite a lot, as the characters following it need to be escaped (because we're using a regex engine, and spaces, pipes etc. have a meaning; by escaping them we disable that meaning and use them literally).
I have used the %{GREEDYDATA} pattern as this will capture anything, this can be useful when you just want to capture the rest of the message, if you put it at the end of the grok pattern it will capture all remaining text.
I have then taken a bit from your example (min/avg/max) to stop the GREEDYDATA from capturing the rest of the message, as we want the data after that.
%{NUMBER} will capture numbers, obviously, but the bit after the : inside the curly braces defines the name that field will be given by logstash and subsequently saved in elasticsearch.
I hope that helps!
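If you also want the timestamp available as its own field (and used as the event time), one way, as a sketch rather than part of the original answer, is to name the capture and feed it to a date filter; logtime is just a placeholder field name:
filter {
grok {
match => [ "message", "%{TIMESTAMP_ISO8601:logtime}\|\ %{WORD}\|\ %{GREEDYDATA}\ min/avg/max:\ %{NUMBER:minimum}/%{NUMBER:average}/%{NUMBER:maximum}" ]
}
date {
# parse the captured timestamp into @timestamp
match => [ "logtime", "ISO8601" ]
}
}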

Resources