Cannot parse multiple files with logstash input "file" - windows

I am trying to parse a folder with logstash on windows but I have some strange results.
I have a file samplea.log with the following content :
1
2
3
4
5
I have a file sampleb.log with the following content:
6
7
8
9
10
This is my config file:
input {
file {
path=> "C:/monitoring/samples/*.log"
}
}
output {
stdout { debug => "true" }
elasticsearch {
protocol => "transport"
host => "127.0.0.1"
}
}
For unknown reasons, the events displayed on my console are 6,7,8,9 and the same events are stored in elasticsearch. The last line of my file sampleb is ignored and the whole file samplea is ignored.
Thanks in advance for your help.

To solve this problem, I've patched original logstash (1.4.2) distribution with the filewatch-0.6.1 gem (original is filewatch-0.5.1)

Logstash is delimiting lines with return character. Make sure you hit enter on the last line.

Known bug, the inner library tracks the file with inode. On Windows the library use a function that always return 0. In your case, I think logstash reads the simpleb.log, then reads simplea.log from index 6, which is end-of-file.
Bug tracked here : https://github.com/logstash-plugins/logstash-input-file/issues/2

Related

Mutliple config files causing duplicate message

I have a Logstash machine running in AWS. In Logstash I have 3 config files each having 1 input defined on them. These inputs are reading logs from following sources
From s3
From http input
From filebeat
The problem is that I am getting duplicate messages in Kibana. So for 1 message generated by Filebeat I am seeing 3 messages in Kibana. I tried to remove 1 config file and the count got reduced to 2. So I am pretty sure that this is due to these config files.
What is confusing me is that why this is happening. I have separate input defined on all 3 config files, still getting duplicate messages. These are the input section of all 3 config files.
s3 input
input {
s3 {
bucket => "elb-logs"
region => "us-east-1"
prefix => "demo/AWSLogs/792177735214/"
type => "elb-logs"
delete => true
}
}
Http input
input {
http {
type => "frontend-logs"
codec => "json"
}
}
Filebeat
input {
beats {
port => "5043"
}
}
For all 3 config files there is common output section i.e.
output {
elasticsearch { hosts => [ "10.0.0.1:9200" ] }
}
Logstash will concatenate the three config files together (s3 input, Http input, Filebeat) and see three output sections.
The three output sections are not related to the specific inputs - instead Logstash will send an input from any one of the three sources to all of the configured outputs. As a result your message will be output three times to the same destination.
I would create a separate, single output config file and remove the output section from your 3 input config files.

Create a new index in elasticsearch for each log file by date

Currently
I have completed the above task by using one log file and passes data with logstash to one index in elasticsearch :
yellow open logstash-2016.10.19 5 1 1000807 0 364.8mb 364.8mb
What I actually want to do
If i have the following logs files which are named according to Year,Month and Date
MyLog-2016-10-16.log
MyLog-2016-10-17.log
MyLog-2016-10-18.log
MyLog-2016-11-05.log
MyLog-2016-11-02.log
MyLog-2016-11-03.log
I would like to tell logstash to read by Year,Month and Date and create the following indexes :
yellow open MyLog-2016-10-16.log
yellow open MyLog-2016-10-17.log
yellow open MyLog-2016-10-18.log
yellow open MyLog-2016-11-05.log
yellow open MyLog-2016-11-02.log
yellow open MyLog-2016-11-03.log
Please could I have some guidance as to how do i need to go about doing this ?
Thanks You
It is also simple as that :
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "MyLog-%{+YYYY-MM-DD}.log"
}
}
If the lines in the file contain datetime info, you should be using the date{} filter to set #timestamp from that value. If you do this, you can use the output format that #Renaud provided, "MyLog-%{+YYYY.MM.dd}".
If the lines don't contain the datetime info, you can use the input's path for your index name, e.g. "%{path}". To get just the basename of the path:
mutate {
gsub => [ "path", ".*/", "" ]
}
wont this configuration in output section be sufficient for your purpose ??
output {
elasticsearch {
embedded => false
host => localhost
port => 9200
protocol => http
cluster => 'elasticsearch'
index => "syslog-%{+YYYY.MM.dd}"
}
}

Issue in reading log file that contains date in it's name

I have 2 linux boxes setup in which 1 box contains one component which generates log and logstash installed in it to transfer the logs. And in other box I have redis elasticsearch and logstash. here logstash will act as logstash indexer to grok the data.
Now my problem is that in 1st box component generate new log file everyday, but only difference in log file name varies as per date.
like
counters-20151120-0.log
counters-20151121-0.log
counters-20151122-0.log
and so on, I have included below type of code in my logstash shipper conf file:
file {
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
type => "rg_counters"
}
And in my logstash indexer, I have below type of code to catch those log files:
if [type] == "rg_counters" {
grok{
match => ["message", "%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{HOUR}:%{MINUTE}:%{SECOND}\s*(?<counters_raw_data>[0-9\-A-Z]*)\s*(?<counters_operation_type>[\-A-Z]*)\s*%{GREEDYDATA:counters_extradata}"]
}
}
output {
elasticsearch { host => ["elastichost1","elastichost1" ] port => "9200" protocol => "http" }
stdout { codec => rubydebug }
}
Please note that this is working setup and other types log files are getting transfered and processed successfully, so there is no issue of setup.
The problem is how do I process this log file which contains date in it's file name.
Any help here?
Thanks in advance!!
Based on the comments...
Instead of trying to use regexp patterns in your path:
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
just use glob patterns:
path => "/opt/data/logs/counters-*.log"
logstash will remember which files (inodes) that it's seen before.

Cannot load index to elasticsearch from external file, using logstash 1.4.2 on Windows 7

when trying to load a file into elastic, using logstash that is running the config file below, I get the following output msgs on elastic and no file is loaded (when input is configured to be stdin everything seems to be working just fine)
[2014-08-20 10:51:10,957][INFO ][cluster.service ] [Max] added {[logsta
sh-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.152.13
9:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[l
ogstash-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.1
52.139:9301]]{client=true, data=false}])
Logstash Config File that I used is below:-
input {
file {
path => "D:/example.log"
}
}
output {
elasticsearch {
host => "localhost"
}
}
You might be missing start_position.
Try with something like this.
input {
file {
path => "D:/example.log"
start_position => "beginning"
}
}
Also take the "first contact" restriction into account, according to the documentation.
start_position
Value can be any of: "beginning", "end"
Default value is "end"
Choose where Logstash starts initially reading files: at the beginning or at the end.
The default behavior treats files like live streams and thus starts at the end.
If you have old data you want to import, set this to ‘beginning’
This option only modifies “first contact” situations where a file is new and not seen
before. If a file has already been seen before, this option has no effect.
Hope this helps.
From all the examples it seems that the syntext is:
output {
elasticsearch {
host => localhost
}
}

Logstash: Attaching to previous line using multiline attaches somewhere else

I have a filter that looks like so:
multiline {
pattern => "(^.+Exception.*)|(^\tat .+)"
negate => false
what => "previous"
}
But for some reason, it's not attaching to the previous line for lines with ^\tat. Sometimes it does, but most of the time it doesn't. It attaches to the line way far back. I don't see anything wrong with my code.
Does anyone know if this is a bug?
Edit: This worked properly just now but couple minutes after it doesn't work again. Is it a buffer overflow? How would I debug this?
Edit: Example of success:
2014-06-20 09:09:07,989 http-bio-8080-exec-629 WARN com.rubiconproject.rfm.adserver.filter.impl.PriorityFilter - Request : NBA_DIV=Zedge_Tier1_App_MPBTAG_320x50_ROS_Android&NBA_APPID=4E51A330AD7A0131112022000A93D4E6&NBA_PUBID=111657&NBA_LOCATION_LAT=&NBA_LOCATION_LNG=&NBA_KV=device_id_sha-1_key=5040e46d15bd2f37b3ba58860cc94c1308c0ca4b&_v=2_0_0&id=84472439740784460, Response : Unable to Score Ads.. Selecting first one and Continuing...
java.lang.IndexOutOfBoundsException: Index: 8, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:604)
at java.util.ArrayList.get(ArrayList.java:382)
Edit: Example of failure:
2014-06-20 09:02:31,139 http-bio-8080-exec-579 WARN com.rubiconproject.rfm.adserver.web.AdRequestController - Request : car=vodafone UK&con=0&model=iPhone&bdl=com.racingpost.general&sup=adm,dfp,iAd&id=8226846&mak=Apple&sze=320x50&TYP=1&rtyp=json&app=F99D88D0FDEC01300BF5123139244773&clt=MBS_iOS_SDK_2.4.0&dpr=2.000000&apver=10.4&osver=7.1&udid=115FC62F-D4FF-44E0-8D92-5A060043EFDD&pub=111407&tud=3&osn=iPhone OS&, Response : No Ad Selected to Serve..Exiting
at java.util.ArrayList.get(ArrayList.java:382)
My file has 13000+ lines, and when it errors, it attaches to couple hundred lines back. But strangely each attaches to a line with the exact same offset in between (by offset I mean those couple hundred lines that it skips).
Your logs is java stack logs.
You can try to use this pattern. Use the date as the pattern, which is the beginning of each log.
input {
stdin{}
}
filter {
multiline {
pattern => "^(?>\d\d){1,2}-(?:0?[1-9]|1[0-2])-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"
what => "previous"
}
}
output {
stdout {
codec => "rubydebug"
}
}
This pattern parses the date, if the line do not start with date, logstash will multiline it.
I have try it with your logs, it's worked on both two logs.
Hope this can help you.

Resources