I am using Filebeat to send a CSV file to Logstash and then up to Kibana, however I am getting a parsing error when the CSV file is picked up by Logstash.
This is the contents of the CSV file:
time version id score type
May 6, 2020 # 11:29:59.863 1 2 PPy_6XEBuZH417wO9uVe _doc
The logstash.conf:
input {
beats {
port => 5044
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","index","score","type"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
}
}
Filebeat.yml:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
# Change to true to enable this input configuration.
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /etc/test/*.csv
#- c:\programdata\elasticsearch\logs\*
and the error in Logstash:
[2020-05-27T12:28:14,585][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"time,version,id,score,type,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
[2020-05-27T12:28:14,586][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"\"May 6, 2020 # 11:29:59.863\",1,2,PPy_6XEBuZH417wO9uVe,_doc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
I do get some data in Kibana but not what I want to see.
I have managed to get it to work locally. the mistakes I have noticed so far were:
Using ES reserved fields like #timestamp, #version, and more.
The timestamp was not in ISO8601 format. It had an # sign in the middle.
Your filter set the separator to , but your CSV real separator is "\t".
According to the error you can see it is trying to also work on your titles line, I suggest you remove it from the CSV or use the skip_header option.
Below is the logstash.conf file I used:
input {
file {
path => "C:/work/elastic/logstash-6.5.0/config/test.csv"
start_position => "beginning"
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","score","type"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "csv-test"
}
}
The CSV file I used:
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
From my Kibana:
I use Logstash input http_handler to collect metrics from different endpoints.
For each endpoint I have separate config file with "input" plugin like:
input { http_poller {urls => { server_1 => { url => 'http://10.200.3.1:8809/metrics' } } request_timeout => 5 tags => 'TL.QA.proxy-service' interval => 60 metadata_target => 'http_poller_metadata' type => 'tl_qa_http_metrics'}}
I have ~1000 such files in one directory.
When I start Logstash I specify directory to read all those files, like:
./bin/logstash -f /opt/logstash-5.6.2/configs/
When I had small amount of files (~100) it works pretty good. But now looks like Logstash doesn't have enough time to read all files and it doesn't collect data from all endpoints.
Can you please advise how I can improve it?
I have 2 linux boxes setup in which 1 box contains one component which generates log and logstash installed in it to transfer the logs. And in other box I have redis elasticsearch and logstash. here logstash will act as logstash indexer to grok the data.
Now my problem is that in 1st box component generate new log file everyday, but only difference in log file name varies as per date.
like
counters-20151120-0.log
counters-20151121-0.log
counters-20151122-0.log
and so on, I have included below type of code in my logstash shipper conf file:
file {
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
type => "rg_counters"
}
And in my logstash indexer, I have below type of code to catch those log files:
if [type] == "rg_counters" {
grok{
match => ["message", "%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{HOUR}:%{MINUTE}:%{SECOND}\s*(?<counters_raw_data>[0-9\-A-Z]*)\s*(?<counters_operation_type>[\-A-Z]*)\s*%{GREEDYDATA:counters_extradata}"]
}
}
output {
elasticsearch { host => ["elastichost1","elastichost1" ] port => "9200" protocol => "http" }
stdout { codec => rubydebug }
}
Please note that this is working setup and other types log files are getting transfered and processed successfully, so there is no issue of setup.
The problem is how do I process this log file which contains date in it's file name.
Any help here?
Thanks in advance!!
Based on the comments...
Instead of trying to use regexp patterns in your path:
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
just use glob patterns:
path => "/opt/data/logs/counters-*.log"
logstash will remember which files (inodes) that it's seen before.
when trying to load a file into elastic, using logstash that is running the config file below, I get the following output msgs on elastic and no file is loaded (when input is configured to be stdin everything seems to be working just fine)
[2014-08-20 10:51:10,957][INFO ][cluster.service ] [Max] added {[logsta
sh-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.152.13
9:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[l
ogstash-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.1
52.139:9301]]{client=true, data=false}])
Logstash Config File that I used is below:-
input {
file {
path => "D:/example.log"
}
}
output {
elasticsearch {
host => "localhost"
}
}
You might be missing start_position.
Try with something like this.
input {
file {
path => "D:/example.log"
start_position => "beginning"
}
}
Also take the "first contact" restriction into account, according to the documentation.
start_position
Value can be any of: "beginning", "end"
Default value is "end"
Choose where Logstash starts initially reading files: at the beginning or at the end.
The default behavior treats files like live streams and thus starts at the end.
If you have old data you want to import, set this to ‘beginning’
This option only modifies “first contact” situations where a file is new and not seen
before. If a file has already been seen before, this option has no effect.
Hope this helps.
From all the examples it seems that the syntext is:
output {
elasticsearch {
host => localhost
}
}
I am trying to parse a folder with logstash on windows but I have some strange results.
I have a file samplea.log with the following content :
1
2
3
4
5
I have a file sampleb.log with the following content:
6
7
8
9
10
This is my config file:
input {
file {
path=> "C:/monitoring/samples/*.log"
}
}
output {
stdout { debug => "true" }
elasticsearch {
protocol => "transport"
host => "127.0.0.1"
}
}
For unknown reasons, the events displayed on my console are 6,7,8,9 and the same events are stored in elasticsearch. The last line of my file sampleb is ignored and the whole file samplea is ignored.
Thanks in advance for your help.
To solve this problem, I've patched original logstash (1.4.2) distribution with the filewatch-0.6.1 gem (original is filewatch-0.5.1)
Logstash is delimiting lines with return character. Make sure you hit enter on the last line.
Known bug, the inner library tracks the file with inode. On Windows the library use a function that always return 0. In your case, I think logstash reads the simpleb.log, then reads simplea.log from index 6, which is end-of-file.
Bug tracked here : https://github.com/logstash-plugins/logstash-input-file/issues/2