I am using Filebeat to send a CSV file to Logstash and then on to Kibana, but I get a parsing error when Logstash picks up the CSV file.
These are the contents of the CSV file:
time version id score type
May 6, 2020 @ 11:29:59.863 1 2 PPy_6XEBuZH417wO9uVe _doc
The logstash.conf:
input {
beats {
port => 5044
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","index","score","type"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
}
}
Filebeat.yml:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
# Change to true to enable this input configuration.
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /etc/test/*.csv
#- c:\programdata\elasticsearch\logs\*
and the error in Logstash:
[2020-05-27T12:28:14,585][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"time,version,id,score,type,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
[2020-05-27T12:28:14,586][WARN ][logstash.filters.csv ][main] Error parsing csv {:field=>"message", :source=>"\"May 6, 2020 @ 11:29:59.863\",1,2,PPy_6XEBuZH417wO9uVe,_doc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
I do get some data in Kibana but not what I want to see.
I managed to get it working locally. The mistakes I have noticed so far were:
Using ES reserved fields like @timestamp, @version, and more.
The timestamp was not in ISO8601 format. It had an @ sign in the middle.
Your filter sets the separator to "," but the real separator in your CSV is "\t".
The error shows that Logstash is also trying to parse your header line. I suggest you remove it from the CSV or use the skip_header option.
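The points above could be combined into a filter along these lines. This is only a sketch: it assumes the file really is comma-separated, that the header row matches the column names exactly (skip_header works by comparing each row to the configured columns), and that the timestamps look like "May 6 2020 11:29:59.863". The date pattern would need adjusting for any other layout.

```conf
filter {
  csv {
    separator => ","
    # Plain names, not reserved ones such as @timestamp or @version
    columns => ["time", "version", "id", "score", "type"]
    # Drops a row whose values equal the column names (the header line)
    skip_header => true
  }
  date {
    # Parses the "time" column and writes the result to @timestamp
    match => ["time", "MMM d yyyy HH:mm:ss.SSS"]
  }
}
```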
Below is the logstash.conf file I used:
input {
file {
path => "C:/work/elastic/logstash-6.5.0/config/test.csv"
start_position => "beginning"
}
}
filter {
csv {
separator => ","
columns =>["time","version","id","score","type"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "csv-test"
}
}
The CSV file I used:
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
I use the Logstash http_poller input to collect metrics from different endpoints.
For each endpoint I have a separate config file with an "input" plugin like:
input {
http_poller {
urls => {
server_1 => { url => 'http://10.200.3.1:8809/metrics' }
}
request_timeout => 5
tags => 'TL.QA.proxy-service'
interval => 60
metadata_target => 'http_poller_metadata'
type => 'tl_qa_http_metrics'
}
}
I have ~1000 such files in one directory.
When I start Logstash I point it at the directory so that it reads all of those files, like:
./bin/logstash -f /opt/logstash-5.6.2/configs/
When I had a small number of files (~100) it worked well. But now it looks like Logstash doesn't have enough time to read all the files, and it doesn't collect data from all endpoints.
Can you please advise how I can improve this?
I have a Logstash machine running in AWS. Logstash has 3 config files, each defining 1 input. These inputs read logs from the following sources:
From s3
From http input
From filebeat
The problem is that I am getting duplicate messages in Kibana: for 1 message generated by Filebeat I see 3 messages in Kibana. When I removed 1 config file, the count dropped to 2, so I am pretty sure the config files are the cause.
What confuses me is why this is happening. Each of the 3 config files defines a separate input, yet I still get duplicate messages. These are the input sections of all 3 config files.
s3 input
input {
s3 {
bucket => "elb-logs"
region => "us-east-1"
prefix => "demo/AWSLogs/792177735214/"
type => "elb-logs"
delete => true
}
}
Http input
input {
http {
type => "frontend-logs"
codec => "json"
}
}
Filebeat
input {
beats {
port => "5043"
}
}
All 3 config files share the same output section:
output {
elasticsearch { hosts => [ "10.0.0.1:9200" ] }
}
Logstash will concatenate the three config files together (s3 input, Http input, Filebeat) and see three output sections.
The three output sections are not related to the specific inputs - instead Logstash will send an input from any one of the three sources to all of the configured outputs. As a result your message will be output three times to the same destination.
I would create a separate, single output config file and remove the output section from your 3 input config files.
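A sketch of that layout, assuming hypothetical file names: each input keeps its own file with no output section, and one extra file carries the only output. If the sources ever need different indices, a conditional on the type field set by the inputs is one option; the index name used here is illustrative only.

```conf
# 10-s3-input.conf, 20-http-input.conf, 30-beats-input.conf:
#   each contains only its input { ... } block

# 90-output.conf: the single output shared by every input
output {
  if [type] == "elb-logs" {
    elasticsearch {
      hosts => [ "10.0.0.1:9200" ]
      index => "elb-logs-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch { hosts => [ "10.0.0.1:9200" ] }
  }
}
```

With only one output section in the concatenated pipeline, each event is written once regardless of which input produced it.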
Currently
I have completed the above task using one log file, passing the data through Logstash to one index in Elasticsearch:
yellow open logstash-2016.10.19 5 1 1000807 0 364.8mb 364.8mb
What I actually want to do
Suppose I have the following log files, which are named by year, month and date:
MyLog-2016-10-16.log
MyLog-2016-10-17.log
MyLog-2016-10-18.log
MyLog-2016-11-05.log
MyLog-2016-11-02.log
MyLog-2016-11-03.log
I would like to tell Logstash to read them by year, month and date and create the following indices:
yellow open MyLog-2016-10-16.log
yellow open MyLog-2016-10-17.log
yellow open MyLog-2016-10-18.log
yellow open MyLog-2016-11-05.log
yellow open MyLog-2016-11-02.log
yellow open MyLog-2016-11-03.log
Could I please have some guidance on how to go about doing this?
Thank you
It is as simple as that:
output {
elasticsearch {
hosts => ["localhost:9200"]
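# dd is day of month; DD would be day of year in Joda-Time patterns.
# Elasticsearch requires lowercase index names, so "mylog-..." is safer in practice.
index => "MyLog-%{+YYYY-MM-dd}.log"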
}
}
If the lines in the file contain datetime info, you should be using the date{} filter to set @timestamp from that value. If you do this, you can use the output format that @Renaud provided, "MyLog-%{+YYYY.MM.dd}".
If the lines don't contain the datetime info, you can use the input's path for your index name, e.g. "%{path}". To get just the basename of the path:
mutate {
gsub => [ "path", ".*/", "" ]
}
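Put together, the two cases might look like the sketch below. The field name log_time and its ISO8601 format are assumptions (the real lines would first need a grok or dissect step to extract it), and index_name is a hypothetical helper field. The name is lowercased because Elasticsearch rejects index names containing uppercase letters.

```conf
filter {
  # Case 1: the line carries its own timestamp (assumed field: log_time)
  date {
    match => ["log_time", "ISO8601"]
  }

  # Case 2: derive the index name from the file name, keeping "path" intact
  mutate { add_field => { "index_name" => "%{path}" } }
  mutate {
    gsub => [ "index_name", ".*/", "" ]
    lowercase => [ "index_name" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Case 1 alternative: index => "mylog-%{+YYYY.MM.dd}"
    index => "%{index_name}"
  }
}
```

Copying the path into a separate field first means the original path is still available on the event for searching.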
Won't this configuration in the output section be sufficient for your purpose?
output {
elasticsearch {
embedded => false
host => localhost
port => 9200
protocol => http
cluster => 'elasticsearch'
index => "syslog-%{+YYYY.MM.dd}"
}
}
When I try to load a file into Elasticsearch using Logstash with the config file below, I get the following output messages in Elasticsearch and no file is loaded. (When the input is configured as stdin, everything seems to work fine.)
[2014-08-20 10:51:10,957][INFO ][cluster.service ] [Max] added {[logstash-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.152.139:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-GURWB02038-5480-4002][dstQagpWTfGkSU5Ya-sUcQ][GURWB02038][inet[/10.203.152.139:9301]]{client=true, data=false}])
The Logstash config file I used is below:
input {
file {
path => "D:/example.log"
}
}
output {
elasticsearch {
host => "localhost"
}
}
You might be missing start_position.
Try with something like this.
input {
file {
path => "D:/example.log"
start_position => "beginning"
}
}
Also take the "first contact" restriction into account, according to the documentation.
start_position
Value can be any of: "beginning", "end"
Default value is "end"
Choose where Logstash starts initially reading files: at the beginning or at the end.
The default behavior treats files like live streams and thus starts at the end.
If you have old data you want to import, set this to ‘beginning’
This option only modifies “first contact” situations where a file is new and not seen
before. If a file has already been seen before, this option has no effect.
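Because of that "first contact" rule, a file Logstash has already read will not be re-imported just by adding start_position. While testing, one common workaround is to point sincedb_path at the null device so that no read position is remembered; NUL is the Windows name and /dev/null the Unix one. This is a testing sketch rather than a production setting, since every restart would re-read the whole file.

```conf
input {
  file {
    path => "D:/example.log"
    start_position => "beginning"
    # Testing only: discard the recorded read position (Windows null device)
    sincedb_path => "NUL"
  }
}
```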
Hope this helps.
From all the examples it seems that the syntax is:
output {
elasticsearch {
host => localhost
}
}