Configure logstash to take updated values - elasticsearch

I have connected Logstash, Elasticsearch and Kibana, and it all works fine.
I am using Logstash to read the Tomcat logs:
input {
  file {
    path => "/tom_logs/*"
    type => "tomcat"
    start_position => "end"
  }
}
When the log file is updated, Logstash ingests the whole file again instead of just the newly appended lines. I only want to load the lines that were last appended.
Can anyone help me?
Thanks in advance.

Your problem is a bit strange, because I have never experienced it. Just to be sure I understand correctly: when a new log line arrives, Logstash starts analysing all the logs in the file again?
You correctly specify start_position => "end", which is actually the default option. With that setting, Logstash should only consider changes to the file (that is, new log lines) made since its start-up.
So I think the cause of this "bug" is not in Logstash but in how Tomcat writes its logs. Still, if I were you, I'd try specifying path => "/tom_logs/*.log" instead of * only.
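For example, a minimal version of that input might look like this (the sincedb_path line is my own addition, not part of the original suggestion; Logstash keeps its read position in a sincedb file, and if that file is lost or reset the whole log gets re-read):
input {
  file {
    # match only the actual log files
    path => "/tom_logs/*.log"
    type => "tomcat"
    # "end" is the default: only lines appended after start-up are read
    start_position => "end"
    # where read offsets are persisted between runs (path is an assumption)
    sincedb_path => "/var/lib/logstash/sincedb_tomcat"
  }
}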
Hope it will help.

Related

Many Logstash instances reading from Redis

I have one Logstash process running on one node, consuming from a Redis list, but I'm afraid that a single process cannot handle the data throughput without a big delay.
I was wondering whether running one more Logstash process on the same machine would perform a little better, but I'm not certain about that. I know that my ES index is not the bottleneck.
Would Logstash duplicate my data if both processes consume the same list? Does this approach seem like the right thing to do?
Thanks!
Here is my input configuration:
input {
  redis {
    data_type => "list"
    batch_count => 300
    key => "flight_pricing_stats"
    host => "my-redis-host"
  }
}
Before running another Logstash process on the same machine, you could try adjusting the redis input's threads setting. The default is 1.
input {
  redis {
    data_type => "list"
    batch_count => 300
    key => "flight_pricing_stats"
    host => "my-redis-host"
    threads => 2
  }
}
You could run more than one Logstash instance against the same Redis list; events should not get duplicated. But I'm not sure that would help.
If you're not certain what's going on, I recommend the Logstash monitoring API. It can help you narrow down your real bottleneck.
There is also an interesting post from Elastic on the subject: Logstash Lines: Introducing a benchmarking tool for Logstash.

Filebeat duplicating events

I am running a basic ELK stack setup using Filebeat > Logstash > Elasticsearch > Kibana, all on version 5.2.
When I remove Filebeat and configure Logstash to look directly at a file, it ingests the correct number of events.
If I delete the data and re-ingest the same file using Filebeat to pass the log contents to Logstash, I get over 10% more events created. I have checked a number of these and confirmed that the duplicates are being created by Filebeat.
Has anyone seen this issue, or have any suggestions why this would happen?
First I need to understand what you mean by removing Filebeat.
Possibility 1
If you uninstalled Filebeat and installed it again, then Filebeat will obviously read the data from the path again (the data you re-ingested) and post it to Logstash -> Elasticsearch -> Kibana (assuming the old data has not been removed from the Elasticsearch node), hence the duplicates.
Possibility 2
You just stopped Filebeat, configured it for Logstash and restarted it, and the registry file was not updated properly during shutdown. Filebeat reads the input line by line and records in its registry file up to which line it has successfully published to Logstash/Elasticsearch/Kafka etc. If any of those output servers has difficulty processing the load coming from Filebeat, Filebeat waits until they are available again, then reads the registry file to see up to which line it has already published and starts publishing from the next line onwards. If the recorded offset is behind what was actually delivered, those lines are shipped a second time.
A sample registry file looks like this:
{
  "source": "/var/log/sample/sample.log",
  "offset": 88,
  "FileStateOS": {
    "inode": 243271678,
    "device": 51714
  },
  "timestamp": "2017-02-03T06:22:36.688837822-05:00",
  "ttl": -2
}
As you can see, the registry file keeps an offset and a timestamp for each harvested file. A registry that was not updated correctly is one of the reasons for duplicates.
For further reference, you can follow the links below:
https://discuss.elastic.co/t/filebeat-sending-old-logs-on-restart/46189
https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/filebeat-stop-cleaning-registry/58902
Hope that helps.
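This is not part of the answer above, but if the re-reads themselves cannot be avoided, one possible mitigation (just a sketch, assuming a single log line is unique enough to act as a key) is to derive a deterministic document ID in Logstash, so that a re-sent event overwrites the existing document instead of being indexed a second time:
filter {
  fingerprint {
    # hash the raw log line into a metadata field
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key    => "dedup"    # arbitrary HMAC key, value is an assumption
  }
}
output {
  elasticsearch {
    hosts       => ["localhost:9200"]    # assumed host
    # identical lines map to the same _id, so duplicates overwrite rather than accumulate
    document_id => "%{[@metadata][fingerprint]}"
  }
}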

How to add dynamic hosts in Elasticsearch and logstash

I have a working prototype in which devices send logs, then Logstash parses them and puts them into Elasticsearch.
Logstash output code:
output {
  if [type] == "json" {
    elasticsearch {
      hosts => ["host1:9200","host2:9200","host3:9200"]
      index => "index-metrics-%{+xxxx.ww}"
    }
  }
}
Now my question is:
I will be putting this solution into production. For simplicity, assume that I have one cluster which currently has 5 nodes.
I know I can give an array of the 5 nodes' IPs / hostnames in the elasticsearch output plugin, and it will round-robin to distribute the data.
How can I avoid putting all my node IPs / hostnames into the Logstash config file? As the system goes into production, I don't want to manually go into each Logstash instance and update these hosts.
What are the best practices one should follow in this case?
My requirement is: I want to run my ES cluster and add / remove / update any number of nodes at any time, and I need all of my Logstash instances to keep sending data regardless of changes on the ES side.
Thanks.
If you want to add/remove/update hosts, you will need to run sed or some other kind of string replacement before the service starts up. Logstash configs are "compiled" at startup and cannot be changed that way afterwards.
hosts => [$HOSTS]
...
$ HOSTS="\"host1:9200\",\"host2:9200\""
$ sed "s/\$HOSTS/$HOSTS/g" $config
Your other option is to use environment variables for the dynamic portion, but that won't allow you to use a dynamic number of hosts.
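For example, a minimal sketch of the environment-variable approach (the ES_HOST variable name is an assumption; Logstash substitutes ${VAR} references from the environment when it loads the config, so this parameterises the host without letting the number of hosts vary):
output {
  if [type] == "json" {
    elasticsearch {
      # ES_HOST is read from the environment at startup, e.g. ES_HOST=host1:9200
      hosts => ["${ES_HOST}"]
      index => "index-metrics-%{+xxxx.ww}"
    }
  }
}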

timezone incorrect in logstash / ELK / elasticsearch

I am new to elastic search and have spent a long time trying to solve the question below. Perhaps the solution should be in the documentation - but it is not :-(
I have servers running in multiple time zones.
The log files get rsynced to servers in different time zones, but it is easy to know the origin time zone, either from a timezone field, e.g. {"timezone": "UTC"}, or from the time format itself, e.g. {"#timestamp": "2015-02-20T12:11:56.789Z"}.
I have full control over the log files and can adapt them if necessary.
When using Logstash, it changes the time format to the local time of the server that it is running on, e.g. "#timestamp" => "2015-02-21T22:26:24.920-08:00".
How can I get the timezone consistently taken from the source log file, through Logstash and into Elasticsearch? (Obviously, I will want to have it in Kibana after that.) I have tried many things with no success.
Thanks in advance.
My goal was to create an _id in Elasticsearch that has the logging time in it, so that a document is never repeated even if the log is sent through Logstash again.
After throwing a few more hours at the problem, I have some conclusions that, as far as I am concerned, are not well enough documented, and a recommended workaround.
1) If the time format in the log file has a time zone in it, there is nothing you can do to modify it in Logstash. So don't waste time on the timezone attribute, partial matching, or adding a timezone. If the time has a Z at the end, it will be treated as GMT. I think it is a bug that no warning is issued when this happens.
2) Logstash writes to standard output / file with the time in its local time zone, regardless of the format of the input string.
3) Logstash uses the time in its local time zone, so concatenating the time into a variable gets messed up, even if the original string was GMT. So just don't even try to work with the #timestamp variable!
4) Elasticsearch works in GMT, so it behaves properly. What you see in the output of Logstash as "#timestamp" => "2015-02-21T20:26:24.921-08:00" is correctly interpreted by Elasticsearch as "#timestamp" => "2015-02-22T04:26:24.921Z".
So my workaround is as follows:
1) Keep the logs with a timestamp field that is NOT #timestamp.
2) Consistently save the time in the log files as GMT and mark it with a trailing Z.
3) Use the date filter in its most basic form, with no timezone attribute:
filter {
  date {
    match => ["log_time", "YYYY-MM-dd'T'HH:mm:ss.SSSZ"]
    # timezone => "Etc/GMT-8"  <--- THIS DOES NOT WORK IF THERE IS A Z IN SOURCE
  }
}
4) Create time derivatives straight from the log variable, not from #timestamp, e.g.:
output {
  stdout { codec => rubydebug }
  elasticsearch {
    host => localhost
    document_id => "%{log_time}-%{host}"    # <--- DO THIS
    # document_id => "%{#timestamp}-%{host}"  <--- DON'T DO THIS
  }
}
If Jordan Sissel happens to read this: I believe that Logstash should be consistent with Elasticsearch by default, or at least have an option to output and work internally in GMT. I had a rocky start doing what everyone goes through when trying out the tool for the first time with existing logs.

can graylog2 output to flat file and elasticsearch at the same time?

I'm very new to Graylog2. I finally have it up and running, storing logs in Elasticsearch. My question is: can Graylog2 also dump to a flat file? If so, can it write logs to both a flat file and Elasticsearch simultaneously? I can't seem to find the answer by googling. If any log guru knows the answer, would you kindly point me in the right direction?
Thank you!
You can send the syslog messages to the standard rsyslog port udp/514 and have rsyslog write them to a flat file and forward them on to Graylog2, which keeps indexing into Elasticsearch.
rsyslog.conf:
if $fromhost-ip == '10.10.205.1' then /var/log/hosts/host1.log
if $fromhost-ip == '10.10.205.1' then @0.0.0.0:515
