Collecting logs from different remote servers using just Logstash - elasticsearch

Is it possible to send logs from different remote machines to Elasticsearch using just Logstash (no Filebeat)? If so, do I define the same index in the conf.d files on all the machines? I want all the logs to end up in the same index.
Would I use logs-%{+YYYY.MM.dd} as the index in all the config files to have them indexed into the same index?
input {
  file {
    path => "/home/ubuntu/logs/data.log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}

What you're doing is OK and it will work. The one thing I would change is to simply write to a data stream, so you don't have to care about the index name and ILM matters (rollover, retention, etc.), like this:
input {
  file {
    path => "/home/ubuntu/logs/data.log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "ubuntu"
    data_stream_namespace => "prod"
  }
}
The data stream name will be logs-ubuntu-prod; you can change the latter two parts to your liking.
Make sure to properly set up your data stream first, with an adequate Index Lifecycle Management policy, though.
On a different note, it's a waste of resources to install Logstash on all your remote machines, since it is meant to work as a centralized streaming engine. You should definitely either use Filebeat or, even better now, the Elastic Agent, which is fully manageable through Fleet in Kibana. It's worth a look.
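If you go the Filebeat route, the remote machines would just ship to one central Logstash exposing a beats input; a minimal sketch, reusing the data stream settings from above (5044 is merely the conventional Beats port):
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "ubuntu"
    data_stream_namespace => "prod"
  }
}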

Related

Best practices for ILM and dynamic indices in the Logstash elasticsearch output plugin

I am struggling to find a suitable way to handle ILM rollover on dynamic indices from a Logstash pipeline to Elasticsearch.
The pipeline looks like that:
input {
  pipeline {
    address => "some_pipeline_name"
  }
}
output {
  elasticsearch {
    hosts => [ "serv1", "server2" ]
    index => "%{[label1]}-%{[label2]}-%{[label3]}"
    user => "logstash_user_to_send_elasticsearch"
    password => "password1337"
  }
}
If I have an index template for the dynamically created index, everything works well and there are no problems with ILM and rollover.
But since I don't know all the values the three labels can take, I don't know how to handle the ILM aspect.
It is not possible to define ilm_rollover_alias dynamically, and I am not able to create pipelines for every possible value of label* in advance.
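For reference, this is roughly what the ILM-related options on the elasticsearch output look like; they only take static values, so none of the %{[label]} references can be used there (the alias, pattern and policy names below are placeholders):
output {
  elasticsearch {
    hosts => [ "serv1", "server2" ]
    ilm_enabled => true
    # these must be fixed strings; field references are not expanded here
    ilm_rollover_alias => "my-fixed-alias"
    ilm_pattern => "{now/d}-000001"
    ilm_policy => "my_policy"
  }
}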
How do you guys handle this problem?
Regards
Sebastian

How to parse data from S3 using Logstash and push to Elasticsearch and then to Kibana

I have a log file created in an S3 bucket every minute.
The data is "\x01" delimited. One of the columns is a timestamp field.
I want to load this data into Elasticsearch.
I tried using the following Logstash config, but it doesn't seem to work; I don't see any output. I took some reference from http://brewhouse.io/blog/2014/11/04/big-data-with-elk-stack.html
Logstash config file is as follows:
input {
  s3 {
    bucket => "mybucketname"
    credentials => [ "accesskey", "secretkey" ]
  }
}
filter {
  csv {
    columns => [ "col1", "col2", "@timestamp" ]
    separator => "\x01"
  }
}
output {
  stdout { }
}
How do I modify this file to take in the new files coming in every minute?
I would then eventually want to connect Kibana to ES to visualize the changes.
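For what it's worth, a rough sketch of how the pipeline could look once the timestamp is mapped and Elasticsearch is wired in; the column name, date format, index name and hosts below are assumptions, and the s3 input re-checks the bucket on an interval (60 seconds is the documented default):
input {
  s3 {
    bucket => "mybucketname"
    # credentials as written in the original question; newer plugin versions use access_key_id / secret_access_key
    credentials => [ "accesskey", "secretkey" ]
    interval => 60
  }
}
filter {
  csv {
    columns => [ "col1", "col2", "log_time" ]
    separator => "\x01"
  }
  # copy the parsed column onto the event's @timestamp (the format is an assumption)
  date {
    match => [ "log_time", "ISO8601" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "s3-logs-%{+YYYY.MM.dd}"
  }
}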
Just use logstash-forwarder to send the files from S3; you will have to generate certificates for authorization.
There is a really nice tutorial: https://www.digitalocean.com/community/tutorials/how-to-use-logstash-and-kibana-to-centralize-logs-on-centos-7
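On the receiving Logstash side, the matching lumberjack input from that tutorial looks roughly like this (the port and certificate paths are assumptions):
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}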
If you're getting I/O errors, you may be able to solve them by setting the cluster:
inside logstash.conf:
output {
  elasticsearch {
    host => "127.0.0.1"
    cluster => "CLUSTER_NAME"
  }
}
inside elasticsearch.yml:
cluster.name: CLUSTER_NAME
If you're having problems generating certificates, you can generate them using this:
https://raw.githubusercontent.com/driskell/log-courier/develop/src/lc-tlscert/lc-tlscert.go
I also found a better init.d script for logstash-forwarder on CentOS:
http://smuth.me/posts/centos-6-logstash-forwarder-init-script.html

Copy an Elasticsearch index with Logstash

I have a ready-built Apache index on one machine that I would like to clone to another machine using Logstash. Fairly easy, I thought:
input {
  elasticsearch {
    host => "xxx.xxx.xxx.xxx"
    index => "logs"
  }
}
filter {
}
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => "http"
    index => "logs"
    index_type => "apache_access"
  }
}
That pulls over the docs, but it doesn't stop, as it uses the default query "*" (the original index has ~50,000 docs, and I killed the former script when the new index was over 600,000 docs and rising).
Next I tried to make sure the docs would get updated instead of duplicated, but this commit hasn't made it yet, so I don't have a primary...
Then I remembered sincedb, but I don't seem to be able to use that in the query (or is that possible?).
Any advice? Maybe a completely different approach? Thanks a lot!
Assuming that the elasticsearch input creates a Logstash event with the document id (I assume it will be _id or something similar), try setting the elasticsearch output the following way:
output {
  elasticsearch {
    cluster => "Loa"
    host => "127.0.0.1"
    protocol => "http"
    index => "logs"
    index_type => "apache_access"
    document_id => "%{_id}"
  }
}
That way, even if the elasticsearch input, for whatever reason, continues to push the same documents indefinitely, Elasticsearch will merely update the existing documents instead of creating new documents with new ids.
Once you reach 50,000, you can stop.

Way to populate Logstash output variable without getting it from an Input?

Is there another way to tell Logstash to supply a value to an output variable without pulling it from a Logstash input? For example, in my case I'd like to create an Elasticsearch index based on a performance run ID (which I'd do from an external script) and then have Logstash send to that. For now I was thinking of creating a tcp input just for receiving perf run info and then having a filter to match on the run ID. That seems like a convoluted way to do this, though. For example:
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    # do some matching to extract the id
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure if setting manage_template to false would actually be necessary. I've read that it is.
Update
Thanks Nirdesh for that. Using Ruby might be very handy.
While I was waiting I tried using a grok filter like so:
grok {
  match => { "message" => "%{WORD:perftype}-%{POSINT:perfid}" }
}
Which produced this stdout during debugging:
{
  "message" => "awperf-14",
  "@version" => "1",
  "@timestamp" => "2014-10-17T20:01:19.758Z",
  "host" => "0:0:0:0:0:0:0:1:33361",
  "type" => "perfinfo",
  "perftype" => "awperf",
  "perfid" => "14"
}
I then tried creating an index based on this, like so:
index => "%{perftype}-%{perfid}"
So when I passed 'awperf-14' to the input, I ended up creating these indices:
%{perftype}-%{perfid}
awperf-14
This is not what I was expecting. Also, it's the %{perftype}-%{perfid} index that gets populated, not awperf-14, the one I actually wanted.
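One likely reason for the literal %{perftype}-%{perfid} index is that some events (for example, ones the grok pattern did not match) never get those fields, so the sprintf reference is left unexpanded. A sketch of guarding the output with a conditional, assuming the field names from the grok filter above:
output {
  if [perfid] {
    elasticsearch {
      cluster => "mycluster"
      manage_template => false
      index => "%{perftype}-%{perfid}"
    }
  }
}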
Yes.
You can add any number of your own variables, either for intermediate results or permanently, using a property called add_field. Almost all filters in Logstash support this property.
So, for your solution, you can use a ruby filter to find out the id dynamically and store it in a new field called id, which you can then use in the output.
For Example :
input {
  tcp {
    type => "perfinfo"
    port => 8888
  }
}
filter {
  if [type] == "perfinfo" {
    ruby {
      # do some processing here to compute the value
      add_field => { "id" => "Some value" }
    }
  }
}
output {
  elasticsearch {
    cluster => "mycluster"
    manage_template => false
    index => "%{id}-perftest"
  }
}
I'm not sure I can do what I was trying to do via Logstash. To be clearer, I simply wanted to change the index based on the performance run ID I'm executing. There's nothing in the data that would have this information (I have to pull it from a DB). So instead of trying to have Logstash listen for a performance run ID, I scripted this externally. The script uses the Elasticsearch API to create a new index, and then does a string replace for the index in the Logstash config file. It then restarts Logstash, which normally happens between performance runs anyway. This approach was much easier to do, and seems cleaner.

Data from RabbitMQ not being read into Kibana dashboard

I just altered my logstash-elasticsearch setup to include rabbitmq instead, since I wasn't able to get messages into Logstash fast enough over a TCP connection. Now it is blazing fast, as Logstash reads from the queue, but I do not see the messages coming through into Kibana. One error shows the timestamp field missing. I used the head plugin to view the data and it looks odd:
_index     _type   _id                      ▼_score   @version   @timestamp
pt-index   logs    Bv4Kp7tbSuy8YyNi7NEEdg   1         1          2014-03-27T12:37:29.641Z
This is what my conf file looks like now, and below is what it looked like before:
input {
  rabbitmq {
    queue => "logstash_queueII"
    host => "xxx.xxx.x.xxx"
    exchange => "logstash.dataII"
    vhost => "/myhost"
  }
}
output {
  elasticsearch {
    host => "xxx.xxx.xx.xxx"
    index => "pt-index"
    codec => "json_lines"
  }
}
This is what it was before rabbitmq:
input {
  tcp {
    codec => "json_lines"
    port => "1516"
  }
}
output {
  elasticsearch {
    embedded => "true"
  }
}
The only change I made was to create a specific index in Elasticsearch and have the data indexed there, but now it seems the format of the message has changed. They are still JSON messages with 2-3 fields, but I'm not sure what Logstash is reading or changing from rabbitmq. I can see data flowing into the histogram, but the fields are gone.
"2014-03-18T14:32:02" "2014-03-18T14:36:24" "166" "google"
These are the fields I would expect. Like I said, all this worked before I made the change.
I have seen examples of similar configurations, but they do not use the "json_lines" codec on the output going into Elasticsearch. The output codec would adjust the formatting of the data as it leaves Logstash, which I do not believe is necessary. Try deleting the codec and see what Logstash is outputting by adding a file output to a log; be sure this is only a short sample...
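A sketch of that suggestion, with the codec removed from the elasticsearch output and a temporary file output added for inspection (the file path is an assumption):
output {
  elasticsearch {
    host => "xxx.xxx.xx.xxx"
    index => "pt-index"
  }
  # temporary debug output; inspect a short sample of what Logstash emits
  file {
    path => "/tmp/logstash-debug.log"
  }
}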
