Unable to re-process the log file using logstash version 2.3.2 - elasticsearch

I processed a file using Logstash and pushed it to Elasticsearch, and it worked. However, I had to make some changes to the Logstash conf file and need to process the log file again. I deleted the index in ES and restarted Logstash, but I don't see the data in Elasticsearch; it looks like the file is not being processed.
1. I am using Logstash version 2.3.2.
2. I deleted the _sincedb file and restarted Logstash — still no logs.
3. I checked the conf file syntax via --configcheck and it is OK.
Any ideas what I am missing here?
I don't see any index created and no data in ES. I tried these steps multiple times.

Logstash is smart enough to remember up to which line it has already processed each file you've given it, and it stores that cursor in a sincedb file.
So, in addition to the path setting, you need to specify two more parameters in your file input to make sure the file is re-processed on each run:
file {
  path => "/path/to/file"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}

Related

How to receive logs from multiple Filebeats running on different machines (EC2) on one Logstash instance?

I have 2 Filebeats running on different machines (e.g. A and B), and each Filebeat is configured to send to the Logstash machine, but Logstash is not receiving the data even though I have added the private IPs of those machines.
So I have 2 Logstash .conf files to receive from the 2 different Filebeats.
My conf files look like this ->
A.conf
input {
  beats {
    port => ""
    host => "private-ip-m1"
  }
}
B.conf
input {
  beats {
    port => ""
    host => "private-ip-m2"
  }
}
After I run Logstash it is unable to connect; it says "Error: Cannot assign requested address".
Can anyone tell me if there is any other way to do this?
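The "Cannot assign requested address" error usually means Logstash is trying to bind to an IP address that does not belong to the Logstash machine itself: the host option of the beats input is the local listen address, not the address of the machine sending the data. A minimal sketch of a single beats input listening on all local interfaces (port 5044 is the conventional Beats default, used here as an assumption):

```
input {
  beats {
    # Listen on all local interfaces of the Logstash machine;
    # do NOT put the Filebeat machines' private IPs here.
    host => "0.0.0.0"
    port => 5044
  }
}
```

Both Filebeats would then point at the Logstash machine's address in their own output.logstash section; if events from the two machines need to be told apart, that can typically be done downstream on the host metadata Filebeat attaches to each event.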

Filebeat not picking up some files

I am trying to send Tomcat logs to ELK. I am using Filebeat to scan the files.
My log file names look like "project_err.DD-MM-YYYY". In the Filebeat configuration, I am giving the file name as foldername\project_err*
But Filebeat is ignoring the files. Is this configuration correct?
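One thing worth checking is how the glob is written in the YAML config: Filebeat takes its patterns from a paths list, and in YAML a double-quoted string treats backslashes as escape characters, so Windows-style paths are safer unquoted or single-quoted. A sketch of a prospector entry (the folder name is a placeholder, and the exact section name varies between Filebeat versions — this is the 5.x layout):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      # Single quotes keep the backslashes literal in YAML
      - 'C:\foldername\project_err*'
```

If the files still do not show up, it may also help to confirm the pattern actually matches (e.g. with dir or ls on that glob) before pointing Filebeat at it.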

Why is dataproc outputting an unexpected value?

I have created a jar file that uses Hadoop to count the number of bigrams found in a set of text files.
When I run the Hadoop job on my local setup, I receive an output file containing a count of the bigrams in the text files (see the "Correct output" screenshot).
However, when I use the exact same jar file with Dataproc on Google Cloud Platform, it outputs something different (see the "dataproc, incorrect output" screenshot).
Any ideas why this may be happening? Cheers

How to stop Logstash from writing its logs to syslog?

I have my Logstash configuration on my Ubuntu server; it reads data from a Postgres database and sends the data to Elasticsearch. I have configured a schedule so that every 15 minutes Logstash checks the Postgres table and, if there is any change, sends the data to Elasticsearch.
But each time, Logstash is also sending its logs to syslog, which I don't need. Because of Logstash, my syslog file grows very large.
So how do I stop Logstash from sending its logs to syslog? Is there any setting in logstash.yml to avoid this?
Many sites online say to remove the line below from the configuration:
stdout { codec => rubydebug }
But I don't have this line.
In my output I just send my data to Elasticsearch (hosted on AWS).
Is there a way to stop Logstash from sending its logs to syslog?
Disable the rootLogger.appenderRef.console in log4j
The log files that Logstash itself produces are created through log4j, and one stream goes by default to the console. Syslog captures that console output and writes it to the syslog file. In the Ubuntu version of Logstash this is configured in the file /etc/logstash/log4j2.properties.
In the default configuration there is a line that starts with
rootLogger.appenderRef.console
If you add a # in front of that line and restart Logstash, the log files that Logstash creates will stop going to syslog:
service logstash restart
The other rootLogger entry, which uses the RollingFileAppender, will still write log messages from Logstash itself (so not the messages being processed by your pipeline) to
/var/log/logstash/logstash-plain.log
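For concreteness, the edit to /etc/logstash/log4j2.properties looks roughly like this (a sketch — the exact appender reference names vary between Logstash versions, so match it against your own file rather than copying it verbatim):

```properties
# Commented out: stop Logstash's console stream, which syslog captures
# rootLogger.appenderRef.console.ref = console

# Left in place: Logstash's own logs keep going to the rolling file
# under /var/log/logstash/
rootLogger.appenderRef.rolling.ref = plain_rolling
```

Only the console appender reference is commented out; the rolling-file reference stays so you do not lose Logstash's own logs entirely.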
It's easy to confuse the log files that Logstash creates with the messages that you process, especially if they get mixed by the logstash-output-stdout or logstash-output-syslog plugins. That is not applicable here, because you use the logstash-output-elasticsearch plugin, which writes to Elasticsearch.
The log4j2.properties file is skipped if you run Logstash from the command line in Ubuntu. That is a nice way of testing your pipeline in a terminal, and you can run multiple Logstash instances in parallel (e.g. the service and a command-line test pipeline):
/usr/share/logstash/bin/logstash -f your_pipeline.conf
In short: to avoid writing to syslog, check your pipeline files and your log4j2.properties file.
In your pipeline files, remove all occurrences of this:
stdout { codec => rubydebug }
And in your log4j2.properties file, comment out the line starting with:
rootLogger.appenderRef.console

Can any NiFi processor catch HDFS directory changes?

Is there any way I can manage adding, deleting and updating flowfiles in my HDFS directory after I delete or update them in my second HDFS directory? I.e., I want the flowfile in directory 1 to be changed or deleted appropriately when the flowfile with the same name is changed in the second directory.
I tried to use ListHDFS, FetchHDFS and PutHDFS for adding a flowfile to the second directory, but I can't manage the update and delete operations.
What can I do? Should I use Hadoop tools for this, or is it possible with NiFi?
