Logstash file input plugin returning duplicate events - elasticsearch

So I am using the file input plugin in Logstash to read logs from multiple files.
path => "/path/to/a*.txt"
I have two files: a1.txt and a2.txt.
Now I start Logstash and both files' data gets sent to stdout. But when I make a new entry in either file, it sends that new line, but also sends the second-to-last line again.
I've set start_position to "beginning".
Any idea what is going on?

I actually resolved it. The thing is, if you open the file and modify it, its inode number changes! The new inode number then also gets registered as an entry in the sincedb file, hence the duplicates.
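If you want to confirm this, you can inspect the sincedb file itself. The exact location depends on your install and Logstash version (the path below is an assumption; newer versions keep it under the Logstash data directory), and each line records, roughly, the tracked inode, device numbers, and byte offset:

# path is an assumption; adjust to your install (older setups used a hidden
# .sincedb_* file in the Logstash user's home directory)
cat /usr/share/logstash/data/plugins/inputs/file/.sincedb_*
# Each line looks roughly like: <inode> <major_dev> <minor_dev> <byte_offset> ...
# Two entries with different inodes for the same logical file are what
# produce the duplicated events described above.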

Related

Logstash to load input file based on time change

I'm using Logstash 7.17.0 and trying to load a file using a pipeline.
It picks the file up when the size or checksum changes, but I want it to pick the file up even when the size is the same and only the file's timestamp has changed.
For example: I receive data every day, and in some cases I might get the same data as yesterday. But Logstash does not pick up the file if the size is the same.
Logstash tracks the current position in watched files in a dedicated file named sincedb (docs). If you disable this feature, Logstash will always read the entire file from the beginning. To do so, set the sincedb_path property of the file input plugin to /dev/null, e.g.,
input {
  file {
    path => "/path/to/your/file"
    sincedb_path => "/dev/null"
  }
}
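If the goal is to re-ingest the whole file on every run regardless of size, it usually also makes sense to set start_position, since the default is to tail new files from the end. A minimal sketch, with the path as a placeholder:

input {
  file {
    path => "/path/to/your/file"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}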

Process latest file from GetSFTP/ListSFTP processor

I am getting multiple files from the ListSFTP processor. However, the requirement is to only process the latest file, based on the file's last modification time. I tried merging files via the MergeContent processor, but the last modification time goes away. The current version of NiFi is 1.6, so a record set writer can't be used. How can this be implemented?
You can use an AttributesTo* processor and create a new flow file from the filename and file.lastModifiedTime attributes. Then you can use MergeContent to get a single flow file with both the filename and the modification time. You should be able to get the file from there.

Using Logstash on tarfile to create Elasticsearch pipelines

I periodically receive gzipped tarfiles containing different types of logs I want to load into Elasticsearch. Is Logstash suitable for this use case? The issue I seem to be running into is that even if I can extract the tarfile contents, Logstash requires me to specify absolute file paths whereas my file paths will differ for each tarfile I want to load.
The file input plugin for Logstash is usually used for "active" log files to read and index data in real time.
If the log files you are going to process are already complete, you don't need the file input plugin at all; it's enough to use the stdin input plugin and pipe the contents of the files into the Logstash process.
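As a rough illustration of that approach (the tarball name, pipeline file name, and Elasticsearch host are placeholders, not from the question): extract the archive contents to stdout and pipe them into a pipeline that reads from stdin.

# extract every file in the archive to stdout and feed it to Logstash
tar -xzOf logs.tar.gz | bin/logstash -f stdin-pipeline.conf

# stdin-pipeline.conf
input {
  stdin { }
}
output {
  elasticsearch { hosts => ["http://localhost:9200"] }
  stdout { codec => rubydebug }
}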

How to overwrite a file in a tarball

I've got an edge case where two files have the same name but different contents and are written to the same tarball. This results in two entries in the tarball. I'm wondering if there's anything I can do to make tar overwrite the file if it already exists in the tarball, as opposed to creating another entry with the same name.
There is no way: the first file has already been written by the time you ask to write the second one, and the stream has advanced past it. Remember that tar files are accessed sequentially.
You should deduplicate before you start writing.
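A minimal sketch of that idea using Python's tarfile module (the file names are made up for illustration): decide the final content for each archive name up front, then write each name exactly once.

import tarfile

# (arcname, source_path) pairs; the same archive name may appear more than once.
entries = [
    ("logs/app.log", "build1/app.log"),
    ("logs/app.log", "build2/app.log"),  # later entry should win
]

# Deduplicate before writing: keep only the last source for each archive name.
final = {}
for arcname, src in entries:
    final[arcname] = src

with tarfile.open("output.tar.gz", "w:gz") as tar:
    for arcname, src in final.items():
        tar.add(src, arcname=arcname)  # each name is written exactly once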

How do you use logrotate with output redirect?

I'm currently running a ruby script which logs its HTTP traffic to stdout. Since I wanted the logs to be persistent, I redirected the output to a log file with ruby ruby_script.rb >> /var/log/ruby_script.log. However, the logs are now getting very large so I wanted to implement logrotate using the following:
"/var/log/ruby_script.log" {
missingok
daily
rotate 10
dateext
}
However, after running logrotate --force -v ruby_script where "ruby_script" is the name of the logrotate.d configuration file, no new file is created for the script to write to, and it writes to the rotated file instead. I'm guessing this behavior happens because the file descriptor that is passed by >> sticks to the file regardless of moving it, and is unrelated to the filename after the first call. Thus, my question is, what is the correct way to achieve the functionality I'm looking for?
Take a look at the copytruncate option.
From man logrotate:
copytruncate: Truncate the original log file to zero size in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
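Applied to the configuration from the question, that would look something like this (untested sketch):

"/var/log/ruby_script.log" {
  missingok
  daily
  rotate 10
  dateext
  copytruncate
}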
