Logstash to load input file based on time change - elasticsearch

I'm using Logstash 7.17.0 and trying to load a file using a pipeline.
It picks the file up when its size or checksum changes, but I want it to pick the file up even when the size is the same and only the file's modification time has changed.
For example: I receive data every day, and in some cases it may be the same data as yesterday. But Logstash does not pick up the file if the size is the same.

Logstash tracks the current position in watched files in a dedicated file named sincedb (docs). If you disable this feature, Logstash will always read the entire file from the beginning. To do so, set the sincedb_path property of the file input plugin to /dev/null, e.g.,
input {
  file {
    path => "/path/to/your/file"
    sincedb_path => "/dev/null"
  }
}
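Note that the file input tails files by default (start_position defaults to "end"), so if the goal is to re-read the whole file whenever it changes, you would typically combine the disabled sincedb with start_position. A minimal sketch, with the path as a placeholder:

input {
  file {
    path => "/path/to/your/file"
    start_position => "beginning"  # read files from the top instead of tailing
    sincedb_path => "/dev/null"    # don't persist the read position
  }
}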

Related

Process latest file from GetSFTP/ListSFTP processor

I am getting multiple files from the ListSFTP processor. However, the requirement is to process only the latest file, based on each file's last modification time. I tried merging the files via the MergeContent processor, but the last modification time is lost. The current version of NiFi is 1.6, so a record set writer can't be used. How can this be implemented?
You can use an AttributesTo* processor and create a new flow file from the filename and file.lastModifiedTime attributes. Then you can use MergeContent to get a single flow file containing both the filename and the modification time. You should be able to get the file from there.
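For example, AttributesToJSON is one concrete choice of AttributesTo* processor; configured as sketched below, it writes the two attributes into the flow file content (the property names are the processor's own, the value list is an assumption about your flow):

Attributes List: filename, file.lastModifiedTime
Destination: flowfile-content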

Using Logstash on tarfile to create Elasticsearch pipelines

I periodically receive gzipped tarfiles containing different types of logs I want to load into Elasticsearch. Is Logstash suitable for this use case? The issue I seem to be running into is that even if I can extract the tarfile contents, Logstash requires me to specify absolute file paths whereas my file paths will differ for each tarfile I want to load.
The file input plugin for Logstash is usually used for "active" log files, to read and index data in real time.
If the log files you are going to process are complete, you don't need to use the file input plugin at all; it's enough to use the stdin input plugin and pass the contents of each file to the Logstash process.
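A minimal sketch of such a pipeline (the Elasticsearch host and index name are placeholders):

# stdin.conf -- reads events from standard input, one event per line
input {
  stdin { }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "tarfile-logs"
  }
}

After extracting the tarball, you can feed any file to it regardless of its path, e.g. bin/logstash -f stdin.conf < extracted/app/server.log.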

Logstash file input plugin returning duplicate events

So I am using the file plugin in Logstash to read logs from multiple files.
path => "/path/to/a*.txt"
I have two files: a1.txt and a2.txt.
Now I start Logstash, and both files' data gets sent to stdout. But when I make a new entry in either file, it sends that new line, but also sends the second-to-last line again.
I've set start_position to "beginning".
Any idea what is going on?
I actually resolved it. The thing is, if you open the file in an editor and modify it, its inode number changes! The new inode number then gets registered as a separate entry in the sincedb file, hence the duplicates.
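You can see this in the sincedb file itself: in Logstash 7.x each line records (roughly) the inode, device numbers, byte offset, a timestamp, and the path, so after an in-place edit two entries end up pointing at the same path. The lines below are purely illustrative:

263842 0 51713 1024 1659000000.0 /path/to/a1.txt
263901 0 51713 1040 1659000500.0 /path/to/a1.txt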

Error while adding timestamp to file in Apache NiFi

I am using HDP 2.5. I am trying to add a timestamp to the name of a file located in HDFS. For that I use GetHDFS -> UpdateAttribute -> PutHDFS.
First I get the file from HDFS through the GetHDFS processor, then I rename it in UpdateAttribute by adding the property ${filename}.${now():format("yyyy-MM-dd-HH:mm:ss.SSS'z'")}. Finally, I put the file back into HDFS. The problem is when the destination folder (in HDFS) already contains a file whose name includes a timestamp: each time I run the flow, another timestamp is appended, so a file that already had one timestamp ends up carrying two or more.
Can anyone tell me how to resolve this issue?
If you don't want to change your current workflow, the best option is probably to use the "File filter" property in the GetHDFS processor to only get files not containing the date in the filename (assuming your files have some naming convention). Another option is to send the renamed files in another directory.
As a general comment, I'd recommend using the combination of ListHDFS and FetchHDFS processors as it is a more efficient pattern when working with a NiFi cluster. You could then use a RouteOnAttribute in the middle to do some more advanced filtering than the "File filter" option.
Another comment: your approach is not the most performant one, since you are downloading the data from HDFS and then uploading it back. A rename/move operation inside HDFS would probably be cleaner (or getting the naming right in the first place). You could use the WebHDFS interface to perform the renaming with an InvokeHTTP processor in NiFi, in combination with the ListHDFS processor.
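A sketch of the WebHDFS rename request that InvokeHTTP would issue (host, port, and paths below are placeholders; RENAME is a standard WebHDFS operation):

HTTP Method: PUT
Remote URL: http://namenode:50070/webhdfs/v1/data/oldname.log?op=RENAME&destination=/data/oldname.2017-01-01.log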
You can use Expression Language to delete the previous timestamp and then append the current one. There are several string functions, such as substringBefore or substringAfter, that you can use depending on the logic of your file names.
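For example, if the base name contains no other dots, an UpdateAttribute property like the one below strips any previous timestamp before appending a fresh one (a sketch using standard Expression Language string functions):

filename: ${filename:substringBefore('.')}.${now():format("yyyy-MM-dd-HH:mm:ss.SSS'z'")}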

How to process files with same name in Apache NiFi?

I'm learning NiFi and I'm working on a flow where I get files using GetFile, do some processing, and then store them into HDFS using the PutHDFS processor. The thing is, I will most probably get files with the same name. For example, I might get a file every 30 minutes, and each file generated every 30 minutes will have the same name.
Now when I put that file into HDFS, I get a "File with the same name already exists" error. How do I overcome this? Is there any way to change the file name on the run?
It is actually very easy: just use the UpdateAttribute processor to change the file name. For example, you can append a timestamp to it.
In UpdateAttribute, add a property filename with the value ${filename}.${now()}
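The bare ${now()} renders the date in its default string form; for a predictable, filesystem-friendly name you can format it explicitly (a sketch, the exact pattern is up to you):

filename: ${filename}.${now():format("yyyyMMddHHmmss")}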
