I have a NiFi flow that fetches files from S3. A pair of files is fetched from S3 and later passed into a MergeContent processor. In addition, there is a README file that needs to go with each pair of files.
This README file is always the same, and I have stored it locally. I have an ExecuteStreamCommand processor that takes in content from the MergeContent processor.
I have tried passing the README file into the MergeContent processor using the ListFile/FetchFile combination, but it's not working as expected. The final result I am looking for is a MergeContent package that contains the pair of files downloaded from S3 plus the README file.
I think in this case you will want to use GetFile for the README -- the path is static, and you can set the Keep Source File setting to true in order to constantly retrieve the same content.
ListFile/FetchFile probably isn't working because once ListFile retrieves a filename from the directory, it stores the timestamp in its local state and won't retrieve files older than that on the next execution.
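To illustrate why the README stops showing up after the first run, here is a toy model in Java of a listing step that remembers the newest timestamp it has seen. This is not NiFi's code: the class `TimestampListing` and its `list` method are made up for the illustration, and the real ListFile logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of timestamp-tracked listing; NOT NiFi's actual implementation. */
final class TimestampListing {
    // Hypothetical stand-in for the state the processor keeps between runs.
    private long lastSeenTimestamp = Long.MIN_VALUE;

    /** Returns only the files modified after the newest timestamp seen so far. */
    List<String> list(Map<String, Long> modifiedTimeByFile) {
        List<String> listed = new ArrayList<>();
        long newest = lastSeenTimestamp;
        for (Map.Entry<String, Long> entry : modifiedTimeByFile.entrySet()) {
            if (entry.getValue() > lastSeenTimestamp) {
                listed.add(entry.getKey());
                newest = Math.max(newest, entry.getValue());
            }
        }
        // Remember the newest timestamp so unchanged files are skipped next time.
        lastSeenTimestamp = newest;
        return listed;
    }
}
```

Calling `list` twice on an unchanged directory returns the README the first time and nothing the second time, which is the behaviour described above for a static file.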
Related
I will have many files daily, but I need to pull them through the GetSFTP processor only if a particular text file is in the list (which indicates that all files are ready to pull).
This process involves pulling files from SFTP and copying them to AWS S3.
I know the alternative of writing a script and pulling the files through it, but I am looking to achieve the same with processors, without a script.
I've set up the pipeline and it works (I followed this documentation: https://learn.microsoft.com/en-us/azure/connectors/connectors-create-api-ftp). It downloads the zip file and loads it into blob storage.
However, the resulting zip file is corrupted: it has a slightly different size than the original file.
I set Infer Content Type to Yes. I also tried setting it to No, but that didn't change the result.
I tried both hardcoded and dynamic naming.
Hi, I'm using NiFi as an ETL tool.
[Image: Process IMG]
This is my current process. I use TailFile to detect the CSV file and then send messages to Kafka.
It works fine so far, but I want to delete the CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs,
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to get new lines that are added to the same file, as they are written. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, then you could use GetFile, which gives you the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP; FetchSFTP has a Completion Strategy to move or delete the file.
My use case.
Some processing elsewhere adds files to a directory (_use_it) -> my flow is called via REST -> my flow should then read all files from that directory (_use_it).
I want to read all files from this directory every time, not just changed/added files. I can't start/stop the process. This flow has to run as a background process.
I think I am looking for a ListFile processor that runs once, then stops, and forgets its previous state when it runs again. "some twisted logic" :)
Thanks
1. Using GetFile Processor:
You can use the GetFile processor instead of the ListFile + FetchFile processors; GetFile doesn't store any state.
The GetFile processor gets all the files in the directory every time.
Keep Source File property: If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If not keeping original NiFi will need write permissions on the directory it is pulling from otherwise it will ignore the file.
(or)
2. Using ListFile Processor:
Using the NiFi REST API, we can clear the state of the ListFile processor; the processor will then list all files in the directory again on its next run.
Clear state of the processor:
POST /processors/{id}/state/clear-requests
Before each run of the flow that lists all files in the directory:
Use the REST API to stop the ListFile processor.
Clear the state of the ListFile processor.
Start the ListFile processor.
Refer to this and this link for how to stop the processor via the REST API.
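The stop / clear state / start sequence could be sketched in Java roughly as follows. This only builds the three requests with `java.net.http` (Java 11+); sending them and any authentication are left out. The class name `ListFileStateReset`, the base URL and the processor id are placeholders I made up. I am also assuming your NiFi version exposes `PUT /processors/{id}/run-status` for starting and stopping, and that the revision version you pass matches the processor's current revision (which you would normally read with `GET /processors/{id}` first). Check the REST API docs for your version before relying on this.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

/** Sketch: builds the requests for stop -> clear state -> start. Nothing is sent here. */
final class ListFileStateReset {
    private final String baseUrl;      // assumption, e.g. http://localhost:8080/nifi-api
    private final String processorId;  // id of the ListFile processor

    ListFileStateReset(String baseUrl, String processorId) {
        this.baseUrl = baseUrl;
        this.processorId = processorId;
    }

    /** PUT .../run-status with state "STOPPED" or "RUNNING" (endpoint assumed to exist in your version). */
    HttpRequest runStatus(String state, long revisionVersion) {
        String body = "{\"revision\":{\"version\":" + revisionVersion + "},"
                + "\"state\":\"" + state + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/processors/" + processorId + "/run-status"))
                .header("Content-Type", "application/json")
                .PUT(BodyPublishers.ofString(body))
                .build();
    }

    /** POST .../state/clear-requests, the call quoted in the answer above. */
    HttpRequest clearState() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/processors/" + processorId + "/state/clear-requests"))
                .POST(BodyPublishers.noBody())
                .build();
    }
}
```

You would send `runStatus("STOPPED", v)`, then `clearState()`, then `runStatus("RUNNING", v2)` with something like `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, re-reading the revision between calls since it may change after each update.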
I want to zip a file. I can create the zip file, but I don't want to create a temporary zip file on disk. Is there any way to zip the file directly into a byte[]?
Thanks
Zip files are typically generated with streams anyway, so there's no need to temporarily store them in a file. The output might as well stay in memory or be streamed directly to a remote recipient (with only a small memory buffer to avoid a large memory footprint).
See the sample helper class in the accepted answer to "How can I generate zip file without saving to the disk with Java?"
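As a minimal sketch of the in-memory approach (not the helper class from the linked answer), you can wrap a `ByteArrayOutputStream` in a `ZipOutputStream`. The class name `InMemoryZip` and the single-entry signature are just for illustration; for large files you would stream to the recipient instead of collecting everything in a byte[].

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class InMemoryZip {
    /** Zips a single entry entirely in memory and returns the archive bytes. */
    static byte[] zip(String entryName, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory target, but the stream API declares it.
            throw new UncheckedIOException(e);
        }
        // Read the bytes only after the zip stream is closed, so the archive is complete.
        return buffer.toByteArray();
    }
}
```

For example, `byte[] archive = InMemoryZip.zip("hello.txt", "hello".getBytes(StandardCharsets.UTF_8));` gives you the zip as a byte[] without touching the disk.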