I have a fairly simple process that merges XML files into one or more XML files using the MergeRecord processor. I then convert them to JSON and write them out with PutFile. The files come out with fabulous names like 79f000ec-9da1-4b59-a0a8-79cc3bb5e85a.
Is there any way to control those file names, or at least give them an appropriate extension?
Before your PutFile, use an UpdateAttribute processor to rename the file by setting the filename attribute (${filename}; attribute names are case-sensitive).
Example:
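A minimal sketch of such an UpdateAttribute configuration, assuming the only goal is to give the converted output a .json extension. UpdateAttribute treats each user-added property as an attribute to set, so a property named filename overwrites the name that PutFile later writes under. The .json suffix is an illustrative choice, not something taken from the original flow.

```text
# UpdateAttribute, placed just before PutFile
# Property name   Value
filename          ${filename}.json
```

With this in place the file above would be written as 79f000ec-9da1-4b59-a0a8-79cc3bb5e85a.json; any other Expression Language value can be used instead if a more descriptive name is wanted.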
I was trying to unzip a file that contains two files in different formats, one PDF and one txt. I was able to unzip it; now I need to separate the two files and keep them in two separate queues, one holding the txt and the other the PDF. Can anyone suggest how to approach this?
You need the RouteOnAttribute processor.
Add two properties to it:
match_txt = ${filename:toLower():endsWith('.txt')}
match_pdf = ${filename:toLower():endsWith('.pdf')}
As a result, the processor will have two output relationships: match_txt and match_pdf.
I'm using NiFi 1.11.4 to read CSV files from an SFTP server, do a few transformations, and then drop them off on GCS. Some of the files contain no content, only a header line. During my transformations I convert the files to Avro, but when converting back to CSV no output file is produced for the files that contain only a header.
I have the following settings for the Processor:
And for the Controller:
I did find the following topic: How to use ConvertRecord and CSVRecordSetWriter to output header (with no data) in Apache NiFi? The comments there mention explicitly that ConvertRecord should cover this since 1.8. Sadly, either I misunderstood that, it does not work, or my setup is wrong.
While I could make it work by explicitly writing the schema as a line to the empty files, I wanted to know whether there is a more elegant way.
I have a connected SFTP server, and I am trying to route files based on type: .csv, .tsv, and .xlsx. For now, I'm just uploading test files through the command line.
My flow is:
GetSFTP (with correct hostname, etc.) ->
RouteOnAttribute ->
LogAttribute (will dump elsewhere soon, this is just for testing)
My problem, I think, is that I created a property in RouteOnAttribute incorrectly:
Am I correct in assuming that this does not actually pick up on the .csv because it is not technically part of the filename? What would be the correct expression to route on the file type? Thanks!
You need some information that will tell you the type of file.
GetSFTP should be getting the filename from the file on the sftp server, so if those have the appropriate extensions then I would expect your RouteOnAttribute to work correctly.
If the filename does not have the appropriate extension, then the only thing you can do is try to use IdentifyMimeType to determine what type of file it is, and then route on the mime.type attribute.
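For the extension-based case, the properties could look like the sketch below. It assumes the default routing strategy (Route to Property name), so each property name becomes a relationship; the names csv, tsv, and xlsx are illustrative. The toLower() call makes the match case-insensitive, and flow files matching none of the expressions go to the unmatched relationship.

```text
# RouteOnAttribute, Routing Strategy: Route to Property name
# Property name   Value
csv               ${filename:toLower():endsWith('.csv')}
tsv               ${filename:toLower():endsWith('.tsv')}
xlsx              ${filename:toLower():endsWith('.xlsx')}
```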
Below is a simple NiFi flow that monitors a folder for files and copies them to a different folder. It works fine, but I'm looking for a processor that extracts only the filename and writes the name of the file to a text file.
I tried the ExtractText processor but could not figure out how to configure it to read only the filename. Any advice is highly appreciated.
If I understand your use case correctly, you should be able to use ListFile -> ReplaceText -> UpdateAttribute -> PutFile.
ListFile will generate a flow file for each file it finds in the directory, but the flow file will not have any content, it will just put the filename in an attribute. Then you can use ReplaceText to replace the entire text (i.e. flow file contents) with ${filename}. UpdateAttribute would be used to change the filename attribute to whatever you want the destination text file to be called, for use in PutFile.
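The chain described above could be configured roughly as follows. This is a sketch under assumptions: the directory paths are placeholders, and appending .txt to the original name is just one possible naming choice. Note that ReplaceText must run before UpdateAttribute, because it needs the original filename value.

```text
# ListFile
Input Directory        /path/to/watched/folder    (placeholder)

# ReplaceText
Replacement Strategy   Always Replace
Evaluation Mode        Entire text
Replacement Value      ${filename}

# UpdateAttribute
filename               ${filename}.txt

# PutFile
Directory              /path/to/output/folder     (placeholder)
```

With these settings each listed file produces its own small text file whose content is the original file's name.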
In my code there are two types of files: data files with the extension .csv or .psv, and .trigger files. The .csv files are larger than the .trigger files, so the .trigger files are being transferred before the .csv files.
How can I make sure that the .trigger files are transferred only after the .csv files have been transferred?
I am using a single route to transfer both kinds of file.
You can use the sortBy option of the Camel file component. See http://camel.apache.org/file2.html for more information.
One idea is to implement Camel's org.apache.camel.component.file.GenericFileFilter and put your filter logic in its accept method. The logic should let the .csv files through first and the .trigger files only afterwards. Use the filter option of the file component; the from endpoint will look like:
from("file://inbox?filter=#myFilter")
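The ordering rule itself ("data files first, trigger files last") can be sketched in plain Java. This is an illustration of the logic only, not Camel API: the class name TriggerLastOrder and its members are hypothetical, and it compares bare file names rather than Camel's GenericFile objects. The file component also documents a sorter option that takes a java.util.Comparator, so a comparator like this one, adapted to GenericFile, is one possible way to wire it in; that adaptation is assumed and not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: orders file names so that .trigger files come last.
final class TriggerLastOrder {

    private TriggerLastOrder() {
    }

    // Rank 1 for trigger files, 0 for everything else (.csv, .psv, ...).
    static int rank(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".trigger") ? 1 : 0;
    }

    // Sort by rank first, then alphabetically so the order is deterministic.
    static final Comparator<String> DATA_BEFORE_TRIGGER =
            Comparator.<String>comparingInt(TriggerLastOrder::rank)
                    .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(
                Arrays.asList("a.trigger", "b.csv", "a.csv", "b.trigger"));
        names.sort(DATA_BEFORE_TRIGGER);
        System.out.println(names); // prints [a.csv, b.csv, a.trigger, b.trigger]
    }
}
```

Sorting can only reorder the files found in the same poll, so a trigger file may still go first if its data file has not arrived yet when the directory is scanned.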