NiFi use relative path for get and put file processor - apache-nifi

New to NiFi, struggling with some basics and would be grateful for any assistance...
I need to get files from a remote server and store them on a NiFi server. Using GetSFTP and PutFile this works ok. But I wish to keep the relative paths.
i.e. on the remote server /data/hosts/host01/.... would be copied into /imports/host01/... on NiFi
I have this working by using /imports/${path:substring(11)} as the Directory value in PutFile. I have tried using a parameter and a variable to store the Remote Path value, and tried using the :length(var1) function of the Parameter and/or Variable within the above substring function, but nothing seems to work other than hard-coding the substring length.
Is there an easier way to keep relative paths with get and put processors?

GetFile and PutFile require absolute paths for the directories. If you know the relationship between the root directory and the destinations, you can construct the full path using Expression Language.
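
The path arithmetic behind that advice can be sketched outside NiFi. The Python below is only an illustration, not NiFi code: map_remote_to_local and its default roots are hypothetical names built from the example paths in the question. It strips the remote root by name instead of by a hard-coded character count, which is what substring(11) does for the 11-character root /data/hosts.

```python
from pathlib import PurePosixPath


def map_remote_to_local(remote_path, remote_root="/data/hosts", local_root="/imports"):
    """Rebuild a remote directory under a local root, keeping the relative part.

    The default roots are assumptions taken from the example in the question.
    """
    # relative_to raises ValueError if remote_path is not under remote_root
    relative = PurePosixPath(remote_path).relative_to(remote_root)
    return str(PurePosixPath(local_root) / relative)


if __name__ == "__main__":
    print(map_remote_to_local("/data/hosts/host01/logs"))
```

Inside a flow, the same idea would mean deriving the relative part from the root value itself, so the root only has to be configured in one place.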

Related

File Path in Variable -> File identifier in File Content for Power Automate Flow

So I have this flow step and I want to replace hardcoded value of file identifier
with variable.
(Point where I call variable)
(How variable looks like)
Hardcoded path works, I tried it.
I want to store the hardcoded path in a variable and use that variable in the steps.
There are two operations for retrieving file contents, and you need to use the right one for your case.
I created a basic variable with a file name ...
... and then tested with both types of Get file content operations in the SharePoint group.
You need to use the Get file content using path operation.

NiFi - How to get listing of directories and organize them by name then obtain files from each directory?

I'm trying to figure out how to perform the following steps within NiFi.
Obtain listing of directories from a specific location e.g. /my_src (Note the folders that will be appearing within here will be dated e.g. 20211125)
Based off of the listing obtained I need to sort the folders by date
For each folder then I need to GetFile from that directory
Then sort those files by their names
I am stuck at step 1 on finding a processor that pulls the directory names. I only see GetFile and ListFile.
Reason for this is that I need to process the folders based on the oldest to newest.
I would expect to be using a regex pattern to locate the valid folders that match the date format and ignore the other folders. Then with those values found pass them along sorted to another process that would get files from that path location, which GetFile does not seem to allow me to set dynamically.
Am I to approach this process differently within NiFi?
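
For reference, the ordering described in the steps above can be written down as a small standalone Python sketch. This is not a NiFi flow and does not say which processors to use. The function name files_in_date_order and the eight-digit folder pattern are assumptions based on the 20211125 example. Folders named yyyymmdd sort chronologically when sorted as plain strings, so no date parsing is needed.

```python
import re
from pathlib import Path

# Assumption: valid folders are exactly eight digits, e.g. 20211125
DATE_DIR = re.compile(r"^\d{8}$")


def files_in_date_order(root):
    """Return files from dated sub-folders, oldest folder first, files sorted by name."""
    root = Path(root)
    # Steps 1 and 2: list sub-folders, keep the dated ones, sort by folder name
    dated = sorted(
        (d for d in root.iterdir() if d.is_dir() and DATE_DIR.match(d.name)),
        key=lambda d: d.name,
    )
    ordered = []
    for folder in dated:
        # Steps 3 and 4: take the files in this folder, sorted by file name
        ordered.extend(sorted((f for f in folder.iterdir() if f.is_file()),
                              key=lambda f: f.name))
    return ordered
```

Calling files_in_date_order("/my_src") would return the files of the oldest dated folder first and skip any folder whose name does not match the pattern.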

caffeine simulator: could not find file: WebSearch1.spc.bz2

I'm simulating in Caffeine's simulator a sequence of several traces, of different formats.
However, when trying to run Umass storage traces I get errors, e.g.:
Could not find file: WebSearch2.spc.bz2
I guess that the problem lies in some combination of the format, path, and filename.
E.g., when writing in the .conf file:
paths = ["lirs:loop.trace.gz"]
the format is "lirs", and indeed there's a file
\simulator\src\main\resources\com\github\benmanes\caffeine\cache\simulator\parser\lirs\loop.trace.gz
so this works fine.
Similarly, I created under \parser a sub-directory named "umass-storage", and downloaded there the file WebSearch2.spc.bz2, and then wrote in the .conf file:
paths = ["umass-storage:WebSearch2.spc.bz2"]
I tried also unzipping the file, and then use paths = ["umass-storage:WebSearch2.spc"]
as well as a few other combinations, but all of them give the error above.
To discover the trace files automatically, they have to be placed in the same package as their trace reader. In this case it would be ../parser/umass/storage. However, since it is a large file you might not want to include it in your repository. Instead, you can specify the absolute path and keep the files in an external directory.
OK, thanks to Ben I solved it, and got the tiny trick here.
For most traces it's enough to write merely the format name (which is also the directory name), e.g.:
paths = ["lirs:loop.trace.gz"]
However, umass traces include 2 sub-cases (storage / network). Hence it works (at least for me) only when stating the file's full path, e.g.,
paths = ["umass-storage:/Users/ben/Documents/traces/umass/WebSearch2.spc.bz2"]

NiFi: Routing on File Types, e.g. csv, tsv, xlsx

I have a connected SFTP server, and I am trying to route files based on type: .csv, .tsv, and .xlsx. For now, I'm just uploading test files through the command line.
My flow is:
GetSFTP (with correct hostname, etc.) ->
RouteOnAttribute ->
LogAttribute (will dump elsewhere soon, this is just for testing)
My problem, I think, is that I created a property in RouteOnAttribute incorrectly:
Am I correct in assuming that this does not actually pick up on the .csv because it is not technically part of the filename? What would be the correct expression to route on the file type? Thanks!
You need some information that will tell you the type of file.
GetSFTP should be getting the filename from the file on the sftp server, so if those have the appropriate extensions then I would expect your RouteOnAttribute to work correctly.
If the filename does not have the appropriate extension, then the only thing you can do is try to use IdentifyMimeType to determine what type of file it is, and then route on the mime.type attribute.
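
The extension check that the route relies on can be sketched in plain Python. This is an illustration of the decision logic only, not NiFi configuration: route_for and the route names are hypothetical. In NiFi itself the equivalent test would be an Expression Language condition on the filename attribute, for example something along the lines of ${filename:endsWith('.csv')}.

```python
from pathlib import PurePosixPath

# Hypothetical route names, one per file type mentioned in the question
ROUTES = {".csv": "csv", ".tsv": "tsv", ".xlsx": "xlsx"}


def route_for(filename):
    """Pick a route from the last extension of the filename, ignoring case."""
    extension = PurePosixPath(filename).suffix.lower()
    # Anything without a known extension falls through to "unmatched"
    return ROUTES.get(extension, "unmatched")


if __name__ == "__main__":
    for name in ("report.csv", "table.TSV", "book.xlsx", "notes"):
        print(name, "->", route_for(name))
```

A file that arrives without an extension lands in "unmatched", which is the case where inspecting the content (as IdentifyMimeType does) becomes necessary.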

How do I know which include path will be used in PHP?

When I run phpinfo() and look under the Configuration category under PHP Core, I see a directive titled include_path, with a local value and a master value.
In this case, my local value is set to
.:
./include:
../include:
/usr/share/php:
/usr/share/php/smarty:
/usr/share/pear
and my master value is set to
.:
/usr/share/php:
/usr/share/pear:
/usr/share/php/pear:
/usr/share/php/smarty
The reason I am trying to learn how this works is because there is a file in the system I am working on titled Smarty.class.php, which I'm sure sounds very familiar to anyone who uses Smarty Templating Engine.
One of the PHP files has the following includes:
require_once("Smarty.class.php");
require_once("user_info_class.inc");
The file user_info_class.inc is in the same directory as the file making the include, which makes perfect sense to me, and is the way that I've always referenced files. I decided that I wanted to open up the Smarty.class.php file and had assumed it would be in the same directory, but it was not.
After doing a bit of digging, I discovered those php.ini directives, and was finally able to locate the file in the directory /usr/share/php/smarty/.
So it would seem that when making an include, it follows some sort of order between the Local and Master values for the include_path.
Assuming that my deductions were correct thus far, can someone explain the order in which PHP searches for the files to be included?
The master value is basically what's set in php.ini. The local value is what's currently being used. The local value completely overrides the master value.
According to the manual, PHP checks the paths in the order that they are specified in the include_path setting: http://php.net/manual/en/ini.core.php#ini.include-path
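
The "first match wins" behaviour can be modelled with a short Python sketch. This is a simulation of the search order, not PHP's implementation: resolve_include is a hypothetical helper, it assumes a colon-separated include_path as on Unix, and it deliberately leaves out any fallback locations PHP may check after the include_path entries.

```python
import os


def resolve_include(name, include_path, cwd):
    """Return the first existing file for `name`, trying include_path entries in order."""
    for entry in include_path.split(":"):
        # Relative entries such as "." or "./include" are taken relative to cwd;
        # an absolute entry replaces cwd because of how os.path.join works.
        candidate = os.path.normpath(os.path.join(cwd, entry, name))
        if os.path.isfile(candidate):
            return candidate
    return None  # not found in any listed directory
```

With the local value from the question, a file named Smarty.class.php sitting in both /usr/share/php and /usr/share/php/smarty would resolve to the copy in /usr/share/php, because that entry is listed earlier.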

Resources