Check whether files are still being copied into a directory in shell - shell

I have created a script which picks up files from a directory, inbox. I need to add handling for files that are still arriving from another process: if files are still being copied in, my script should wait until the copy has finished.
For this I have created a flag:
CHECK_COPY_PROCESS=$(ps -ef|grep -E 'cp|mv|scp'|grep inbox)
If CHECK_COPY_PROCESS contains a value, the script goes into a waiting state. The problem is that when files are copied from a subdirectory of inbox, the process command line looks like cp file_name .., which does not contain "inbox", so the logic above does not detect it.

You could use Basile's tip: use lsof together with awk, matching only on the first column, which is the command name.
Example:
lsof +D /path/to/inbox|awk '$1~/mv|cp|scp/'
This is not tested as I currently don't have big files that take a while to copy on my machine.
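A minimal sketch of a wait loop built on that lsof/awk check, assuming bash and lsof are installed. /path/to/inbox is a placeholder, copy_in_progress is a hypothetical helper name, and the 5-second poll interval is arbitrary. Because lsof +D recurses into subdirectories and looks at open files rather than command lines, a cp file_name .. run from inbox/sub is still caught. The awk pattern is anchored to the exact command names, which is a tightening of the answer's /mv|cp|scp/ (that pattern would also match any command whose name merely contains "cp").

```shell
#!/bin/bash
# Sketch: block until no cp/mv/scp process has a file open anywhere under the inbox.
# INBOX defaults to a placeholder path; pass the real directory as $1.
INBOX=${1:-/path/to/inbox}

# Succeeds (exit status 0) if a cp, mv or scp process has a file open under $INBOX.
copy_in_progress() {
    # lsof +D scans the directory tree recursively; errors are discarded.
    # NR > 1 skips lsof's header line; $1 is the COMMAND column.
    lsof +D "$INBOX" 2>/dev/null |
        awk 'NR > 1 && $1 ~ /^(cp|mv|scp)$/ { found = 1 } END { exit !found }'
}

while copy_in_progress; do
    echo "copy still in progress under $INBOX, waiting..." >&2
    sleep 5
done
echo "no copy in progress, safe to process $INBOX"
```

Note that a mv within the same filesystem is a single rename and never holds the file open, so lsof will not see it; it also never leaves a half-written file behind, so that case needs no waiting.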

Related

inotifywait changing file name to .goutputstream-xxxx

I have a bash shell script using inotifywait to capture files added to a specific Dropbox folder, running a process and saving the file to another Dropbox folder. I'm running PopOS 22.04.
For some reason, inotifywait now reports $filename as .goutputstream-xxxx instead of the actual file name. It did not do this in earlier testing.
I found a forum page about commenting out the set locking option in the nanorc config file, but this did nothing.
Below is the opening code of the bash shell script:
#!/bin/bash
#set time and date variables
TIMENOW="$(date +"%T")"
DATENOW="$(date +"%m-%d-%Y")"
#launch the inotifywait utility, formatting output to just the file name and piping it into a while loop to process
#--format %f prints only the file name, so read the whole line into filename (IFS= keeps spaces intact)
#file is saved to Dropbox folder from the first part of a Zapier process
inotifywait -m -e create --format %f /home/dave/Dropbox/Inbound |
while IFS= read -r filename; do
Please let me know which config file I need to edit, or what code I need, to force inotifywait to report the actual file name and not the .goutputstream-xxxx name.
I need some way of capturing the actual file name, even if it is in a separate variable.
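One plausible cause, which is an assumption since the question does not say which program writes the files: GIO-based applications save by writing to a hidden .goutputstream-XXXXXX temporary file in the same directory and then renaming it over the real name. A rename produces moved_to rather than create, so watching only create sees just the temporary name. The sketch below watches close_write and moved_to instead and skips the temporary names; is_temp_name and watch_inbound are hypothetical helper names.

```shell
#!/bin/bash
# Sketch, assuming the writer saves via a hidden temp file and then renames it.

# Succeeds if the name looks like a GIO-style temporary file.
is_temp_name() {
    case "$1" in
        .goutputstream-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Watch DIR and print each finished file name, skipping temporary names.
watch_inbound() {
    # close_write: a file written in place has been closed.
    # moved_to:    a file was renamed into DIR (the temp-file-then-rename case).
    inotifywait -m -e close_write -e moved_to --format '%f' "$1" |
    while IFS= read -r filename; do
        is_temp_name "$filename" && continue
        echo "ready: $filename"   # replace with the real processing step
    done
}

# Only start watching when a directory is given, e.g.
#   ./watch.sh /home/dave/Dropbox/Inbound
if [ $# -ge 1 ]; then
    watch_inbound "$1"
fi
```

Depending on whether the writer closes the file before or after renaming it, the same final name may be reported by both events, so the processing step should be safe to run twice (or watch only moved_to if every file is known to arrive by rename).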

Counting number of files in a directory with Nifi

Using Apache NiFi, I am passing files to a directory. I want to count the number of files in this directory, wait until all of the files I need are present, and then run the ExecuteStreamCommand processor to process the data in that directory. (Right now, ExecuteStreamCommand doesn't wait long enough for all of the files to arrive before the process begins, so I want to add this wait.)
To start, I just want to know how to count the number of files in a directory. I am using ListFile to retrieve the names of files, but I'm not sure how to count them in NiFi.
Thanks
If you are using ExecuteStreamCommand to run a shell command on the files, you could easily add something like ls -1 | wc -l to the same or an additional ExecuteStreamCommand processor to count the number of files in the directory.
We usually caution against this approach, however, because there are edge cases where a file can be present in the directory but not yet "complete" if some external process is still writing it. The usually recommended model is to write files under a temporary filename like .file1, .file2, and rename each one on successful completion to file1, file2, etc. The ListFile processor supports numerous settings to avoid picking up these files until they are ready for processing.
We also usually recommend setting some boolean flag through the external process rather than waiting for an explicit count unless that value will never change.
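In shell terms, the write-then-rename convention described above might look like the following minimal sketch. publish, count_ready and wait_for_files are hypothetical helper names, and the .part suffix is an assumption (the answer only uses a leading dot). It relies on mv being an atomic rename when source and destination are on the same filesystem, which holds here because the temporary file is created inside the target directory itself.

```shell
#!/bin/bash
# Sketch of the "write under a hidden name, then rename" convention.

# publish SRC DIR: copy SRC into DIR under a hidden temporary name, then rename it.
# The rename is atomic on one filesystem, so readers never see a partial file.
publish() {
    local src=$1 dir=$2 name
    name=${src##*/}
    cp -- "$src" "$dir/.$name.part" && mv -- "$dir/.$name.part" "$dir/$name"
}

# count_ready DIR: number of regular, non-hidden files directly in DIR.
count_ready() {
    find "$1" -maxdepth 1 -type f ! -name '.*' | wc -l
}

# wait_for_files DIR N [INTERVAL]: block until DIR holds at least N ready files.
wait_for_files() {
    local dir=$1 expected=$2 interval=${3:-5}
    while [ "$(count_ready "$dir")" -lt "$expected" ]; do
        sleep "$interval"
    done
}
```

A consumer step could call wait_for_files /path/to/dir 10 before processing; hidden .part files are never counted, which is the same idea as having ListFile skip hidden files on the NiFi side.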

Create a bash script that runs and updates a log whenever a file is deleted

I am new to bash scripting and I have to create a script that will run on all computers within my group at work (so it's not just checking one computer). We have a spreadsheet that keeps certain file information, and I am working to automate the updating of that spreadsheet. I already have an existing python script that gathers the information needed and writes to the spreadsheet.
What I need is a bash script (cron job, maybe?) that is activated anytime a user deletes a file that matches a certain extension within the specified file path. The script should hold on to the file name before it is completely deleted. I don't need any other information besides the name.
Does anyone have any suggestions for where I should begin with this? I've searched a bit but not found anything useful yet.
It would be something like:
for folders and files in path:
    if file ends in .txt and is being deleted:
        save file name
To save the name of every .txt file deleted in some directory path or any of its subdirectories, run:
inotifywait -m -e delete --format "%w%f" -r "path" 2>stderr.log | grep --line-buffered '\.txt$' >>logfile
Explanation:
-m tells inotifywait to keep running. The default is to exit after the first event.
-e delete tells inotifywait to report only file delete events.
--format "%w%f" tells inotifywait to print only the path of the deleted file (%w is the watched directory the event occurred in, %f is the file name).
path is the target directory to watch.
-r tells inotifywait to monitor subdirectories of path recursively.
2>stderr.log tells the shell to save stderr output to a file named stderr.log. As long as things are working properly, you may ignore this file.
>>logfile tells the shell to append all output to the file logfile. If you leave this part off, output goes to stdout and you can watch in real time as files are deleted.
grep '\.txt$' limits the output to files with .txt extensions. --line-buffered makes grep write each match immediately instead of buffering its output, so logfile is updated as deletions happen rather than in large delayed chunks.
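If each deletion should also trigger the existing Python script, the pipeline can feed a while loop instead of grep. This is a minimal sketch: LOGFILE's default name, the helper names record_if_txt and watch_deletes, and update_sheet.py are all hypothetical placeholders. By the time inotify reports a delete event the file is already gone, so only its path is available, which is all the question asks for.

```shell
#!/bin/bash
# Sketch: log the paths of deleted .txt files under a directory tree.
# LOGFILE is a placeholder name; update_sheet.py is a hypothetical stand-in
# for the existing Python script.
LOGFILE=${LOGFILE:-deleted_txt.log}

# record_if_txt PATH: append PATH to $LOGFILE if it ends in .txt.
record_if_txt() {
    case "$1" in
        *.txt) printf '%s\n' "$1" >> "$LOGFILE" ;;
    esac
}

# watch_deletes DIR: report deletions anywhere under DIR.
watch_deletes() {
    # delete fires for rm. moved_from also fires when a file is moved out of
    # (or renamed within) the tree, e.g. sent to a desktop Trash folder; drop
    # it if you only want rm-style deletions.
    inotifywait -m -r -e delete -e moved_from --format '%w%f' "$1" 2>>stderr.log |
    while IFS= read -r path; do
        record_if_txt "$path"
        # python3 update_sheet.py "$path"   # hypothetical hook
    done
}

if [ $# -ge 1 ]; then
    watch_deletes "$1"
fi
```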
Mac OSX
Similar programs are available for OSX. See "Is there a command like “watch” or “inotifywait” on the Mac?".

Can the shell direct where a program places its output files?

Can the shell override where output files are placed? (Not the console/screen output, but files created by a program.) I have a script that currently runs a sequence of input files through a program and for each one produces a lot of different output files.
for i in `seq 1 24`
do
../Bin/myprog inputfile.$i.in
done
Is there a way to create new directories for each run of the program and place the corresponding output files in each directory? So I would get dir1: <output files from run 1>; dir2 <output files from run 2> etc. I suppose one way would be to just write another script to create directories and sort all the files after the program(s) had run, but is there a more elegant way to do it?
As suggested in the comments, this might be what you need, assuming that your program just dumps output into the current working directory.
for i in $(seq 1 24)
do
mkdir -p "dir$i"
# A subshell keeps the cd local to this iteration; && skips the run if cd fails.
( cd "dir$i" && ../../Bin/myprog "../inputfile.$i.in" )
done
If you are trying to change where an existing program (e.g., myprog) writes its files, this is only possible if the program writes its files relative to the current directory. In that case, the outer script that invokes myprog can create a "destination" directory and cd into it before invoking myprog.
If the myprog program writes to an absolute path, e.g., /var/tmp/myprog.tmp, the only way to override where this write actually goes is to place a symbolic link at the absolute path linking to the desired destination. This will only work if the program (myprog) doesn't first delete an existing file before writing to it.
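A minimal sketch of the symbolic-link approach, assuming the program opens its fixed path for writing without deleting it first. run_redirected is a hypothetical helper name, and /var/tmp/myprog.tmp is the example path from the paragraph above. DEST should be an absolute path, because a relative symlink target is resolved relative to the directory containing the link, not the current directory.

```shell
#!/bin/bash
# Sketch: redirect a program's fixed absolute output path into a per-run location.

# run_redirected FIXED DEST CMD [ARGS...]
#   Replace FIXED with a symlink to DEST, then run CMD. Writes that open FIXED
#   follow the link and land in DEST, unless CMD deletes FIXED first.
run_redirected() {
    local fixed=$1 dest=$2
    shift 2
    ln -sfn -- "$dest" "$fixed" && "$@"
}

# Example with the question's layout (hypothetical paths):
# for i in $(seq 1 24); do
#     mkdir -p "dir$i"
#     run_redirected /var/tmp/myprog.tmp "$PWD/dir$i/myprog.tmp" ../Bin/myprog "inputfile.$i.in"
# done
```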
The third and most extreme possibility for directing absolute file path writes is to create a chroot'ed file system, in which the myprog output files will be contained, after which the outer script can copy or move them to where they are desired.
To summarize: other than changing the source, setting the working directory for relative-path output files, or chrooting a filesystem for absolute-path files, there really is no "elegant" way to change where a program writes its output files.

How to swap out to a new file a running process output is redirecting to, without restarting the command?

I have a background process that I do not want to restart. Its output is actively being logged to a file.
nohup mycommand 1> myoutputfile.log 2>&1 &
I want to "archive" the file the process is currently writing its output to, and make it start writing to a blank file at the same file name. I must be able to do this without having to kill the process and start it again.
I tried simply renaming the existing file (to myoutputfile_.log), hoping that the shell, finding that the file was no longer there, would create a new file with the original name (myoutputfile.log). But this does not work, because the process holds an open file descriptor to the file itself rather than to its name, and keeps writing to it.
I looked here. On running ls, I see that the streams are now marked as (deleted), but I'm quite confused about what to do next. In the gdb command, do I have to specify the process executable in addition to the process ID? What happens if I don't specify it or I get it wrong? Once in gdb, how do I force the stream to re-create a file at the deleted file's location (same path and filename)?
How can I use shell commands to make an existing process start writing its redirected output to a new file?
PS: I can't do a trial-and-error because it's rather important I get this right. If it is relevant to know, this is a java process.
I resolved this issue by doing the following:
cp myoutputfile.log myoutputfile_.log; echo > myoutputfile.log
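This copies the current contents to myoutputfile_.log and then truncates myoutputfile.log in place, so the process keeps writing to the same, now reset, file. (echo > leaves a single newline in the file; : > myoutputfile.log would empty it completely.) Two caveats: anything written between the cp and the truncation is lost, and because the original redirect used > rather than >>, the process keeps writing at its old file offset, so the reset file may begin with a run of NUL bytes. Starting the process with >> myoutputfile.log avoids the second problem.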
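The same copy-then-truncate idea can be wrapped as a small reusable function; it is essentially what logrotate's copytruncate directive does. This is a sketch: rotate_log is a hypothetical name, and the timestamped archive name is an illustrative assumption (the answer used myoutputfile_.log). Two rotations within the same second would overwrite the same archive.

```shell
#!/bin/bash
# Sketch of copy-then-truncate rotation for a log that a running process keeps open.

# rotate_log LOG: copy LOG to a timestamped archive next to it, then empty LOG
# in place. Prints the archive path.
rotate_log() {
    local log=$1 archive
    archive="${log%.log}_$(date +%Y%m%d-%H%M%S).log"
    cp -p -- "$log" "$archive" || return 1
    : > "$log"   # truncate to zero bytes; the writer's file descriptor stays valid
    printf '%s\n' "$archive"
}
```

Usage: rotate_log myoutputfile.log prints something like myoutputfile_<timestamp>.log and leaves myoutputfile.log empty for the running process to continue writing into.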
