I have a bash shell script that uses inotifywait to capture files added to a specific Dropbox folder, runs a process on them, and saves the output file to another Dropbox folder. I'm running Pop!_OS 22.04.
For some reason, inotifywait is now reporting $filename as .goutputstream-xxxx rather than the actual file name. In earlier testing of this process, it did not do this.
I found a forum page about commenting out the set locking option in the nanorc config file, but this did nothing.
Below is the opening code to the bash shell script:
#!/bin/bash
#set time and date variables
TIMENOW="$(date +"%T")"
DATENOW="$(date +"%m-%d-%Y")"
#launch the inotifywait utility formatting output to just the file name and piping it into while function to process
#file is saved to Dropbox folder from the first part of a Zapier process
inotifywait -m -e create --format %f /home/dave/Dropbox/Inbound |
while read -r filename event; do
Please let me know what config file I need to edit or code I need to use to force inotifywait to use the actual file name and not the .goutputstream-xxxx name.
I need some way of capturing the actual file name, even if it is in a separate variable.
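One thing I am considering trying, based on reading that GIO-based applications (such as gedit) save files by writing a .goutputstream-XXXXXX temporary file and then renaming it over the real name, is to watch for moved_to events as well as create and ignore the temporary names. A rough sketch (untested):
#!/bin/bash
# Rough sketch (untested): report files under their final name by also watching
# moved_to events, since the .goutputstream-XXXXXX temp file is renamed into place.
WATCH_DIR="/home/dave/Dropbox/Inbound"
inotifywait -m -e create -e moved_to --format '%e %f' "$WATCH_DIR" |
while read -r event filename; do
    # skip the temporary files themselves
    case "$filename" in
        .goutputstream-*) continue ;;
    esac
    echo "Got $event for $filename"
    # ...rest of the processing would go here...
done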
I am new to bash scripting and I have to create a script that will run on all computers within my group at work (so it's not just checking one computer). We have a spreadsheet that keeps certain file information, and I am working to automate the updating of that spreadsheet. I already have an existing python script that gathers the information needed and writes to the spreadsheet.
What I need is a bash script (cron job, maybe?) that is activated anytime a user deletes a file that matches a certain extension within the specified file path. The script should hold on to the file name before it is completely deleted. I don't need any other information besides the name.
Does anyone have any suggestions for where I should begin with this? I've searched a bit but not found anything useful yet.
It would be something like:
for folders and files in path:
    if file ends in .txt and is being deleted:
        save file name
To save the name of every .txt file deleted in some directory path or any of its subdirectories, run:
inotifywait -m -e delete --format "%w%f" -r "path" 2>stderr.log | grep '\.txt$' >>logfile
Explanation:
-m tells inotifywait to keep running. The default is to exit after the first event.
-e delete tells inotifywait to only report on file delete events.
--format "%w%f" tells inotifywait to print only the name of the deleted file
path is the target directory to watch.
-r tells inotifywait to monitor subdirectories of path recursively.
2>stderr.log tells the shell to save stderr output to a file named stderr.log. As long as things are working properly, you may ignore this file.
>>logfile tells the shell to append the matching output to the file logfile. If you leave this part off, output will be directed to stdout and you can watch in real time as files are deleted.
grep '\.txt$' limits the output to files with .txt extensions.
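If you would rather hand each name to another process (for example, the existing Python script mentioned in the question) instead of writing a log file, the same pipeline can feed a read loop. A rough sketch, where path and update_spreadsheet.py are placeholders:
inotifywait -m -e delete -r --format "%w%f" "path" 2>stderr.log |
grep --line-buffered '\.txt$' |
while read -r deleted_file; do
    # hypothetical hand-off; substitute the real script and arguments
    python3 update_spreadsheet.py "$deleted_file"
done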
macOS
Similar programs are available for macOS. See "Is there a command like “watch” or “inotifywait” on the Mac?".
I have cobbled together a shell script to submit multiple jobs on a cluster, which it appears to do without giving me an error message, but the output files are missing and the error log files are also empty. What the script is supposed to do is: 1) make a bunch of new directories, 2) copy four files into each (mainparams, extraparams, infile, and structurejobsubmissionfile), and 3) submit each one to the cluster to run structure, while changing one parameter in the mainparams file every tenth directory (that's the 's/changethis/'$k'/g' line).
Test running it on the front end gives no errors, the structure program is up to date on the cluster, and the cluster administrators don't see anything wrong. Thanks!
#!/bin/bash
reps=10
numK=10
for k in $(seq $numK);
do
for i in $(seq $reps);
do
#make folder name (ex. k4rep7)
tmpstr="k${k}rep${i}"
#echo "Making folder and filename $tmpstr"
#make the new folder
mkdir $tmpstr
#go to that folder
cd ./$tmpstr
#copy in the input files
cp ../str_in/* ./
#modify the recently copied input file here so source file remains the same
cp ./mainparams ./temp.txt
#change maxpops to current value of k and the directory for the files to the current directory
sed -e 's/changethis/'$k'/g' -e "s:pathforrunningstructurehere:$PWD:g" ./temp.txt > ./mainparams
#get rid of temporary file
rm ./temp.txt
#inside $i so run STRUCTURE here
qsub -q fnrgenetics -l nodes=1:ppn=1,walltime=20:00:00 structurejobsubmissionfile
#go back to parent directory
cd ../
done
done
I can't see anything obviously wrong, but I think you'll find the answer through better logging and better error checking. Some of the things that you're not checking that you should (a sketch with these checks added follows this list):
Is $tmpstr created correctly? (will fail on disk full or if permissions are not set correctly)
does str_in/ exist, and is it a directory?
does it contain files?
does it contain mainparams?
is qsub in $PATH?
does the call to qsub return an error?
You can roll an error-logging function of your own, or use a package like log4bash.
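As a rough sketch of what those checks might look like folded into the loop from the question (not a drop-in replacement; the qsub line is unchanged from the original):
#!/bin/bash
# Sketch only: same loop as in the question, with the checks listed above added.
command -v qsub >/dev/null || { echo "qsub is not in PATH" >&2; exit 1; }
[ -d str_in ]              || { echo "str_in/ is missing or not a directory" >&2; exit 1; }
[ -f str_in/mainparams ]   || { echo "str_in/mainparams is missing" >&2; exit 1; }
reps=10
numK=10
for k in $(seq "$numK"); do
  for i in $(seq "$reps"); do
    tmpstr="k${k}rep${i}"
    mkdir "$tmpstr"   || { echo "mkdir $tmpstr failed" >&2; exit 1; }
    cd "./$tmpstr"    || { echo "cd into $tmpstr failed" >&2; exit 1; }
    cp ../str_in/* ./ || { echo "copying input files into $tmpstr failed" >&2; exit 1; }
    cp ./mainparams ./temp.txt
    sed -e "s/changethis/$k/g" -e "s:pathforrunningstructurehere:$PWD:g" ./temp.txt > ./mainparams \
        || { echo "sed failed in $tmpstr" >&2; exit 1; }
    rm ./temp.txt
    qsub -q fnrgenetics -l nodes=1:ppn=1,walltime=20:00:00 structurejobsubmissionfile \
        || echo "qsub failed in $tmpstr" >&2
    cd .. || exit 1
  done
done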
I have created a script which picks up files from a directory named inbox. I have to include handling in the script for whether files are coming from another process or not: if files are coming through another process, then my script should wait until the files have been copied.
For this I have created a flag:
CHECK_COPY_PROCESS=$(ps -ef|grep -E 'cp|mv|scp'|grep inbox)
If the flag CHECK_COPY_PROCESS contains some value, then the process will go into a waiting state. But the problem is that if some files are coming from a subdirectory of that inbox directory, then the process will show up like cp file_name .. and the above logic does not work.
You could use Basile's tip of using lsof in conjunction with awk (filtering only on the first column, which is the command name).
Example:
lsof +D /path/to/inbox|awk '$1~/mv|cp|scp/'
This is not tested as I currently don't have big files that take a while to copy on my machine.
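Also untested, but one way to turn that check into the wait-until-copied behaviour from the question might be a loop along these lines (the inbox path is a placeholder):
INBOX="/path/to/inbox"   # placeholder path
# Wait while any cp/mv/scp process still has a file open anywhere under the inbox.
while lsof +D "$INBOX" 2>/dev/null | awk '$1 ~ /^(cp|mv|scp)$/ { found=1 } END { exit !found }'; do
    echo "copy still in progress, waiting..." >&2
    sleep 10
done
# ...safe to pick up the files here...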
I have been working on how to verify that millions of files that were on file system A have in fact been moved to file system B. While working on a system migration, it became evident that all the files needed to be audited to prove that the files have been moved. The files were initially moved via rsync, which does provide logs, although not in a format that is helpful for doing an audit. So, I wrote this script to index all the files on System A:
#!/bin/bash
# Get directories and file list to be used to verify proper file moves have worked successfully.
LOGDATE=`/usr/bin/date +%Y-%m-%d`
FILE_LIST_OUT=/mounts/A_files_$LOGDATE.txt
MOUNT_POINTS="/mounts/AA /mounts/AB"
touch $FILE_LIST_OUT
echo TYPE,USER,GROUP,BYTES,OCTAL,OCTETS,FILE_NAME > $FILE_LIST_OUT
for directory in $MOUNT_POINTS; do
# format: type,user,group,bytes,octal,octets,file_name
gfind $directory -mount -printf "%y","%u","%g","%s","%m","%p\n" >> $FILE_LIST_OUT
done
The file indexing works fine and takes about two hours to index ~30 million files.
Side B is where we run into issues. I have written a very simple shell script that reads the index file, tests to see whether each file is there, and then counts up how many files are there, but it runs out of memory while looping through the 30 million lines of indexed file names. Effectively it does the little bit of code below in a while loop, with counters incremented for files found and not found.
if [ -f "$TYPE" "$FILENAME" ] ; then
print file found
++
else
file not found
++
fi
My questions are:
Can a shell script do this type of reporting from such a large list? A 64-bit Unix system ran out of memory while trying to execute this script. I have already considered breaking up the input file into smaller chunks to make it faster. Currently it can
If a shell script is inappropriate, what would you suggest?
You just used rsync, use it again...
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn’t affect the data that goes into the file-lists, and thus it doesn’t affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don’t get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
That will actually fix any problems (at least in the same sense that any diff-list based on file-exist tests could fix problems). Using --ignore-existing means rsync only does the file-exist tests (so it'll construct the diff list as you request and use it internally). If you just want information on the differences, check --dry-run and --itemize-changes.
Let's say you have two directories, foo and bar. Let's say bar has three files: 1, 2, and 3. Let's say that bar has a directory quz, which has a file 1. The directory foo is empty:
Now, here is the result,
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/
>f+++++++++ 1
>f+++++++++ 2
>f+++++++++ 3
cd+++++++++ quz/
>f+++++++++ quz/1
Note, you're not interested in the cd+++++++++ -- that's just rsync showing that the directory quz/ would be created on the destination. Now, let's add a file in foo called 1, and let's use grep to remove the directory line(s),
$ rsync -ri --dry-run --ignore-existing ./bar/ ./foo/ | grep -v '^cd'
>f+++++++++ 2
>f+++++++++ 3
>f+++++++++ quz/1
f is for file. The +++++++++ means the file doesn't exist in the DEST dir.
Here is the bonus: remove --dry-run and it'll go ahead and make the changes for you.
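Applied to the audit in the question, a dry run like the following would list every file that exists on side A but not on side B, and count them (the mount points are placeholders):
# Dry run only: itemize files present under /mounts/A but missing under /mounts/B,
# drop the directory lines, save the list, and count it. Paths are placeholders.
rsync -rin --ignore-existing /mounts/A/ /mounts/B/ | grep -v '^cd' | tee missing_on_B.txt | wc -l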
Have you considered a solution such as kdiff3, which will diff directories of files?
Note this feature from version 0.9.84:
Directory-Comparison: Option "Full Analysis" allows to show the number
of solved vs. unsolved conflicts or deltas vs. whitespace-changes in
the directory tree.
There is absolutely no problem reading a 30 million line file in a shell script. The reason why your process failed was most likely that you tried to read the file entirely into memory, e.g. by doing something wrong like for i in $(cat file).
The correct way of reading a file is:
while IFS= read -r line
do
echo "Something with $line"
done < someFile
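Applied to the index file from the question, the counting could be done line by line in the same way. A rough sketch, where the index path is a placeholder and the file name is assumed to be the last comma-separated field (i.e. names without commas):
#!/bin/bash
# Sketch: count how many files listed in the index actually exist on this system.
INDEX="/mounts/A_files_2024-01-01.txt"   # placeholder index file name
found=0
missing=0
# tail -n +2 skips the CSV header line written by the indexing script
while IFS= read -r line; do
    filename=${line##*,}                 # keep everything after the last comma
    if [ -e "$filename" ]; then
        found=$((found + 1))
    else
        missing=$((missing + 1))
    fi
done < <(tail -n +2 "$INDEX")
echo "found: $found  missing: $missing"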
A shell script is inappropriate, yes. You should be using a diff tool:
diff -rNq /original /new
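If the goal is only a list of the files missing from the new side, one option (a sketch, using the same placeholder paths) is to drop -N so that missing files show up as "Only in" lines, and then filter for them:
# Files that exist under /original but have no counterpart under /new
diff -rq /original /new | grep '^Only in /original'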
If you're not particular about the solution being a script, you could also look into meld, which would let you diff directory trees quite easily and you can also set ignore patterns if you have any.