Need a shell script to:
1. Keep polling a directory "receive_dir", whether or not there are files in it.
2. Move the files over to another directory "send_dir".
3. Stop polling only when a file "stopfile" is moved into "receive_dir". Thanks!
My script:
until [ $i = stopfile ]
do
    for i in `ls receive_dir`; do
        time=$(date +%m-%d-%Y-%H:%M:%S)
        echo $time
        mv receive_dir/$i send_dir/;
    done
done
This fails on empty directories. Also, is there a better way to do this?
If you are running on Linux, you might wish to consider inotifywait.
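For example, a minimal event-driven sketch, assuming the inotify-tools package (which provides inotifywait) is installed:
inotifywait -m -e create -e moved_to --format '%f' receive_dir |
while read -r f; do
    # move each file that appears in receive_dir, stop on "stopfile"
    mv "receive_dir/$f" send_dir/
    [[ $f == stopfile ]] && break
done
Note that inotifywait itself only terminates on the first event after the loop breaks (via SIGPIPE). If inotify is not available, an improved polling loop also works: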
$ declare -f tillStopfile
tillStopfile ()
{
    cd receive_dir
    [[ -d ../send_dir ]] || mkdir ../send_dir
    while true; do
        date +%m-%d-%Y-%H:%M:%S
        for f in *
        do
            mv "$f" ../send_dir
            [[ $f == "stopfile" ]] && break 2
        done
        sleep 3
    done
}
$
Improvements
while true ... break           # easier to control this loop
cd receive_dir                 # why not run in the "receive_dir"?
factor date out of inner loop  # unless you need to see each time-stamp?
added suggested "sleep"        # pick a suitable interval
Run:
$ tillStopfile 2>/dev/null # suppresses error messages from mv (e.g. when the directory is empty)
Related
I have a directory with several sub-directories with names
1
2
3
4
backup_1
backup_2
I wrote a parallelized bash script to process files in these folders; a minimal working example is as follows:
#!/bin/bash
P=`pwd`
task(){
    dirname=$(basename $dir)
    echo $dirname running >> output.out
    if [[ $dirname != "backup"* ]]; then
        sed -i "s/$dirname running/$dirname is good/" $P/output.out
    else
        sed -i "s/$dirname running/$dirname ignored/" $P/output.out
    fi
}
for dir in */; do
    ((i=i%8)); ((i++==0)) && wait
    task "$dir" &
done
wait
echo all done
The "wait" at the end of the script is supposed to wait for all processes to finish before proceeding to echo "all done". The output.out file, after all processes are finished should show
1 is good
2 is good
3 is good
4 is good
backup_1 ignored
backup_2 ignored
I am able to get this output if I set the script to run in serial with ((i=i%1)); ((i++==0)) && wait. However, if I run it in parallel with ((i=i%2)); ((i++==0)) && wait, I get something like
2 is good
1 running
3 running
4 is good
backup_1 running
backup_2 ignored
Can anyone tell me why wait is not working in this case?
I also know that GNU parallel can do the same thing in parallelizing tasks. However, I don't know how to tell parallel to run this task on all sub-directories in the parent directory. It would be great if someone could produce a sample script that I can follow.
Many thanks
Jacek
A literal port to GNU Parallel looks like this:
task(){
    dir="$1"
    P=`pwd`
    dirname=$(basename $dir)
    echo $dirname running >> output.out
    if [[ $dirname != "backup"* ]]; then
        sed -i "s/$dirname running/$dirname is good/" $P/output.out
    else
        sed -i "s/$dirname running/$dirname ignored/" $P/output.out
    fi
}
export -f task
parallel -j8 task ::: */
echo all done
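Here ::: separates the command from its arguments: each directory matching */ is passed to task as $1. If you are unsure what will be run, parallel --dry-run -j8 task ::: */ prints the generated commands without executing them.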
As others point out you have race conditions when you run sed on the same file in parallel.
To avoid race conditions you could do:
task(){
    dir="$1"
    P=`pwd`
    dirname=$(basename $dir)
    echo $dirname running
    if [[ $dirname != "backup"* ]]; then
        echo "$dirname is good" >&2
    else
        echo "$dirname ignored" >&2
    fi
}
export -f task
parallel -j8 task ::: */ >running.out 2>done.out
echo all done
You will end up with two files running.out and done.out.
If you really just want to ignore the dirs called backup*:
task(){
    dir="$1"
    P=`pwd`
    dirname=$(basename $dir)
    echo $dirname running
    echo "$dirname is good" >&2
}
export -f task
parallel -j8 task '{=/backup/ and skip()=}' ::: */ >running.out 2>done.out
echo all done
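The {= ... =} construct evaluates a Perl expression for each input argument; calling skip() inside it makes parallel drop that argument, so task never runs for the backup* directories.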
Consider spending 20 minutes reading chapters 1+2 of https://doi.org/10.5281/zenodo.1146014. Your command line will love you for it.
I have directories
site_2021-11-09_0
site_2021-11-09_1
site_2021-11-09_2
site_2021-11-09_3
site_2021-11-09_4
site_2021-11-09_5
site_2021-11-09_6
I need to create the next directory that does not exist, which is site_2021-11-09_7. I need to write a script using loops. How can this be done?
I currently have
#!/bin/bash
date="$(date +%F)"
site="site"
i="0"
while [ ! -d ${site}_${date}_$i ]
do
    echo ${site}_${date}_$i
    mkdir ${site}_${date}_$i
    i=$(( $i + 1))
done
but it doesn't work. If no directories exist, it runs forever. If the directory site_2021-11-09_0 exists, it doesn't do anything at all. How should I think about this logically?
Here are some ways to achieve what you want:
i=0; while [ -d "${site}_${date}_$i" ]; do ((i++)); done; mkdir -v "${site}_${date}_$i"
i=0; while ! mkdir "${site}_${date}_$i"; do ((i++)); done
i=0; for d in "${site}_${date}_"*; do ((i=i>${d##*_}?i:${d##*_})); done; mkdir -v "${site}_${date}_$((i+1))"
when your directories are sortable, that is, only when the index counter has a fixed number of digits (e.g. ${site}_${date}_001, ${site}_${date}_002, ..., ${site}_${date}_078), you can make use of the lexicographic ordering of globs:
dirlist=( "${site}_${date}"_* )
mkdir -v "${site}_${date}_$(printf "%.3d" "$((10#${dirlist[-1]##*_}+1))")"
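As an illustration (the directory names here are made up), a quick interactive trace:
$ site=site date=2021-11-09
$ mkdir "${site}_${date}_"{001,002,078}
$ dirlist=( "${site}_${date}"_* )
$ printf "%.3d\n" "$((10#${dirlist[-1]##*_}+1))"
079
The 10# prefix forces base-10 arithmetic, so a suffix such as 078 is not misread as an invalid octal number.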
Presently your code is doing
while (directory does not exist)
do stuff
but your first directory site_2021-11-09_0 does indeed exist, so the condition of the while is false from the start and the loop body never runs. You can make a slight modification to your code: change the logic to keep running as long as the directory exists, and then make a new directory with the next index when the loop exits.
#! /bin/sh
date="$(date +%F)"
site="site"
i="0"
while [ -d ${site}_${date}_$i ]
do
    echo ${site}_${date}_$i
    i=$(( $i + 1))
done
mkdir ${site}_${date}_$i
You can use this bash script:
#!/bin/bash -e
prefix=${1:?no site given}_$(date +%F)_
while [[ -d "$prefix$((i++))" ]]; do :; done
mkdir "$prefix$((i-1))"
Call like ./mk-site-dir sitename.
You can hardcode sitename if you want.
As @kvantour says, you currently make a directory as long as it doesn't exist, and then increment i. Thus in an empty directory it will run indefinitely, whereas if there is a matching dir after 0, your code will make all the directories before it and then stop. What you want is probably:
while [ -d ${site}_${date}_$i ]
do
    i=$(( $i + 1))
done
mkdir ${site}_${date}_$i
I.e. get the first directory which doesn't exist, and then make it.
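(Traced against the example above: with site_2021-11-09_0 through site_2021-11-09_6 present, the loop increments i to 7, the test fails, and mkdir creates site_2021-11-09_7.)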
As an example, I have 7 directories, each containing 4 files. The 4 files follow the naming convention name_S#_L001_R1_001.fastq.gz. The sed command below partially keeps the unique file name.
I have a nested for loop in order to enter a directory, perform a command, exit the directory, and proceed to the next directory. Everything seems to be working beautifully; however, the code gets stuck on the last directory, looping 4 times.
for f in /completepath/*
do
    [ -d $f ] && cd "$f" && echo Entering into $f
    for y in `ls *.fastq.gz | sed 's/_L00[1234]_R1_001.fastq.gz//g' | sort -u`
    do
        echo ${y}
    done
done
Example output-
Entering into /completepath/m_i_cast_avpv_1
iavpvcast1_S6
Entering into /completepath/m_i_cast_avpv_2
iavpvcast2_S6
Entering into /completepath/m_i_int_avpv_1
iavpvint1_S5
Entering into /completepath/m_i_int_avpv_2
iavpvint2_S5
Entering into /completepath/m_p_cast_avpv_1
pavpvcast1_S8
Entering into /completepath/m_p_int_avpv_1
pavpvint1_S7
Entering into /completepath/m_p_int_avpv_2
pavpvint2_S7
pavpvint2_S7
pavpvint2_S7
pavpvint2_S7
Any recommendations on how to correctly exit the inner loop?
It looks like /completepath/ contains some entries that are not directories. When the loop over /completepath/* sees something that's not a directory, it doesn't enter it, thanks to the [ -d $f ] check.
But it still continues to run the next for y in ... loop.
At that point the script is still in the previous directory it has seen.
One way to solve that is to skip the rest of the loop when $f is not a directory:
if [ -d $f ]; then
    cd "$f" && echo Entering into $f
else
    continue
fi
There's an even better way. By writing /completepath/*/ only directory entries will be matched, so you can simplify your loop to this:
for f in /completepath/*/
do
    cd "$f" && echo "Entering into $f" || { echo "Error: could not enter into $f"; continue; }
    for y in $(ls *.fastq.gz | sed 's/_L00[1234]_R1_001.fastq.gz//g' | sort -u)
    do
        echo ${y}
    done
done
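As a side note, and only a sketch, not part of the answer above: running the loop body in a subshell keeps each cd local to its iteration, so the script can never be left sitting in a previous directory:
for f in /completepath/*/
do
    (
        # the cd dies with the subshell, so the outer loop is unaffected
        cd "$f" || { echo "Error: could not enter into $f"; exit; }
        echo "Entering into $f"
        ls *.fastq.gz | sed 's/_L00[1234]_R1_001.fastq.gz//g' | sort -u
    )
done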
I am trying to check whether a file (in this case /var/log/messages) has been updated, using the 'stat' command in a loop. However, the code never exits the loop and moves on, for some reason.
#!/bin/bash
check='/var/log/messages'
THEN="stat -c %z ${check}"
NOW="stat -c %z ${check}"
while [ $"NOW" == $"THEN" ]
do
    echo "$NOW"
    if [ $"NOW" != $"THEN" ]; then
        echo "${check} has been updated."
    fi
done
Thoughts on this? Is there an easier way to see if /var/log/messages has changed?
The dollar signs need to be inside the quotes. $"..." is a special quoting mechanism for doing translations, so unless you are using a locale in which NOW and THEN translate to the same string, the condition will never be true.
if [ "$NOW" == "$THEN" ]; then
First, the version below at least executes the stat command, plus all the changes that are explained above. One thing you have to think about is that THEN and NOW would almost always be the same, unless messages is being updated in less time than it takes to execute the two stat calls.
#!/bin/bash
check='/var/log/messages'
THEN=`stat -c %z ${check}`
NOW=`stat -c %z ${check}`
if [ "$NOW" == "$THEN" ]; then
    echo "$NOW"
elif [ "$NOW" != "$THEN" ]; then
    echo "$check has been updated."
fi
Note that the stat command is now actually being executed.
The piece of code provided in the OP never finishes, for the following reasons:
The variables NOW and THEN are never updated inside the while-loop. As they never update, the loop keeps running because both values stay identical.
The variables NOW and THEN represent the status change (%z), not the modification change. The time of status change is the time when the meta-data of the file changed (permissions, groups, ...). The user is probably more interested in the content change (the modification time, %Y).
The variables NOW and THEN are actually not the output of the stat command, but just the command string itself, since it is not executed using $(stat ...).
An update to the code would be:
#!/usr/bin/env bash
check_file="/var/log/messages"
THEN="$(stat -c "%Y" "${check_file}")"
NOW="$(stat -c "%Y" "${check_file}")"
while [ "$NOW" = "$THEN" ]; do
    # Sleep a second to have some waiting time between checks
    sleep 1
    # Update NOW
    NOW="$(stat -c "%Y" "${check_file}")"
done
echo "${check_file} has been modified"
This can be simplified to
#!/usr/bin/env bash
check_file="/var/log/messages"
THEN="$(stat -c "%Y" "${check_file}")"
while [ "$(stat -c "%Y" "${check_file}")" = "$THEN" ]; do
    # Sleep a second to have some waiting time between checks
    sleep 1
done
echo "${check_file} has been modified"
However, it might be easier to use the shell's built-in file tests and check for modification by comparing file dates:
#!/usr/bin/env bash
touch tmpfile
while [ tmpfile -nt /var/log/messages ]; do sleep 1; done
echo "/var/log/messages has been modified"
rm tmpfile
This polling method is unfortunately fairly impractical. If you want to monitor whether a file has been updated, especially in the case of a log file where lines are appended, you can just use tail. The command will print the new lines the moment the file is updated.
$ tail -f -- /var/log/messages
If you don't want to monitor updates to log files but just want to check if a file is updated, you can use inotifywait:
$ inotifywait -e modify /var/log/messages
I have a script below that does a few things...
#!/bin/bash
# Script to sync dr-xxxx
# 1. Check for locks and die if exists
# 2. CPIO directories found in cpio.cfg
# 3. RSYNC to remote server
# 5. TRAP and remove lock so we can run again
if ! mkdir /tmp/drsync.lock; then
    printf "Failed to acquire lock.\n" >&2
    exit 1
fi
trap 'rm -rf /tmp/drsync.lock' EXIT # remove the lockdir on exit
# Config specific to CPIO
BASE=/home/mirxx
DUMP_DIR=/usrx/drsync
CPIO_CFG="$BASE/cpio.cfg"
while LINE=: read -r f1 f2
do
    echo "Working with $f1"
    cd $f1
    find . -print | cpio -o | gzip > $DUMP_DIR/$f2.cpio.gz
    echo "Done for $f1"
done <"$CPIO_CFG"
RSYNC=/usr/bin/rsync # use latest version
RSYNC_BW="4500" # 4.5MB/sec
DR_PATH=/usrx/drsync
DR_USER=root
DR_HOST=dr-xxxx
I=0
MAX_RESTARTS=5 # max rsync retries before quitting
LAST_EXIT_CODE=1
while [ $I -le $MAX_RESTARTS ]
do
    I=$(( $I + 1 ))
    echo $I. start of rsync
    $RSYNC \
        --partial \
        --progress \
        --bwlimit=$RSYNC_BW \
        -avh $DUMP_DIR/*gz \
        $DR_USER@$DR_HOST:$DR_PATH
    LAST_EXIT_CODE=$?
    if [ $LAST_EXIT_CODE -eq 0 ]; then
        break
    fi
done
# check if successful
if [ $LAST_EXIT_CODE -ne 0 ]; then
    echo rsync failed for $I times. giving up.
else
    echo rsync successful after $I times.
fi
What I would like to change above is this line:
find . -print | cpio -o | gzip > $DUMP_DIR/$f2.cpio.gz
I am looking to change the above line so that it starts a parallel process for every entry in CPIO_CFG that gets fed in. I believe I have to use & at the end? Should I implement any safety precautions?
Is it also possible to modify the above command to include an exclude list that I can pass in via $f3 in the cpio.cfg file?
For the code below:
while [ $I -le $MAX_RESTARTS ]
do
    I=$(( $I + 1 ))
    echo $I. start of rsync
    $RSYNC --partial --progress --bwlimit=$RSYNC_BW -avh $DUMP_DIR/*gz $DR_USER@$DR_HOST:$DR_PATH
    LAST_EXIT_CODE=$?
    if [ $LAST_EXIT_CODE -eq 0 ]; then
        break
    fi
done
The same thing here: is it possible to run multiple rsync processes, one for each .gz file found in $DUMP_DIR/*.gz?
I think the above would greatly increase the speed of my script; the box is fairly beefy (AIX 7.1, 48 cores and 192GB RAM).
Thank you for your help.
The original code is a traditional batch queue. Let's add a bit of lean thinking...
The actual workflow is the transformation and transfer of a set of directories in compressed cpio format. Assuming that there is no dependency between the directories/archives, we should be able to create a single action that creates each archive and transfers it.
It helps if we break up the script into functions, which should make our intentions more visible.
First, create a function transfer_archive() with archive_name and an optional number_of_attempts as arguments. This contains your second while loop, but replaces $DUMP_DIR/*gz with $archive_name. Details will be left as an exercise.
function transfer_archive {
    typeset archive_name=${1:?"pathname to archive expected"}
    typeset number_of_attempts=${2:-1}
    (
        n=0
        while
            ((n++))
            ((n<=number_of_attempts))
        do
            ${RSYNC:?} \
                --partial \
                --progress \
                --bwlimit=${RSYNC_BW:?} \
                -avh ${archive_name:?} ${DR_USER:?}@${DR_HOST:?}:${DR_PATH:?} && exit 0
        done
        exit 1
    )
}
Inside the function we use a subshell, ( ... ), with two exit statements.
The function returns the exit status of the subshell: either true (rsync succeeded) or false (too many attempts).
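For example, transfer_archive "$DUMP_DIR/foo.cpio.gz" 5 would attempt the transfer up to five times (foo.cpio.gz is a placeholder name, not from the original script).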
We then combine that with archive creation:
function create_and_transfer_archive {
    (
        # only cd in a subshell - no confusion upstairs
        cd ${DUMP_DIR:?Missing global setting} || exit
        dir=${1:?directory}
        archive=${2:?archive}
        # cd, find and cpio must be in the same subshell together
        (cd ${dir:?} && find . -print | cpio -o) |
            gzip > ${archive:?}.cpio.gz || return # bail out
        transfer_archive ${archive:?}.cpio.gz
    )
}
Finally, your main loop will process all directories in parallel:
while LINE=: read -r dir archive_base
do
    (
        create_and_transfer_archive $dir ${archive_base:?} &&
            echo $dir Done || echo $dir failed
    ) &
done <"$CPIO_CFG" | cat
Instead of the pipe with cat you could just add wait at the end of the script, but the pipe has the nice effect of capturing all output from the background processes.
Now, I've glossed over one important aspect: the number of jobs you can run in parallel. This will scale reasonably well, but it would be better to actually maintain a job queue. Above a certain number, adding more jobs will start to slow things down, and at that point you will have to add a job counter and a job limit. Once the job limit is reached, stop starting more create_and_transfer_archive jobs until processes have completed; a rough sketch follows below. How to keep track of those jobs is a separate question.
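A minimal sketch of such a limit, assuming bash 4.3 or newer for wait -n (max_jobs is an arbitrary illustrative cap, not part of the original script):
max_jobs=8
while LINE=: read -r dir archive_base
do
    # block until a slot frees up once max_jobs jobs are running
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n
    done
    create_and_transfer_archive $dir ${archive_base:?} &
done <"$CPIO_CFG"
wait   # wait for the remaining jobs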