I'm using this code in a bash script. I use it to transfer a source folder to multiple destinations:
cd /Volumes/ ; tar cf - SOURCE/ | tee \
>( cd /Volumes/dest1 ; tar xf - ) \
>( cd /Volumes/dest2 ; tar xf - ) \
> /dev/null
This command works well. I want to set the destinations at the beginning of the script, so that the number of destinations can vary.
For example, the destinations could be in a variable or an array:
destinationList=(/Volumes/dest1 /Volumes/dest2)
cd /Volumes/Untitled/ ; tar cf - SOURCE/ | tee \
# for item in destinationList
# do
# add this code ">( cd $item ; tar xf - )"
# done
> /dev/null
Is there a nice way to do it?
This is a case where eval is one of the easier options, though it needs to be used very carefully.
unpackInDestinations() {
  local dest currArg='' evalStr=''
  for dest; do
    printf -v currArg '>(cd %q && exec tar xf -)' "$dest"
    evalStr+=" $currArg"
  done
  eval "tee $evalStr >/dev/null"
}
tar cf - SOURCE/ | unpackInDestinations /Volumes/dest{1,2}
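For two destinations, the string handed to eval ends up being the same command you would write by hand (the %q format quotes each path, so names with spaces or metacharacters stay safe):

tee >(cd /Volumes/dest1 && exec tar xf -) >(cd /Volumes/dest2 && exec tar xf -) >/dev/null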
Less efficiently (but without, perhaps, causing anyone trying to audit the code's security as much consternation), one can also write a recursive function:
unpackInDestinations() {
  local dest
  if (( $# == 0 )); then
    cat >/dev/null
  elif (( $# == 1 )); then
    cd "$1" && tar xf -
  else
    dest=$1; shift
    tee >(cd "$dest" && exec tar xf -) | unpackInDestinations "$@"
  fi
}
The number of tees this creates varies with the number of arguments, so it's substantially less efficient than the hand-written code or its eval-based equivalent.
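To illustrate, for three destinations the recursion effectively builds a chain with one tee per intermediate destination:

tee >(cd /Volumes/dest1 && exec tar xf -) |
  tee >(cd /Volumes/dest2 && exec tar xf -) |
  (cd /Volumes/dest3 && tar xf -)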
If you only need to support new versions of bash (the below requires at least 4.1), there's some additional magic available that can provide the best of both worlds:
unpackInDestinations() {
  local -a dest_fds=( ) args=( )
  local arg fd_num retval
  # open a file descriptor for each argument
  for arg; do
    exec {fd_num}> >(cd "$arg" && exec tar xf -)
    dest_fds+=( "$fd_num" )
    args+=( "/dev/fd/$fd_num" )
  done
  tee "${args[@]}" >/dev/null; retval=$?
  # close the FDs
  for fd_num in "${dest_fds[@]}"; do
    exec {fd_num}>&-
  done
  # and return the exit status we got from tee
  return "$retval"
}
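Usage is identical to the earlier versions:

tar cf - SOURCE/ | unpackInDestinations /Volumes/dest{1,2}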
I have a tracked file foo. Now, since I'm absent-minded, I've run:
mv foo bar
now, when I do hg st, I get:
! foo
? bar
I want to fix this retroactively - as though I'd done an hg mv foo bar.
Now, I could write a bash script which does that for me - but is there something better/simpler/smarter I could do?
Use the --after option: hg mv --after foo bar
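A quick check of the expected status output (my illustration, assuming nothing else changed in the working copy):

$ hg st
! foo
? bar
$ hg mv --after foo bar
$ hg st
A bar
R foo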
$ hg mv --help
hg rename [OPTION]... SOURCE... DEST
aliases: move, mv
rename files; equivalent of copy + remove
Mark dest as copies of sources; mark sources for deletion. If dest is a
directory, copies are put in that directory. If dest is a file, there can
only be one source.
By default, this command copies the contents of files as they exist in the
working directory. If invoked with -A/--after, the operation is recorded,
but no copying is performed.
This command takes effect at the next commit. To undo a rename before
that, see 'hg revert'.
Returns 0 on success, 1 if errors are encountered.
options ([+] can be repeated):
 -A --after                record a rename that has already occurred
 -f --force                forcibly copy over an existing managed file
 -I --include PATTERN [+]  include names matching the given patterns
 -X --exclude PATTERN [+]  exclude names matching the given patterns
 -n --dry-run              do not perform actions, just print output
    --mq                   operate on patch repository
(some details hidden, use --verbose to show complete help)
Here's what I'm doing right now:
#!/bin/bash
function die {
    echo "$1" >&2
    exit 1
}
(( $# == 2 )) || die "Usage: $0 <moved filename> <original filename>"
[[ -e "$1" ]] || die "Not an existing file: $1"
[[ ! -e "$2" ]] || die "Not a missing file: $2"
hg_st_lines_1=$(hg st "$1" 2>/dev/null | wc -l)
hg_st_lines_2=$(hg st "$2" 2>/dev/null | wc -l)
(( hg_st_lines_1 == 1 )) || die "Expected exactly one line in hg status for $1, but got ${hg_st_lines_1}"
(( hg_st_lines_2 == 1 )) || die "Expected exactly one line in hg status for $2, but got ${hg_st_lines_2}"
[[ "$(hg st "$1" 2>/dev/null)" == \?* ]] || die "Mercurial does not consider $1 to be an unknown (untracked) file"
[[ "$(hg st "$2" 2>/dev/null)" =~ !.* ]] || die "Mercurial does not consider $2 to be a missing file"
mv "$1" "$2"
hg mv "$2" "$1"
I'm running it under OS X El Capitan 10.11.6.
Among all the commands to get my current directory (the path I'm running my script from), only this one works for me:
FILES="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
But it's not going to work if the folder has whitespace in its name (e.g. "folder name").
How can I fix this?
Thank you!
Update: added a script:
#!/bin/bash
function check ()
{
    oldsize=$(wc -c <"$1")
    sleep 1
    newsize=$(wc -c <"$1")
    while [ "$oldsize" -lt "$newsize" ]
    do
        echo "Not yet..."
        oldsize=$(wc -c <"$1")
        sleep 1
        newsize=$(wc -c <"$1")
    done
    if [ "$oldsize" -eq "$newsize" ]
    then
        echo "The file has been copied completely."
    fi
}
FILES="$(dirname "${BASH_SOURCE[0]}")/*"
function main
{
for f in $FILES
do
if [[ "$f" =~ \.mkv$ ]];
then
#/////////////////////////////////////////////////////////////////////
check "$f"
(( count = count + 1 ))
g="${f/mkv/avi}"
#LOG_FILE="${g/avi/log}"
#exec > >(tee -a "${LOG_FILE}" )
#exec 2> >(tee -a "${LOG_FILE}" >&2)
now="$(date)"
printf "Current date and time %s\n" "$now"
echo "Processing $f file..."
#avconv -i "${f}" -map 0:0 -map 0:1 -codec copy -sn "${g}"
avconv -i "$f" -map 0 -codec copy "$g"
if [ $? -eq 0 ]; then
echo OK
rm "$f"
else
echo FAIL
rm "$g"
#rm "$LOG_FILE"
return
fi
fi
#/////////////////////////////////////////////////////////////////////
done
}
############
count=0
############
main
if (($count > 0)); then
open "$(dirname "${BASH_SOURCE[0]}")"
fi
exit
I am using Mac OS X 10.11.6, and I have a directory $HOME/tmp. From there, I executed:
$ cd $HOME/tmp
$ pwd
/Users/jleffler/tmp
$ mkdir -p "Spaced Out  Directory   "/bin
$ export PATH="$PATH:$PWD/$_"
$ cat <<'EOF' > Spaced\ Out\ \ Directory\ \ \ /bin/gorblinsky.sh
> #!/bin/bash
>
> echo "PWD=$PWD"
> DIR="$(dirname "${BASH_SOURCE[0]}")"
> echo "DIR=$DIR"
> cd "$DIR"
> pwd
> echo "PWD=$PWD"
> EOF
$ chmod +x Spaced\ Out\ \ Directory\ \ \ /bin/gorblinsky.sh
$ gorblinsky.sh
PWD=/Users/jleffler/tmp
DIR=/Users/jleffler/tmp/Spaced Out  Directory   /bin
/Users/jleffler/tmp/Spaced Out  Directory   /bin
PWD=/Users/jleffler/tmp/Spaced Out  Directory   /bin
$
This shows that the command $(dirname "${BASH_SOURCE[0]}") can determine the name of the directory where the source for the command is stored.
If the script was going to use the variable $DIR to specify file names, you'd need to be careful (very careful) to ensure it is always properly quoted.
For example:
cp "$DIR/gorblinksky.h" "$HOME/tmp/cobbled together name"
Modern style is to always (double) quote all variable references, even when there's nothing in them that needs protecting (see shellcheck.net for example — and Google Shell Style Guide). I'm old-school enough not to put quotes around names that can't contain spaces or metacharacters, but I guess that is just old-fashioned. For example, I shell-checked a script for playing with RCS version numbers, and it doesn't quote variables containing dotted strings of digits (9.19.2.24 — could be an IBM IPv4 address too) and I was told off for not quoting them, though the file names were already protected with quotes.
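A minimal illustration of why the quoting matters (my example, not from the original):

DIR="/path/with spaces"
ls $DIR      # word-splits into two arguments: /path/with and spaces
ls "$DIR"    # passes a single argument, as intended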
I have a script below that does a few things...
#!/bin/bash
# Script to sync dr-xxxx
# 1. Check for locks and die if exists
# 2. CPIO directories found in cpio.cfg
# 3. RSYNC to remote server
# 4. TRAP and remove lock so we can run again
if ! mkdir /tmp/drsync.lock; then
    printf "Failed to acquire lock.\n" >&2
    exit 1
fi
trap 'rm -rf /tmp/drsync.lock' EXIT # remove the lockdir on exit
# Config specific to CPIO
BASE=/home/mirxx
DUMP_DIR=/usrx/drsync
CPIO_CFG="$BASE/cpio.cfg"
while IFS=: read -r f1 f2
do
    echo "Working with $f1"
    cd "$f1"
    find . -print | cpio -o | gzip > "$DUMP_DIR/$f2.cpio.gz"
    echo "Done for $f1"
done <"$CPIO_CFG"
RSYNC=/usr/bin/rsync # use latest version
RSYNC_BW="4500" # 4.5MB/sec
DR_PATH=/usrx/drsync
DR_USER=root
DR_HOST=dr-xxxx
I=0
MAX_RESTARTS=5 # max rsync retries before quitting
LAST_EXIT_CODE=1
while [ $I -le $MAX_RESTARTS ]
do
    I=$(( I + 1 ))
    echo "$I. start of rsync"
    $RSYNC \
        --partial \
        --progress \
        --bwlimit=$RSYNC_BW \
        -avh "$DUMP_DIR"/*gz \
        "$DR_USER@$DR_HOST:$DR_PATH"
    LAST_EXIT_CODE=$?
    if [ $LAST_EXIT_CODE -eq 0 ]; then
        break
    fi
done
# check if successful
if [ $LAST_EXIT_CODE -ne 0 ]; then
    echo "rsync failed after $I attempts; giving up."
else
    echo "rsync succeeded after $I attempts."
fi
What I would like to change is this line:
find . -print | cpio -o | gzip > "$DUMP_DIR/$f2.cpio.gz"
I am looking to change it so that it starts a parallel process for every entry in CPIO_CFG that gets fed in. I believe I have to add & at the end? Should I implement any safety precautions?
Is it also possible to modify the command to include an exclude list that I can pass in via $f3 in the cpio.cfg file?
For the code below:
while [ $I -le $MAX_RESTARTS ]
do
    I=$(( I + 1 ))
    echo "$I. start of rsync"
    $RSYNC --partial --progress --bwlimit=$RSYNC_BW -avh "$DUMP_DIR"/*gz "$DR_USER@$DR_HOST:$DR_PATH"
    LAST_EXIT_CODE=$?
    if [ $LAST_EXIT_CODE -eq 0 ]; then
        break
    fi
done
The same thing here: is it possible to run multiple rsync processes, one for each .gz file found in $DUMP_DIR?
I think this would greatly increase the speed of my script; the box is fairly beefy (AIX 7.1, 48 cores and 192 GB RAM).
Thank you for your help.
The original code is a traditional batch queue. Let's add a bit of lean thinking...
The actual workflow is the transformation and transfer of a set of directories in compressed cpio format. Assuming that there is no dependency between the directories/archives, we should be able to create a single action for creating the archive and the transfer.
It helps if we break up the script into functions, which should make our intentions more visible.
First, create a function transfer_archive() with archive_name and an optional number_of_attempts as arguments. This contains your second while loop, but replaces $DUMP_DIR/*gz with $archive_name. Details will be left as an exercise.
function transfer_archive {
    typeset archive_name=${1:?"pathname to archive expected"}
    typeset number_of_attempts=${2:-1}
    (
        n=0
        while
            ((n++))
            ((n<=number_of_attempts))
        do
            ${RSYNC:?} \
                --partial \
                --progress \
                --bwlimit=${RSYNC_BW:?} \
                -avh ${archive_name:?} ${DR_USER:?}@${DR_HOST:?}:${DR_PATH:?} && exit 0
        done
        exit 1
    )
}
Inside the function we use a subshell, ( ... ), with two exit statements.
The function returns the exit status of the subshell: either true (rsync succeeded) or false (too many attempts).
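A call might look like this (hypothetical values; the archive path and retry count are just for illustration):

transfer_archive "$DUMP_DIR/home.cpio.gz" 5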
We then combine that with archive creation:
function create_and_transfer_archive {
    (
        # only cd in a subshell - no confusion upstairs
        cd ${DUMP_DIR:?Missing global setting} || exit
        dir=${1:?directory}
        archive=${2:?archive}
        # cd, find and cpio must be in the same subshell together
        (cd ${dir:?} && find . -print | cpio -o) |
            gzip > ${archive:?}.cpio.gz || return # bail out
        transfer_archive ${archive:?}.cpio.gz
    )
}
Finally, your main loop will process all directories in parallel:
while IFS=: read -r dir archive_base
do
    (
        create_and_transfer_archive "$dir" "${archive_base:?}" &&
            echo "$dir done" || echo "$dir failed"
    ) &
done <"$CPIO_CFG" | cat
Instead of the pipe to cat you could just add wait at the end of the script, but the pipe has the nice effect of capturing all output from the background processes.
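For reference, the wait variant would look like this:

while IFS=: read -r dir archive_base
do
    create_and_transfer_archive "$dir" "${archive_base:?}" &
done <"$CPIO_CFG"
wait    # block until every background job has finished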
Now, I've glossed over one important aspect: the number of jobs you can run in parallel. This will scale reasonably well, but it would be better to actually maintain a job queue. Above a certain number, adding more jobs will start to slow things down, and at that point you will have to add a job counter and a job limit. Once the job limit is reached, stop starting more create_and_transfer_archive jobs until processes have completed.
How to keep track of those jobs is a separate question.
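A minimal sketch of such a limit (my addition, assuming bash 4.3 or later for wait -n; the limit of 8 is arbitrary):

max_jobs=8      # assumed limit; tune to the machine
running=0
while IFS=: read -r dir archive_base
do
    create_and_transfer_archive "$dir" "${archive_base:?}" &
    (( ++running < max_jobs )) || { wait -n; (( running-- )); }
done <"$CPIO_CFG"
wait    # wait for the remaining jobs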
I'm trying to figure out a way to compare an existing file with the result of a process (a heavy one, not to be repeated) and clobber the existing file with the result of that process, without having to write it to a temp file (it would be a large temp file, about the same size as the existing file: let's try to be efficient and not take twice the space we should).
I would like to replace the normal file /tmp/replace_with_that (see below) with a fifo, but of course doing so with the code below would just lock up the script, since the /tmp/replace_with_that fifo cannot be read before the existing file has been compared with the named pipe /tmp/test_against_this.
#!/bin/bash
mkfifo /tmp/test_against_this
: > /tmp/replace_with_that
echo 'A B C D' >/some/existing/file
{
#A very heavy process not to repeat;
#Solved: we used a named pipe.
#Its large output should not be sent to a file
#To solve: using this code, we write the output to a regular file
for LETTER in "A B C D E"
do
echo $LETTER
done
} | tee /tmp/test_against_this /tmp/replace_with_that >/dev/null &
if cmp -s /some/existing/file /tmp/test_against_this
then
echo Exact copy
#Don't do a thing to /some/existing/file
else
echo Differs
#Clobber /some/existing/file with /tmp/replace_with_that
cat /tmp/replace_with_that >/some/existing/file
fi
rm -f /tmp/test_against_this
rm -f /tmp/replace_with_that
I think I would recommend a different approach:
Generate an MD5/SHA1/SHA256/whatever hash of the existing file
Run your heavy process and replace the output file
Generate a hash of the new file
If the hashes match, the files were the same; if not, the new file is different
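A sketch of that approach (heavy_process stands in for the real command, and sha256sum assumes GNU coreutils):

file=/some/existing/file
old_hash=$(sha256sum "$file" | cut -d' ' -f1)
heavy_process > "$file"
new_hash=$(sha256sum "$file" | cut -d' ' -f1)
if [ "$old_hash" = "$new_hash" ]; then
    echo "File unchanged"
else
    echo "File differs"
fi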
Just for completeness, my answer (I wanted to explore the use of pipes):
I was trying to find a way to compare a stream and an existing file on the fly, without overwriting the existing file unnecessarily (leaving it as is if stream and file are exact copies), and without creating sometimes-big temp files (the product of a heavy process like mysqldump, for instance). The solution had to rely on pipes only (named and anonymous), and maybe a few very small temp files.
The checksum solution suggested by twalberg is just fine, but md5sum calls on large files are processor-intensive (and processing time grows linearly with file size); cmp is faster.
Example call of the function listed below:
#!/bin/bash
mkfifo /tmp/fifo
mysqldump --skip-comments $HOST $USER $PASSWORD $DB >/tmp/fifo &
create_or_replace /some/existing/dump /tmp/fifo
#This also works, but depending on the anonymous fifo setup, seems less robust
create_or_replace /some/existing/dump <(mysqldump --skip-comments $HOST $USER $PASSWORD $DB)
The functions:
#!/bin/bash
checkdiff(){
local originalfilepath="$1"
local differs="$2"
local streamsize="$3"
local timeoutseconds="$4"
local originalfilesize=$(stat -c '%s' "$originalfilepath")
local starttime
local stoptime
#Hackish: we can't know for sure when the wc subprocess will have produced the streamsize file
starttime=$(date +%s)
stoptime=$(( $starttime + $timeoutseconds ))
while ([[ ! -f "$streamsize" ]] && (( $stoptime > $(date +%s) ))); do :; done;
if ([[ ! -f "$streamsize" ]] || (( $originalfilesize == $(cat "$streamsize" | head -1) )))
then
#Using streams that were exact copies of files to compare with,
#on average, with just a few test runs:
#diff slowest, md5sum 2% faster than diff, and cmp method 5% faster than md5sum
#Did not test, but on large unequal files,
#cmp method should be way ahead of the 2 other methods
#since equal files is the worst case scenario for cmp
#diff -q --speed-large-files <(sort "$originalfilepath") <(sort -) >"$differs"
#( [[ $(md5sum "$originalfilepath" | cut -b-32) = $(md5sum - | cut -b-32) ]] && : || echo -n '1' ) >"$differs"
( cmp -s "$originalfilepath" - && : || echo -n '1' ) >"$differs"
else
echo -n '1' >"$differs"
fi
}
create_or_replace(){
local originalfilepath="$1"
local newfilepath="$2" #Should be a pipe, but could be a regular file
local differs="$originalfilepath.differs"
local streamsize="$originalfilepath.size"
local timeoutseconds=30
local starttime
local stoptime
if [[ -f "$originalfilepath" ]]
then
#Cleanup
[[ -f "$differs" ]] && rm -f "$differs"
[[ -f "$streamsize" ]] && rm -f "$streamsize"
#cat the pipe, get its size, check for differences between the stream and the file and pipe the stream into the original file if all checks show a diff
cat "$newfilepath" |
tee >(wc -m - | cut -f1 -d' ' >"$streamsize") >(checkdiff "$originalfilepath" "$differs" "$streamsize" "$timeoutseconds") | {
#Hackish: we can't know for sure when the checkdiff subprocess will have produced the differs file
starttime=$(date +%s)
stoptime=$(( $starttime + $timeoutseconds ))
while ([[ ! -f "$differs" ]] && (( $stoptime > $(date +%s) ))); do :; done;
[[ ! -f "$differs" ]] || [[ ! -z $(cat "$differs" | head -1) ]] && cat - >"$originalfilepath"
}
#Cleanup
[[ -f "$differs" ]] && rm -f "$differs"
[[ -f "$streamsize" ]] && rm -f "$streamsize"
else
cat "$newfilepath" >"$originalfilepath"
fi
}
I want to enable bash tab-complete to look for directories, but not in the current directory.
So for instance, if I do:
$ ls $P
dirs/ are/ here/
$ cd /not/the/P/path
$ ls
other/ stuff/
$ myProg <tab>
dirs/ are/ here/
This changes the usual behavior, where I would normally see files in the current directory.
Due diligence: The best I could come up with is:
_myProg ()
{
    local cur
    COMPREPLY=()
    cur=${COMP_WORDS[COMP_CWORD]}
    if [ "${P}x" = "x" ]; then
        return 1
    fi
    case "$cur" in
    *)
        pth=${P}/$( echo "$cur" | egrep -o "^.*/[^/]*$" )
        COMPREPLY=( $( compgen -W "$( cd "$pth" && ls -1d "$cur"* 2>/dev/null )" -- "$cur" ) )
        ;;
    esac
    return 0
}
complete -o nospace -F _myProg myProg
which initially shows directories, but doesn't let me drill down into the directories the way I want (the way ls works).
Is $CDPATH helpful for you? See Advanced Bash Scripting Guide.
_myProg()
{
    COMPREPLY=( $(cd "$P" && compgen -f -- "$2") )
}
complete -o nospace -F _myProg myProg
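Since the question asks for directories specifically, a variant using compgen -d (my tweak, not part of the original answer) restricts completion to directory names:

_myProg()
{
    # complete directory names only, relative to $P
    COMPREPLY=( $(cd "$P" 2>/dev/null && compgen -d -- "$2") )
}
complete -o nospace -F _myProg myProg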