Correct way of quoting command substitution - bash

I have a simple bash script which only outputs the filenames that are given to it as positional arguments:
#!/usr/bin/env bash
for file; do
echo "$file"
done
Say I have files with spaces (say "f 1" and "f 2"). I can call the script with a wildcard and get the expected output:
$ ./script f*
> f 1
> f 2
But if I use command substitution it doesn't work:
$ ./script $(echo f*)
> f
> 1
> f
> 2
How can I get the quoting right when my command substitution outputs multiple filenames with spaces?
Edit: What I ultimately want is to pass filenames to a script (one slightly more elaborate than just echoing their names) in a random order, e.g. something like this:
./script $(ls f* | shuf)

With GNU shuf and Bash 4.4+:
readarray -d '' files < <(shuf --zero-terminated --echo f*)
./script "${files[#]}"
where the --zero-terminated can handle any filenames, and readarray also uses the null byte as the delimiter.
With older Bash where readarray doesn't support the -d option:
while IFS= read -r -d '' f; do
files+=("$f")
done < <(shuf --zero-terminated --echo f*)
./script "${files[#]}"
In extreme cases with many files, this might run into command line length limitations; in that case,
shuf --zero-terminated --echo f*
could be replaced by
printf '%s\0' f* | shuf --zero-terminated
Hat tip to Socowi for pointing out --echo.

It's very difficult to get this completely correct. A simple attempt would be to use the %q specifier of printf, but I believe that is a bashism. You still need to use eval, though, e.g.:
$ cat a.sh
#!/bin/sh
for x; do echo $((i++)): "$x"; done
$ ./a.sh *
0: a.sh
1: name
with
newlines
2: name with spaces
$ eval ./a.sh $(printf "%q " *)
0: a.sh
1: name
with
newlines
2: name with spaces

This feels like an XY Problem. Maybe you should explain the real problem, someone might have a much better solution.
Nonetheless, working with what you posted, I'd say read this page on why you shouldn't try to parse ls as it has relevant points; then I suggest an array.
lst=(f*)
./script "${lst[#]}"
This will still fail if you reparse it as the output of a subshell, though -
./script $( echo "${lst[@]}" ) # still fails same way
./script "$( echo "${lst[@]}" )" # *still* fails same way
Thinking about how we could make it work...
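One possibility (a sketch of mine, not part of the original answer; it assumes GNU shuf and Bash 4.4+ for readarray -d) is to shuffle the array over a NUL-delimited stream so that filenames with spaces or even newlines survive the round trip:
lst=(f*)
readarray -d '' shuffled < <(printf '%s\0' "${lst[@]}" | shuf -z)
./script "${shuffled[@]}"
Because everything stays NUL-delimited and goes straight back into an array, nothing is ever re-split by the shell, unlike the $( echo ... ) attempts above.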

You can use xargs:
$ ls -l
total 4
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 1'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 2'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 3'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 4'
-rwxr-xr-x 1 root root 35 2021-08-13 00:25 script
$ ./script *file*
file 1
file 2
file 3
file 4
$ ls *file* | shuf | xargs -d '\n' ./script
file 4
file 2
file 1
file 3
If your xargs does not support -d:
$ ls *file* | shuf | tr '\n' '\0' | xargs -0 ./script
file 3
file 1
file 4
file 2

Related

sort by name on bash same as graphical on windows

I have this folder in Windows.
If I do a simple ls or find, either in bash (Cygwin) or MS-DOS, it shows me this:
$ ls -1
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
su-01-16.jpg
su-01-18.jpg
su-01-19.jpg
su-01-20.jpg
su-01-21.jpg
su-01-31.jpg
su-01-34.jpg
su-01-35.jpg
su-01-38.jpg
su-01-39.jpg
su-01-42-43.jpg
su-01-44.jpg
su-01-45.jpg
su-01-47.jpg
su-01-48.jpg
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
I have tried sorting it, but it does not take 0, 00, 1 into account:
$ ls -1 |sort -V
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
su01-09.jpg
su01-10.jpg
su01-11.jpg
su01-22-23.jpg
su01-24.jpg
su01-25.jpg
su01-26.jpg
su01-27.jpg
su01-28-29.jpg
su01-30.jpg
su01-32.jpg
su01-33.jpg
su01-40-41.jpg
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
but how do I make it ignore the (-)?
thank you very much for your help
find doesn't guarantee alphabetical ordering; ls and sort do, but the character - has value 45 while 0 has value 48, so su- comes before su0 in an alphabetical sort.
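You can confirm those character codes with printf (a quick check of mine, not from the original answer):
$ printf '%d %d\n' "'-" "'0"
45 48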
While printf '%s\n' su* | LANG=en_US.utf8 sort -n seems to display the files the way you want, the best thing you can do to make your life easier is to rename some of the files:
#!/bin/bash
for f in su0*
do
mv "$f" "su-0${f#su0}"
done
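If you'd like to preview the renames before committing to them (my addition, not in the original answer), put an echo in front of the mv:
for f in su0*
do
echo mv "$f" "su-0${f#su0}"
done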
Update
renaming the files to 001.jpg 002.jpg ...
#!/bin/bash
shopt -s nullglob
n=1
while IFS='' read -r file
do
printf -v newname '%03d.%s' "$((n++))" "${file##*.}"
printf '%q %q %q\n' mv "$file" "$newname"
done < <(
printf '%s\n' su* |
sed -nE 's,su-?([^/]*)$,\1/&,p' |
LANG=C sort -nt '-' |
sed 's,[^/]*/,,'
)
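Note that the script above only prints the mv commands (quoted safely with %q); it does not run them. Once the output looks right, you can execute it by piping it to a shell, e.g. assuming you saved it as renumber.sh (a name I'm making up here):
./renumber.sh | bash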
The simplest way to control the sort order in Bash, both for ls and sort, is to set your LANG variable to the locale you want.
In your .bashrc or .profile, add
export LANG=en_US.utf8
and then
ls -1
or
ls -1 | sort
will output the order you're looking for.
If you want to test with different locales and see their effect, you can set LANG one command at a time. For example, compare the output of these commands:
LANG=en_US.utf8 ls -1 # what you're looking for
LANG=C ls -1 # "ASCIIbetic" order
LANG=fr_FR.utf8 ls -1 # would consider é as between e and f

how to group all arguments as one positional argument for `xargs`

I have a script which takes in only one positional parameter which is a list of values, and I'm trying to get the parameter from stdin with xargs.
However, by default xargs passes all the items to my script as separate positional parameters, e.g. when doing:
echo 1 2 3 | xargs myScript, it will essentially be myScript 1 2 3, and what I'm looking for is myScript "1 2 3". What is the best way to achieve this?
Change the delimiter.
$ echo 1 2 3 | xargs -d '\n' printf '%s\n'
1 2 3
Not all xargs implementations have -d though.
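If yours doesn't, a NUL delimiter usually works instead (my suggestion; it assumes an xargs that supports -0, which GNU and BSD versions do):
$ echo 1 2 3 | tr '\n' '\0' | xargs -0 printf '%s\n'
1 2 3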
I'm not sure there is an actual use case for this, but you can also resort to spawning another shell instance if you have to. For example:
$ echo -e '1 2\n3' | xargs sh -c 'printf '\''%s\n'\'' "$*"' sh
1 2 3
If the input can be altered, you can do this, though I'm not sure it's what you wanted.
echo \"1 2 3\"|xargs ./myScript
Here is the example.
$ cat myScript
#!/bin/bash
echo $1; shift
echo $1; shift
echo $1;
$ echo \"1 2 3\"|xargs ./myScript
1 2 3
$ echo 1 2 3|xargs ./myScript
1
2
3

Separate a list of files by keyword group, then sort each group by reverse date

I use this:
for f in $( ls -tr ${repSource}*.txt );
...to loop on a list of files, sorted oldest to newest.
I want to add another sort "filter": filenames that don't start with "abc" come first, whatever their timestamp.
So if the files are "abc.txt", "def.txt" and "ghi.txt", then "abc.txt" must be last, and the other two come before in the list (sorted by reverse date).
You don't want to parse the output of the ls command. The construct in your question is a classic bash pitfall and parsing ls is well known to be problematic.
You can instead use stat to get the time of each file that you find in a for loop on a pattern expansion.
#!/usr/bin/env bash
uname_s=$(uname -s)
stamp() {
case "$uname_s" in
Linux) stat -c '%Y' "$1" ;;
Darwin|*BSD) stat -f '%m' "$1" ;;
esac
}
inputfiles=( "$#" )
for file in "${inputfiles[#]}"; do
n=$(stamp "$file")
while [ -n "${forward[$n]}" ]; do
((n++)) # introduce fudge to avoid timestamp collisions
done
forward[$n]="$file"
done
declare -p inputfiles
for n in "${!forward[#]}"; do
reverse[$(( 2**32 - $n ))]="${forward[$n]}"
done
declare -p forward reverse
This example script takes a list of files as command line options (which can be a glob), then uses declare -p to show you the original list, the forward-sorted list, and a reverse-sorted list.
The case statement in the stamp function makes it portable between Linux, OS X (Darwin), FreeBSD, NetBSD, etc. since I don't know what operating system you're using. (If it's something less common like Solaris, HP/UX, etc, then the stat command may not be available or useful and this solution might not work.)
Once you have a sorted (non-associative) array in bash, you can process the files one by one with constructs like:
for file in "${forward[#]}"; do
# something to $file
done
or
for file in "${reverse[#]}"; do
# something to $file
done
And you can trust that since non-associative bash arrays are always numerically ordered, you'll get the files in their date order.
And of course, you've got the dates themselves as indexes, if you want. :)
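For instance, to print each file next to its timestamp (a small illustration of mine; the %(...)T format needs Bash 4.2+, and collision-fudged indexes may be a second off):
for n in "${!forward[@]}"; do
printf '%(%F %T)T %s\n' "$n" "${forward[$n]}"
done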
The files:
ls -log
total 12
-rw-rw-r-- 1 4 Apr 8 11:15 abc.txt
-rw-rw-r-- 1 4 Apr 8 11:15 def.txt
-rw-rw-r-- 1 4 Apr 8 11:16 ghi.txt
This seems to work, but there must be a better way:
ls -log --time-style +%s | tail -n +2 | \
pee \
'grep -v abc.txt | sort -k4 -rn' \
'grep abc.txt | sort -k4 -rn' | \
cut -d " " -f 5
Output:
ghi.txt
def.txt
abc.txt
Note: the unseemly-named utility pee is from Debian's moreutils package; it's "like tee for pipes".
Now to make a loop of it:
# usage: foo filename # filter above sort code by filename
foo() { ls -log --time-style +%s | tail -n +2 | \
pee 'grep -v '"$1"' | sort -k4 -rn' \
'grep '"$1"' | sort -k4 -rn' | \
cut -d " " -f 5 ; }
for f in $( foo abc.txt )

bash: process substitution, paste and echo

I'm trying out process substitution and this is just a fun exercise.
I want to append the string "XXX" to all the values of 'ls':
paste -d ' ' <(ls -1) <(echo "XXX")
How come this does not work? XXX is not appended. However if I want to append the file name to itself such as
paste -d ' ' <(ls -1) <(ls -1)
it works.
I do not understand the behavior. Both echo and ls -1 write to stdout but echo's output isn't read by paste.
Try doing this, using a printf hack that prints each filename with zero length (the %.0s consumes the argument), so that only the appended XXX is output, once per file:
paste -d ' ' <(ls -1) <(printf "%.0sXXX\n" * )
Demo:
$ ls -1
filename1
filename10
filename2
filename3
filename4
filename5
filename6
filename7
filename8
filename9
Output:
filename1 XXX
filename10 XXX
filename2 XXX
filename3 XXX
filename4 XXX
filename5 XXX
filename6 XXX
filename7 XXX
filename8 XXX
filename9 XXX
If you just want to append XXX, this one will be simpler:
printf "%sXXX\n" *
If you want the XXX after every line of ls -1 output, you need a second command that outputs the string as many times as there are lines. You are echoing it just once, so it only gets paired with the first line of the ls output.
If you are looking for a short one-liner to achieve the task, you may use sed:
ls -1 | sed -n 's/^\(.*\)$/\1 XXX/p'
And here's a funny one, not using any external command except the legendary yes command!
while read -u 4 head && read -u 5 tail ; do echo "$head $tail"; done 4< <(ls -1) 5< <(yes XXX)
(I'm only posting this because it's funny and it's actually not 100% off topic since it uses file descriptors and process substitutions)
... you have to:
for i in $( ls -1 ); do echo "$i XXXX"; done
Never use for i in $(command). See this answer for more details.
So, to answer the original question, you could simply use something like this:
for file in *; do echo "$file XXXX"; done
Another solution with awk:
ls -1|awk '{print $0" XXXX"}'
awk '{print $0" XXXX"}' <(ls -1) # with process substitution
Another solution with sed:
ls -1|sed "s/\(.*\)/\1 XXXX/g"
sed "s/\(.*\)/\1 XXXX/g" <(ls -1) # with process substitution
And some useless solutions, just for fun:
while read; do echo "$REPLY XXXX"; done <<< "$(ls -1)"
ls -1|while read; do echo "$REPLY XXXX"; done
It does it only for the first line, since it pairs the first line from the first input with the first line from the second input:
paste -d ' ' <(ls -1) <(echo "XXX")
... outputs:
/dir/file-a XXX
/dir/file-b
/dir/file-c
... you have to:
for i in $( ls -1 ); do echo "$i XXXX"; done
You can use xargs for the same effect:
ls -1 | xargs -I{} echo {} XXX

Different pipeline behavior between sh and ksh

I have isolated the problem to the below code snippet:
Notice below that a null string gets assigned (LATEST_FILE_NAME='') when the script is run with ksh, but the value is assigned to the variable $LATEST_FILE_NAME correctly when it is run with sh. This in turn affects the value of $FILE_LIST_COUNT.
But as the script is in KornShell (ksh), I am not sure what might be causing the issue.
When I comment out the tee command in the below line, the ksh script works fine and correctly assigns the value to variable $LATEST_FILE_NAME.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
Kindly consider:
1. Source Code: script.sh
#!/usr/bin/ksh
set -vx # Enable debugging
SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp #Will store all extract filenames
FILE_LIST_COUNT=0 # Stores total number of files
getFileListDetails(){
rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH
# Get list of all files, Sort in reverse order, and store names of the files line-wise. If no files are found, error is muted.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
echo "FATAL ERROR - Could not create a temp file for file list.";exit 1;
fi
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";
}
getFileListDetails;
exit 0;
2. Output when using shell sh script.sh:
+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0
3. Output when using ksh ksh script.sh:
+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0
OK, here goes...this is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states that
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.
Notice the keyword may. Many shells implement this in a way that all commands need to complete, e.g. see the bash manpage:
The shell waits for all commands in the pipeline to terminate before returning a value.
Notice the wording in the ksh manpage:
Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.
In your example, the last command is the tee command. Since there is no input to tee, because you redirect stdout to ${SOURCE_FILE_PATH}/${FILE_LIST} in the command before it, tee exits immediately. Put in oversimplified terms, the tee finishes faster than the earlier redirection, which means your file has probably not been fully written by the time you read from it. You can test this (this is not a fix!) by adding a sleep at the end of the whole command:
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
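A minimal fix, rather than the sleep (this is a sketch of mine, not part of the original answer), is to drop the stray redirection-plus-pipe and let tee itself write the list file while its stdout is appended to the log. tee is then genuinely the last command in the pipeline, so the list file is complete by the time the shell carries on:
(cd "$SOURCE_FILE_PATH"; ls *.txt 2>/dev/null) | sort -r | tee "${SOURCE_FILE_PATH}/${FILE_LIST}" >> "$LOG_FILE_PATH"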
That being said, here are a few other things to consider:
Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces) etc.:
do_something "${this_is_my_file}"
head -1 is deprecated, use head -n 1
If you only have one command on a line, the ending semicolon ; is superfluous...just skip it
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"
No need to cd into the directory first, just specify the whole path as argument to head:
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"
This is called Useless Use Of Cat because the cat is not needed - wc can deal with files. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
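For illustration (a hypothetical run; output format as printed by GNU coreutils):
$ wc -l FILE_LIST.temp
3 FILE_LIST.temp
$ wc -l < FILE_LIST.temp
3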
Furthermore, you will want to read Why you shouldn't parse the output of ls(1) and How can I get the newest (or oldest) file from a directory?.
