How to store NUL output of a program in bash script? - bash

Suppose there is a directory 'foo' which contains several files:
ls foo:
1.aa 2.bb 3.aa 4.cc
Now in a bash script, I want to count the number of files with a specific suffix in 'foo', and display them, e.g.:
SUFF='aa'
FILES=`ls -1 *."$SUFF" foo`
COUNT=`echo $FILES | wc -l`
echo "$COUNT files have suffix $SUFF, they are: $FILES"
The problem is that if SUFF='dd', $COUNT is also 1. After some searching, the reason I found is that when SUFF='dd', $FILES is an empty string rather than the truly empty (NUL) output of a program, and echo still writes a newline for it, which wc counts as one line. Truly empty output can only be passed through a pipe. So one solution is:
COUNT=`ls -1 *."$SUFF" foo | wc -l`
but this will lead to the ls command being executed twice. So my question is: is there any more elegant way to achieve this?
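The behaviour described above can be reproduced without ls. The sketch below is only an illustration; count_lines_echo and count_lines_printf are hypothetical helper names, not part of the original script. It shows that the extra line comes from the newline that echo appends, not from wc.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that count the lines wc sees for a given string.

# echo always appends a newline, so even an empty string yields one line.
count_lines_echo() {
    echo "$1" | wc -l | tr -d '[:space:]'
}

# printf '%s' writes the string as-is; an empty string yields zero bytes.
count_lines_printf() {
    printf '%s' "$1" | wc -l | tr -d '[:space:]'
}

count_lines_echo "";   echo   # prints 1
count_lines_printf ""; echo   # prints 0
```

The tr call only strips the padding that some wc implementations add around the number.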

$ shopt -s nullglob
$ FILES=(*)
$ echo "${#FILES[@]}"
4
$ FILES=(*aa)
$ echo "${#FILES[@]}"
2
$ FILES=(*dd)
$ echo "${#FILES[@]}"
0
$ SUFFIX=aa
$ FILES=(*"$SUFFIX")
$ echo "${#FILES[@]}"
2
$ SUFFIX=dd
$ FILES=(*"$SUFFIX")
$ echo "${#FILES[@]}"
0

You can also try this:
#!/bin/bash
SUFF='aa'
FILES=`ls -1 *."$SUFF" foo`
FILENAMES=`echo $FILES | awk -F ':' '{print $2}'`
COUNT=`echo $FILENAMES | wc -w`
echo "$COUNT files have suffix $SUFF, they are: $FILENAMES"
If you insert echo $FILES into your script, the output is foo: 1.aa 2.aa 3.aa, so:
awk -F ':' '{print $2}' extracts 1.aa 2.aa 3.aa from the $FILES variable
wc -w prints the word count

If you only need the file count, I would actually use find for that:
find '/path/to/directory' -mindepth 1 -maxdepth 1 -name '*.aa' -printf '\n' | wc -l
This is more reliable because it correctly handles filenames that contain line breaks. It works because find prints one empty line for each matching file, and wc -l counts those lines.
Edit: If you want to keep the file list in an array, you can use a glob:
GLOBIGNORE=".:.."
shopt -s nullglob
FILES=(*aa)
COUNT=${#FILES[@]}
echo "$COUNT"
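The find-based count from this answer can be wrapped in a small function. This is a minimal sketch that assumes GNU find (for -printf); count_by_suffix and the demo directory are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# count_by_suffix DIR SUFFIX
# Hypothetical helper: prints how many entries directly inside DIR end in
# ".SUFFIX". Assumes GNU find, which provides -printf.
count_by_suffix() {
    local dir=$1 suffix=$2 n
    # find prints exactly one newline per match, whatever the name contains.
    n=$(find "$dir" -mindepth 1 -maxdepth 1 -name "*.$suffix" -printf '\n' | wc -l)
    echo $(( n ))   # arithmetic expansion drops any padding added by wc
}

demo_dir=$(mktemp -d)
touch "$demo_dir/1.aa" "$demo_dir/"$'line\nbreak.aa' "$demo_dir/4.cc"
count_by_suffix "$demo_dir" aa   # prints 2, despite the embedded line break
rm -rf "$demo_dir"
```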

The reason is that the option nullglob is unset by default in bash:
If no matching file names are found, and the shell option nullglob is not enabled, the word is left unchanged. If the nullglob option is set, and no matches are found, the word is removed.
So, just set the nullglob option, and run your code again:
shopt -s nullglob
SUFF='aa'
FILES="$(printf '%s\n' foo/*."$SUFF")"
COUNT="$(printf '%.0s\n' foo/*."$SUFF" | wc -l)"
echo "$COUNT files have suffix $SUFF, they are: $FILES"
Or better yet:
shopt -s nullglob
suff='aa'
files=( foo/*."$suff" )
count=${#files[@]}
echo "$count files have suffix $suff, they are: ${files[@]}"
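To see the effect of nullglob in isolation, here is a minimal sketch; count_glob is a hypothetical helper that is not part of the answer above. It counts matches with the option switched on or off inside a subshell, so the caller's settings stay untouched.

```shell
#!/usr/bin/env bash
# count_glob DIR SUFFIX on|off
# Hypothetical helper: prints the number of words the glob DIR/*.SUFFIX
# expands to, with nullglob enabled (on) or disabled (off).
count_glob() {
    local dir=$1 suffix=$2 mode=$3
    (   # subshell: the shopt change does not leak into the caller
        if [ "$mode" = on ]; then
            shopt -s nullglob
        else
            shopt -u nullglob
        fi
        files=( "$dir"/*."$suffix" )
        echo "${#files[@]}"
    )
}
```

With no matching files, count_glob foo dd off reports 1 because the unmatched pattern itself is left in the array, while count_glob foo dd on reports 0.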

Related

Bash command does not work in script but in console

I am running the following commands in a script where I want to check whether all files in a directory are media files:
1 All_LINES=$(ls -1 | wc -l)
2 echo "Number of lines: ${All_LINES}"
3
4 REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
5 echo "Number of reached lines: ${REACHED_LINES}"
if[...]
Running lines 4 and 5 sequentially in an interactive shell works as expected, counting all files ending with .jpg, .JPG...
Running all together in a script gives the following error though:
Number of lines: 12
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: syntax error near unexpected token `('
/home/andreas/.bash_scripts/rnimgs: command substitution: line 17: `ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)'
Number of reached lines:
Could somebody explain this to me, please?
EDIT: This is as far as I got:
#!/bin/bash
# script to rename images, sorted by "type" and "date modified" and named by current folder
#get/set basename for files
CURRENT_BASENAME=$(basename "${PWD}")
echo -e "Current directory/basename is: ${CURRENT_BASENAME}\n"
read -e -p "Please enter basename: " -i "${CURRENT_BASENAME}" BASENAME
echo -e "\nNew basename is: ${BASENAME}\n"
#start
echo -e "START RENAMING"
#get nr of all files in directory
All_LINES=$(ls -1 | wc -l)
echo "Number of lines: ${All_LINES}"
#get nr of media files in directory
REACHED_LINES=$(ls -1 *(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"
EDIT1: Thanks again guys, this is my result so far. Still room for improvement, but a start and ready to test.
#!/bin/bash
#script to rename media files to a choosable name (default: ${basename} of current directory) and sorted by date modified
#config
media_file_extensions="(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg)"
#enable option extglob (extended globbing): If set, the extended pattern matching features described above under Pathname Expansion are enabled.
#more info: https://askubuntu.com/questions/889744/what-is-the-purpose-of-shopt-s-extglob
#used for regex
shopt -s extglob
#save and set IFS (The Internal Field Separator): IFS is used for word splitting after expansion and to split lines into words with the read builtin command.
#more info: https://bash.cyberciti.biz/guide/$IFS
#used to get blanks in filenames
SAVEIFS=$IFS;
IFS=$(echo -en "\n\b");
#get and print current directory
basedir=$PWD
echo "Directory:" $basedir
#get and print nr of files in current directory
all_files=( "$basedir"/* )
echo "Number of files in directory: ${#all_files[@]}"
#get and print nr of media files in current directory
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files in directory: ${#media_files[@]}\n"
#validation if #all_files = #media_files
if [ ${#all_files[@]} -ne ${#media_files[@]} ]
then
echo "ABORT - YOU DID NOT REACH ALL FILES, PLEASE CHECK YOUR FILE ENDINGS"
exit
fi
#make a copy
backup_dir="backup_95f528fd438ef6fa5dd38808cdb10f"
backup_path="${basedir}/${backup_dir}"
mkdir "${backup_path}"
rsync -r "${basedir}/" "${backup_path}" --exclude "${backup_dir}"
echo "BACKUP MADE"
echo -e "START RENAMING"
#set new basename
basename=$(basename "${PWD}")
read -e -p "Please enter file basename: " -i "$basename" basename
echo -e "New basename is: ${basename}\n"
#variables
counter=1;
new_name="";
file_extension="";
#iterate over files
for f in $(ls -1 -t -r *${media_file_extensions})
do
#catch file name
echo "Current file is: $f"
#catch file extension
file_extension="${f##*.}";
echo "Current file extension is: ${file_extension}"
#create new name
new_name="${basename}_${counter}.${file_extension}"
echo "New name is: ${new_name}";
#rename file
mv $f "${new_name}";
echo -e "Counter is: ${counter}\n"
((counter++))
done
#get and print nr of media files before
echo "Number of media files before: ${#media_files[@]}"
#get and print nr of media files after
media_files=( "$basedir"/*${media_file_extensions} )
echo -e "Number of media files after: ${#media_files[@]}\n"
#delete backup?
while true; do
read -p "Do you wish to keep the result? " yn
case $yn in
[Yy]* ) rm -r ${backup_path}; echo "BACKUP DELETED"; break ;;
[Nn]* ) rm -r !(${backup_dir}); rsync -r "${backup_path}/" "${basedir}"; rm -r ${backup_path}; echo "BACKUP RESTORED THEN DELETED"; break;;
* ) echo "Please answer yes or no.";;
esac
done
#reverse IFS to default
IFS=$SAVEIFS;
echo -e "END RENAMING"
You don't need to and don't want to use ls at all here. See https://mywiki.wooledge.org/ParsingLs
Also, don't use uppercase for your private variables. See Correct Bash and shell script variable capitalization
#!/bin/bash
shopt -s extglob
read -e -p "Please enter basename: " -i "$PWD" basedir
all_files=( "$basedir"/* )
echo "Number of files: ${#all_files[@]}"
media_files=( "$basedir"/*(*.JPG|*.jpg|*.PNG|*.png|*.JPEG|*.jpeg) )
echo "Number of media files: ${#media_files[@]}"
As @chepner already pointed out in a comment, you likely need to explicitly enable extended globbing in your script; cf. Greg's Wiki.
It's also possible to condense that pattern to eliminate some redundancy and add mixed case if you like -
$: ls -1 *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg])
a.jpg
b.JPG
c.jpeg
d.JPEG
mixed.jPeG
mixed.pNg
x.png
y.PNG
You can also accomplish this without ls, which is error-prone. Try this:
$: all_lines=(*)
$: echo ${#all_lines[@]}
55
$: reached_lines=( *.*([Jj][Pp]*([Ee])[Gg]|[Pp][Nn][Gg]) )
$: echo ${#reached_lines[@]}
8
c.f. this breakdown
If all you want is counts, but prefer not to include directories:
all_dirs=( */ )
num_files=$(( ${#all_files[@]} - ${#all_dirs[@]} ))
If there's a chance you will have a directory with a name that matches your jpg/png pattern, then it gets trickier. At that point it's probably easier to just use @markp-fuso's solution.
One last thing - avoid all-caps variable names. Those are generally reserved for system stuff.
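As a minimal sketch of the array approach inside a script, the function below counts media files with an extended glob. count_media is a hypothetical name, and the pattern is a slightly stricter variant of the one above (it uses ?([Ee]) to allow at most one "e"). The shopt line sits on its own line before the function definition because bash needs extglob enabled at the time it parses the pattern.

```shell
#!/usr/bin/env bash
# extglob must be on before bash parses any line containing @(...) or ?(...).
# nullglob makes an unmatched pattern expand to nothing instead of itself.
shopt -s extglob nullglob

# count_media DIR
# Hypothetical helper: prints how many entries in DIR end in .jpg, .jpeg
# or .png, in any mix of upper and lower case.
count_media() {
    local dir=$1
    local -a matches=()
    matches=( "$dir"/*.@([Jj][Pp]?([Ee])[Gg]|[Pp][Nn][Gg]) )
    echo "${#matches[@]}"
}
```

Called as count_media "$PWD", it reports the number for the current directory without invoking ls or wc.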
Assuming the OP wants to limit the counts to regular files (i.e., exclude non-files like directories, pipes, symbolic links, etc.), a solution based on find may provide more accurate counts.
Updating OP's original code to use find (ignoring dot files for now):
ALL_LINES=$(find . -maxdepth 1 -type f | wc -l)
echo "Number of lines: ${ALL_LINES}"
REACHED_LINES=$(find . -maxdepth 1 -type f \( -iname '*.jpg' -o -iname '*.png' -o -iname '*.jpeg' \) | wc -l)
echo "Number of reached lines: ${REACHED_LINES}"

Why is [ "$(ls *.jpg | wc -l)" = 0 ] always false unless the quotes are removed?

Given that there are no jpg files in my current working directory, why do the two scripts jpg1.sh and jpg2.sh give different results? How to understand the difference? Note that the only difference between the scripts is whether to have double quotes around the $() command substitution.
$ cat jpg1.sh
#!/bin/bash
if [ "$(ls *.jpg | wc -l)" = 0 ]; then
echo "yes"
else
echo "no"
fi
$ ./jpg1.sh
ls: *.jpg: No such file or directory
no
$ cat jpg2.sh
#!/bin/bash
if [ $(ls *.jpg | wc -l) = 0 ]; then
echo "yes"
else
echo "no"
fi
$ ./jpg2.sh
ls: *.jpg: No such file or directory
yes
Follow-up
I can confirm that the accepted answer is right: the output of wc -l contains extra leading whitespace. After adding set -x to both scripts, the difference becomes visible.
$ ./jpg1.sh
++ wc -l
++ ls '*.jpg'
ls: *.jpg: No such file or directory
+ '[' ' 0' = 0 ']'
+ echo no
no
$ ./jpg2.sh
++ wc -l
++ ls '*.jpg'
ls: *.jpg: No such file or directory
+ '[' 0 = 0 ']'
+ echo yes
yes
By the way my system is macOS Catalina.
On some systems this is false because wc -l pads its answer with whitespace, and the string you're comparing against -- 0 -- doesn't contain any whitespace at all. When you use quotes, the exact output is compared; when you leave them off, the output is split into individual words based on whitespace, and each of those words is expanded as a glob, before it's put on the [ command line. To determine whether you're on such a system, add set -x to your script to enable trace-level logging so you can see the exact values [ is being asked to compare.
In present circumstances, the entire problem is trivially avoided: There's no reason to use either ls or wc for this purpose.
#!/usr/bin/env bash
shopt -s nullglob
jpegs=( *.jpg )
if (( ${#jpegs[@]} == 0 )); then
echo "No JPEG files were found"
else
echo "Exactly ${#jpegs[@]} JPEG files were found"
fi
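The quoting effect can be reproduced on any system without ls or wc. In the sketch below, the padded string is only a stand-in for what wc -l prints on macOS, and the three helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; each succeeds when its argument "equals" 0.

# Quoted: the padding is kept, so the string '       0' is compared to '0'.
is_zero_quoted() { [ "$1" = 0 ]; }

# Unquoted: word splitting strips the padding before [ sees the value.
# Left unquoted on purpose to mirror jpg2.sh; this is fragile in general.
is_zero_unquoted() { [ $1 = 0 ]; }

# Arithmetic comparison ignores surrounding whitespace.
is_zero_arith() { (( $1 == 0 )); }

padded='       0'   # stand-in for padded wc -l output
if is_zero_quoted "$padded";   then echo "quoted: yes";   else echo "quoted: no";   fi
if is_zero_unquoted "$padded"; then echo "unquoted: yes"; else echo "unquoted: no"; fi
if is_zero_arith "$padded";    then echo "arith: yes";    else echo "arith: no";    fi
```

This prints "quoted: no", "unquoted: yes" and "arith: yes", matching the behaviour of jpg1.sh and jpg2.sh in the question.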

Using bash to iterate through similarly named files and grep

I have a list of base files:
file1.txt
file2.txt
file3.txt
and a list of target files:
target1.txt
target2.txt
target3.txt
and I want to use bash to perform the following command using a loop:
grep -wf "file1.txt" "target1.txt" > "result1.txt"
grep -wf "file2.txt" "target2.txt" > "result2.txt"
The files will all have the same name besides the final integer, which will be in a series (1:22).
With a for loop:
for((i=1; i<=22; i++)); do
grep -wf "file$i.txt" "target$i.txt" > "result$i.txt"
done
With an arbitrary number of file#.txt and target#.txt:
#!/usr/bin/env bash
shopt -s extglob # Enable extended globbing patterns
# Iterate all file#.txt
for f in file+([[:digit:]]).txt; do
# Extract the index from the file name by stripping-out all non digit characters
i="${f//[^[:digit:]]/}"
file="$f"
target="target$i.txt"
result="result$i.txt"
# If both file#.txt and target#.txt exist
if [ -e "$file" ] && [ -e "$target" ]; then
grep -wf "$file" "$target" >"$result"
fi
done
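The index extraction used in the loop above is a pattern substitution with an empty replacement. A minimal sketch of just that step, with extract_index as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# extract_index NAME
# Hypothetical helper: prints NAME with every non-digit character removed.
# ${var//pattern/} replaces all matches of pattern with the empty string.
extract_index() {
    local name=$1
    printf '%s\n' "${name//[^[:digit:]]/}"
}

extract_index "file12.txt"   # prints 12
```

Note that all digits in the name are kept, so this assumes the index is the only number in the file name.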
This is a one-line version suitable for the command line with brace expansion:
for i in {1..22};do grep -wf "file$i.txt" "target$i.txt" > "result$i.txt"; done
Do them all in parallel with GNU Parallel:
parallel 'grep -wf file{}.txt target{}.txt > result{}.txt' ::: {1..22}

Grep multiple occurrences given two strings and two integers

I'm looking for a bash script that counts the occurrences of a word in the files of a given directory and its subdirectories, where the word matches this pattern:
^str1{n}str2{m}$
for example:
str1= yo
str2= uf
n= 3
m= 4
the match would be "yoyoyoufufufuf"
but I'm having trouble with grep.
This is what I have tried:
for file in $(find $dir)
do
if [ -f $file ]; then
echo "<$file>:<`grep '\<\$str1\{$n\}\$str2\{$m\}\>'' $file | wc -l >" >> a.txt
fi
done
Should I use find?
@Barmar's comment is useful.
If I understand your question, I think this single grep command should do what you're looking for:
grep -r -c "^\($str1\)\{$n\}\($str2\)\{$m\}$" "$dir"
Note the combination of -r and -c causes grep to output zero-counts for non-matching files. You can pipe to grep -v ":0$" to suppress this output if you require:
$ dir=.
$ str1=yo
$ str2=uf
$ n=3
$ m=4
$ cat youf
yoyoyoufufufuf
$ grep -r -c "^\($str1\)\{$n\}\($str2\)\{$m\}$" "$dir"
./noyouf:0
./youf:1
./dir/youf:1
$ grep -r -c "^\($str1\)\{$n\}\($str2\)\{$m\}$" "$dir" | grep -v ":0$"
./youf:1
./dir/youf:1
$
Note also $str1 and $str2 need to be put in parentheses so that the {m} and {n} apply to everything within the parentheses and not just the last character.
Note the escaping of the () and {} as we require double-quotes ", so that the variables are expanded into the grep regular expression.
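The effect of the grouping parentheses can be checked on standard input. In this sketch build_pattern is a hypothetical helper that assembles the same basic regular expression as above. It assumes str1 and str2 contain no regex metacharacters, since nothing is escaped.

```shell
#!/usr/bin/env bash
# build_pattern STR1 N STR2 M
# Hypothetical helper: prints the BRE ^\(STR1\)\{N\}\(STR2\)\{M\}$.
# Assumes STR1 and STR2 contain no regex metacharacters.
build_pattern() {
    printf '^\\(%s\\)\\{%s\\}\\(%s\\)\\{%s\\}$' "$1" "$2" "$3" "$4"
}

pattern=$(build_pattern yo 3 uf 4)

# Grouped: the counts apply to the whole strings "yo" and "uf".
printf '%s\n' yoyoyoufufufuf yooouffff | grep -c -- "$pattern"             # prints 1

# Ungrouped: the counts apply only to the last characters, "o" and "f".
printf '%s\n' yoyoyoufufufuf yooouffff | grep -c -- '^yo\{3\}uf\{4\}$'     # prints 1
```

Both commands print 1, but the grouped pattern matches yoyoyoufufufuf while the ungrouped one matches yooouffff.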

preventing wildcard expansion in bash script

I've searched here, but still can't find the answer to my globbing problems.
We have files "file.1" through "file.5", and each one should contain the string "completed" if our overnight processing went ok.
I figure it's a good thing to first check that there are some files, then I want to grep them to see if I find 5 "completed" strings. The following innocent approach doesn't work:
FILES="/mydir/file.*"
if [ -f "$FILES" ]; then
COUNT=`grep completed $FILES`
if [ $COUNT -eq 5 ]; then
echo "found 5"
else
echo "no files?"
fi
Thanks for any advice....Lyle
Per http://mywiki.wooledge.org/BashFAQ/004, the best approach to counting files is to use an array (with the nullglob option set):
shopt -s nullglob
files=( /mydir/file.* )
count=${#files[@]}
If you want to collect the names of those files, you can do it like so (assuming GNU grep):
completed_files=()
while IFS='' read -r -d '' filename; do
completed_files+=( "$filename" )
done < <(grep -l -Z completed /dev/null /mydir/file.*)
(( ${#completed_files[@]} == 5 )) && echo "Exactly 5 files completed"
This approach is somewhat verbose, but guaranteed to work even with highly unusual filenames.
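Where GNU grep's -Z option is not available, a per-file loop gives the same count without parsing grep's output. This is a minimal sketch; count_completed is a hypothetical name, and the file.* naming follows the question.

```shell
#!/usr/bin/env bash
# count_completed DIR
# Hypothetical helper: prints how many regular files named DIR/file.*
# contain the string "completed".
count_completed() {
    local dir=$1 n=0 f
    for f in "$dir"/file.*; do
        # Skips the literal pattern when nothing matches, and non-regular files.
        [ -f "$f" ] || continue
        if grep -q completed -- "$f"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

A check such as [ "$(count_completed /mydir)" -eq 5 ] then replaces the nested tests from the question.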
Try this:
[[ $(grep -l 'completed' /mydir/file.* | grep -c .) == 5 ]] || echo "Something is wrong"
This prints "Something is wrong" unless exactly 5 files contain the string "completed".
Edit: corrected the missing "-l". The explanation:
$ grep -c completed file.*
file.1:1
file.2:1
file.3:0
$ grep -l completed file.*
file.1
file.2
$ grep -l completed file.* | grep -c .
2
$ grep -l completed file.* | wc -l
2
You can do this to prevent globbing:
echo \'$FILES\'
but it seems you have a different problem

Resources