How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
This is what I have so far. It is able to extract the substrings, but the output still contains duplicates. Since I'm already inside the loop, I don't see how I can remove them.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
    echo "$line" | grep -oP '[0-9]{4}Q[0-9]'
done

# list all _filenames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf "%f\n" |
# extract the pattern; -n with the p flag drops names that do not match
sed -n -E 's/.*([0-9]{4}Q[0-9]).*/\1/p' |
# iterate
while IFS= read -r file; do
    echo "$file"
done
I used -printf "%f\n" to print just the filename instead of the full path. GNU sed has a -z option that you can use with -print0 (or -printf "%f\0").
With the approach you started with, and provided your file names contain no newlines, there is no need to loop over the list in bash (as a rule of thumb, try to avoid while read line; it is very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
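Putting the pieces together, the following is a minimal sketch of a loop over the unique quarters. The function names list_quarters and process_quarters are made up for illustration, and the sketch assumes file names without newlines. It strips the directory part with sed so that a year-Q-number pattern in the directory path cannot match.

```shell
#!/usr/bin/env bash
# Print each distinct YYYYQn token found in the PDF names of directory $1.
# Assumption: file names contain no newlines.
list_quarters() {
    find "$1" -maxdepth 1 -type f -name '*.pdf' |
        sed 's|.*/||' |                  # keep only the file name
        grep -oE '[0-9]{4}Q[0-9]' |      # extract the year-Q-number token
        sort -u                          # drop duplicates
}

# Loop over the unique tokens. They contain no whitespace,
# so word splitting on the command substitution is safe here.
process_quarters() {
    for q in $(list_quarters "$1"); do
        echo "processing $q"
    done
}
```

Calling process_quarters original would then run the loop body once per quarter.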

Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u


bash iterate over a directory sorted by file size

As a webmaster, I generate a lot of junk code files. Periodically I have to purge the unneeded files, filtered by extension. Example: "cleaner txt". Easy enough. But I want to sort the files by size and process them in that order in the "for" loop. How can I do that?
if [ -z "$1" ]; then
    echo "Please supply the filename suffixes to delete."
    exit 1
fi
filter=$1   # assumed: the suffix is the first argument
for FILE in *."$filter"; do clear
    cat "$FILE"; printf '\n\n'; rm -i "$FILE"; done
You can use a mix of find (to print file sizes and names), sort (to sort the output of find) and cut (to remove the sizes). In case you have very unusual file names containing any possible character including newlines, it is safer to separate the files by a character that cannot be part of a name: NUL.
if [ -z "$1" ]; then
    echo "Please supply the filename suffixes to delete."
    exit 1
fi
filter=$1   # assumed: the suffix is the first argument
while IFS= read -r -d '' -u 3 FILE; do
    cat "$FILE"
    printf '\n\n'
    rm -i "$FILE"
done 3< <(find . -mindepth 1 -maxdepth 1 -type f -name "*.$filter" \
    -printf '%s\t%p\0' | sort -zn | cut -zf 2-)
Note that we must use a different file descriptor than stdin (3 in this example) to pass the file names to the loop. Else, if we use stdin, it will also be used to provide the answers to rm -i.
Inspired by this answer, you could use the find command as follows:
find ./ -type f -name "*.yaml" -printf "%s %p\n" | sort -n
find prints the size and the path of each file, so sort -n orders the results from the smallest to the largest.
If you want to iterate over, say, the 5 biggest files, you can add tail (this relies on word splitting, so it only works for paths without whitespace):
for f in $(find ./ -type f -name "*.yaml" -printf "%s %p\n" |
    sort -n |
    cut -d ' ' -f 2 |
    tail -n 5)
do
    echo "### $f"
done
If the file names don't contain newlines or spaces, du and sort are enough:
while read filesize filename; do
printf "%-25s has size %10d\n" "$filename" "$filesize"
done < <(du -bs *."$filter"|sort -n)
while read filename; do
echo "$filename"
done < <(du -bs *."$filter"|sort -n|awk '{$0=$2}1')

Bash Script to Prepend a Single Random Character to All Files In a Folder

I have an audio sample library with thousands of files. I would like to shuffle/randomize the order of these files. Can someone provide me with a bash script/line that would prepend a single random character to all files in a folder (including files in sub-folders). I do not want to prepend a random character to any of the folder names though.
Kickdrum SUB.wav
Renamed to:
!_Kickdrum SUB.wav
If possible, I would like to be able to run this script more than once, but on subsequent runs, it just changes the randomly prepended character instead of prepending a new one.
Some of my attempts:
find ~/Desktop/test -type f -print0 | xargs -0 -n1 bash -c 'mv "$0" "a${0}"'
find ~/Desktop/test/ -type f -exec mv -v {} $(cat a {}) \;
find ~/Desktop/test/ -type f -exec echo -e "Z\n$(cat !)" > !Hat 15.wav
for file in *; do
mv -v "$file" $RANDOM_"$file"
Note: I am running on macOS.
Latest attempt using code from mr. fixit:
find . -type f -maxdepth 999 -not -name ".*" |
cut -c 3- - |
while read F; do
if [ $randomCharacter == '_' ]; then
fileName="`basename $new`"
newFilename="`jot -r -c $fileName 1 A Z`"
filePath="`dirname $new`"
mv -v "$F" "$newFilePath"
Here's my first answer, enhanced to do sub-directories.
Put the following in file randomize
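#!/bin/bash
if [[ $# != 1 || ! -d "$1" ]]; then
    echo "usage: $0 <path>"
    exit 1
fi
find "$1" -type f -not -name ".*" |
while read F; do
    FDIR=`dirname "$F"`
    FNAME=`basename "$F"`
    char2="${FNAME:1:1}"
    if [ "$char2" == '_' ]; then
        new="${FNAME:1}"    # strip the previous random character
    else
        new="_$FNAME"       # first run: add the separator
    fi
    new=`jot -r -w "%c$new" 1 A Z`
    echo mv "$F" "${FDIR}/${new}"
done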
Set the permissions with chmod a+x randomize.
Then call it with randomize your/path.
It'll echo the commands required to rename everything, so you can examine them to ensure they'll work for you. If they look right, remove the echo in front of mv and rerun the script.
cd ~/Desktop/test, then
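find . -type f -maxdepth 1 -not -name ".*" |
cut -c 3- - |
while read F; do
    char2="${F:1:1}"
    if [ "$char2" == '_' ]; then
        new="${F:1}"    # strip the previous random character
    else
        new="_$F"       # first run: add the separator
    fi
    new=`jot -r -w "%c$new" 1 A Z`
    mv "$F" "$new"
done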
find . -type f -maxdepth 1 -not -name ".*" will get all the files in the current directory, but not the hidden files (names starting with '.')
cut -c 3- - will strip the first 2 chars from the name. find outputs paths, and the ./ gets in the way of processing prefixes.
while read VAR; do <stuff>; done is a way to deal with one line at a time
char2="${VAR:1:1}" sets a variable char2 to the 2nd character of the variable VAR (offsets start at 0).
if - then - else sets new to the filename, either preceded by _ or with the previous random character stripped off.
jot -r -w "%c$new" 1 A Z tacks one random character from A-Z onto the beginning of new
mv old new renames the file
You can also do it all in bash and there are several ways to approach it. The first is simply creating an array of letters containing whatever letters you want to use as a prefix and then generating a random number to use to choose the element of the array, e.g.
letters=({0..9} {A..Z} {a..z}) ## array with [0-9] [A-Z] [a-z]
for i in *; do
    num=$((RANDOM % ${#letters[@]})) ## random index into the 62-element array
    ## remove echo to actually move file
    echo "mv \"$i\" \"${letters[num]}_$i\"" ## move file
done
Example Use/Output
Currently the script only prints the changes it would make; to apply them, remove the echo "..." surrounding the mv command and unescape the quotes:
$ bash ../
mv "Kick808.mp3" "4_Kick808.mp3"
mv "Kickdrum SUB.wav" "h_Kickdrum SUB.wav"
mv "Kickdrum73.wav" "l_Kickdrum73.wav"
You can also do it by generating a random number for an ASCII character from 48 (character '0') through 126 (character '~'), excluding the backtick, then converting the number to its character and prefixing the filename with it, e.g.
for i in *; do
    num=$(((RANDOM % 79) + 48)) ## generate number for '0' - '~'
    letter=$(printf "\\$(printf '%03o' "$num")") ## letter from number
    while [ "$letter" = '`' ]; do ## exclude '`'
        num=$(((RANDOM % 79) + 48)) ## generate number
        letter=$(printf "\\$(printf '%03o' "$num")")
    done
    ## remove echo to actually move file
    echo "mv \"$i\" \"${letter}_$i\"" ## move file
done
(similar output, all punctuation other than backtick is possible)
In each case you will want to place the script in your path or call it from within the directory containing the files to rename (you could split each path with dirname and basename and join the parts back together so the script can take the directory to search as an argument; that is left to you).
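As a rough illustration of that split-and-rejoin step, here is a sketch that walks sub-folders and can be rerun. The function names random_prefixed_name and randomize_tree are hypothetical. It assumes that a name whose second character is _ already carries a random prefix from an earlier run, so a file that naturally starts with something like a_ would lose those two characters. It only prints the mv commands.

```shell
#!/usr/bin/env bash
letters=({0..9} {A..Z} {a..z})   # candidate prefix characters

# Print a new base name for $1: drop an earlier "X_" prefix if present,
# then prepend a fresh random character and "_".
random_prefixed_name() {
    local name=$1
    if [[ ${name:1:1} == _ ]]; then
        name=${name:2}           # assumed to be a prefix from a previous run
    fi
    printf '%s_%s\n' "${letters[RANDOM % ${#letters[@]}]}" "$name"
}

# Print one mv command per regular, non-hidden file below directory $1.
# Directory names are left untouched; only the base name changes.
randomize_tree() {
    local root=$1 path dir base
    find "$root" -type f ! -name '.*' -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}           # everything before the last slash
        base=${path##*/}         # everything after the last slash
        echo mv -- "$path" "$dir/$(random_prefixed_name "$base")"
    done
}
```

Running randomize_tree ~/Desktop/test prints the planned renames; removing the echo would apply them. A collision check (for example mv -n) may be worth adding before doing that on a real library.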

Looping over filtered find and performing an operation

I have a garbage dump of a bunch of Wordpress files and I'm trying to convert them all to Markdown.
The script I wrote is:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"
As far as I understand, the first line should be making an array of all html files in all subdirectories, then the for loop has a line to create a variable with the Markdown name (followed by a debugging echo), then the actual pandoc command to do the conversion.
One at a time, this command works.
However, when I try to execute it, OSX gives me:
$ ./pandoc_convert.command
./pandoc_convert.command: line 1: : No such file or directory
./pandoc_convert.command: line 1: : No such file or directory
There may be many reasons why the script fails, because the way you create the array is incorrect:
htmlDocs=($(find . -print | grep -i '.*[.]html'))
Arrays are assigned in the form: NAME=(VALUE1 VALUE2 ... ), where NAME is the name of the variable, VALUE1, VALUE2, and the rest are fields separated with characters that are present in the $IFS (input field separator) variable. Suppose you find a file name with spaces. Then the expression will create separate items in the array.
Another issue is that the expression doesn't handle globbing, i.e. file name generation based on the shell expansion of special characters such as *:
mkdir dir.html
touch \ *.html
touch a\ b\ c.html
a=($(find . -print | grep -i '.*[.]html'))
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
>>>a b c.html<<<
>>> *.html<<<
I know two ways to fix this behavior: 1) temporarily disable globbing, and 2) use the mapfile command.
Disabling Globbing
# Disable globbing, remember current -f flag value
[[ "$-" == *f* ]] || globbing_disabled=1
set -f
IFS=$'\n' a=($(find . -print | grep -i '.*[.]html'))
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
# Restore globbing
test -n "$globbing_disabled" && set +f
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
Using mapfile
The mapfile builtin was introduced in Bash 4. It reads lines from standard input into an indexed array:
mapfile -t a < <(find . -print | grep -i '.*[.]html')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
The find Options
The find command selects all types of nodes, including directories. You should use the -type option, e.g. -type f for files.
If you want to filter the result set with a regular expression use -regex option, or -iregex for case-insensitive matching:
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do echo ">>>${html}<<<"; done
>>>./ .html<<<
>>>./a b c.html<<<
>>>./ *.html<<<
echo vs. printf
Finally, don't use echo in new software. Use printf instead:
mapfile -t a < <(find . -type f -iregex '.*\.html$')
for html in "${a[@]}"; do printf '>>>%s<<<\n' "$html"; done
Alternative Approach
However, I would rather pipe into a loop with read:
find . -type f -iregex '.*\.html$' | while IFS= read -r line
do
    printf '>>>%s<<<\n' "$line"
done
In this example, the read command reads a line from the standard input and stores the value into line variable.
Although I like the mapfile feature, I find the code with the pipe clearer.
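One caveat with the piped loop is that it runs in a subshell, so variables set inside it are gone afterwards. If the array itself is needed later, a NUL-delimited read fed by process substitution is one possible sketch. The names collect_html and html_docs are invented for this example, and it relies on find supporting -print0, which GNU and BSD find do.

```shell
#!/usr/bin/env bash
# Fill the global array html_docs with every *.html file below directory $1.
# NUL delimiters keep names with spaces, globs or newlines intact.
collect_html() {
    local root=$1 path
    html_docs=()
    while IFS= read -r -d '' path; do
        html_docs+=("$path")
    done < <(find "$root" -type f -iname '*.html' -print0)
}
```

After collect_html . the loop for html in "${html_docs[@]}" sees one element per file, and no globbing or IFS changes are needed.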
Try adding the bash shebang and set IFS to handle spaces in folders and filenames:
IFS=$(echo -en "\n\b")
htmlDocs=($(find . -print | grep -i '.*[.]html'))
for html in "${htmlDocs[@]}"
echo "${html} \> ${P_MD}"
pandoc --ignore-args -r html -w markdown < "${html}" | awk 'NR > 130' | sed '/<div class="site-info">/,$d' > "${P_MD}"

How to make this script grep only the 1st line

for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' \
    | xargs grep -A1 -l 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST'
done
It's ignoring the -A1. The end result I want is a list of files that contain any of the matching words, but only on the first line of the script. If there is a better, more efficient, less resource-intensive way, that would be great as well, as this will be run on very large shared servers.
Use awk instead:
for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' -exec \
    awk 'FNR == 1 && /GLOBALS|preg_replace|array_diff_ukey|gzuncompress|gzinflate|post_var|sF=|qV=|_REQUEST/ { print FILENAME }' {} +
done
This will print the current input file if the first line matches. It's not ideal, since it will read all of each file. If your version of awk supports it, you can use
awk '/GLOBALS|.../ { print FILENAME } {nextfile}'
The nextfile command will execute for the first line, effectively skipping the rest of the file after awk tests if it matches the regular expression.
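If nextfile is not available, another way to avoid reading whole files is to let the shell read only the first line. The sketch below is one such variant, not the awk method itself; first_line_matches is a hypothetical helper name, and the pattern is a bash extended regular expression.

```shell
#!/usr/bin/env bash
# Print every file (arguments 2..n) whose first line matches the regex in $1.
first_line_matches() {
    local pattern=$1 file line
    shift
    for file in "$@"; do
        line=
        # read stops after one line; it returns non-zero on a last line
        # without a trailing newline but still fills $line in that case.
        IFS= read -r line < "$file" || [[ -n $line ]] || continue
        [[ $line =~ $pattern ]] && printf '%s\n' "$file"
    done
    return 0
}
```

It could be fed from find, for example with find /home/USER/public_html -type f -iname '*.php' -print0 piped into a while IFS= read -r -d '' f loop that calls first_line_matches 'GLOBALS|gzinflate|_REQUEST' "$f".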
The following code is untested:
for i in USER; do
    find /home/$i/public_html/ -type f -iname '*.php' | while read -r; do
        head -n1 "$REPLY" | grep -q 'GLOBALS\|preg_replace\|array_diff_ukey\|gzuncompress\|gzinflate\|post_var\|sF=\|qV=\|_REQUEST' \
        && echo "$REPLY"
    done
done
The idea is to loop over each find result, explicitly test the first line, and print the filename if a match was found. I don't like it though because it feels so clunky.
# $stuff holds the search pattern
for j in $(find /home/$i/public_html/ -type f -iname '*.php');
do result=$(head -n 1 "$j" | grep "$stuff");
[[ x$result != x ]] && echo "$j: $result";
done
You'll need a little more effort to skip leading blank lines. Using fgrep will save resources.
A little perl would bring great improvement, but it's hard to type it on a phone.
On a less cramped keyboard, inserted less brief solution.

Bash script to list files not found

I have been looking for a way to list files that do not exist, given a list of files that are required to exist. The files can exist in more than one location. What I have now:
while read fn
do
    if [ ! -f `find . -type f -name $fn ` ];
    then
        echo $fn
    fi
done < $fileslist
If a file does not exist the find command will not print anything and the test does not work. Removing the not and creating an if then else condition does not resolve the problem.
How can I print the filenames that are not found from a list of file names?
New script:
foundfiles="$HOME/tmp/tmp`date +%Y%m%d%H%M%S`.txt"
touch $foundfiles
while read fn
do
    find . -type f -name $fn | sed 's:./.*/::' >> $foundfiles
done < $fileslist
cat $fileslist $foundfiles | sort | uniq -u
rm $foundfiles
while read fn
do
    FPATH=`find . -type f -name $fn`
    if [ "$FPATH." = "." ]
    then
        echo $fn
    fi
done < $fileslist
You were close!
Here is test.bash:
exists=`find . -type f -name $fn`
if [ -n "$exists" ]
then
    echo Found it
fi
It sets $exists to the output of find; the -n test then checks that the output is not empty.
Try replacing body with [[ -z "$(find . -type f -name $fn)" ]] && echo $fn. (note that this code is bound to have problems with filenames containing spaces).
More efficient bashism:
diff <(sort $fileslist|uniq) <(find . -type f -printf %f\\n|sort|uniq)
I think you can handle diff output.
Give this a try:
find -type f -print0 | grep -Fzxvf - requiredfiles.txt
The -print0 and -z protect against filenames which contain newlines. If your utilities don't have these options and your filenames don't contain newlines, you should be OK.
The repeated find to filter one file at a time is very expensive. If your file list is directly compatible with the output from find, run a single find and remove any matches from your list:
find . -type f |
fgrep -vxf - "$1"
If not, maybe you can massage the output from find in the pipeline before the fgrep so that it matches the format in your file; or, conversely, massage the data in your file into find-compatible.
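For a list that holds bare file names, that massaging could be as small as stripping the directory part before the comparison. The following is only a sketch under that assumption; list_missing is a made-up name, and it expects names without newlines.

```shell
#!/usr/bin/env bash
# Print the names from list file $1 that match no regular file
# anywhere below directory $2 (default: current directory).
list_missing() {
    list=$1
    root=${2:-.}
    find "$root" -type f |
        sed 's|.*/||' |          # reduce each path to its file name
        sort -u |
        grep -Fvxf - "$list"     # keep list lines equal to none of them
}
```

A single find run serves the whole list, and the output keeps the order of the list file. Note that grep exits with status 1 when nothing is missing.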
I use this script and it works for me
notfound="Not found:"
fileslist=$1   # assumed: the list file is the first argument
len=`cat $1 | wc -l`
while read fn
do
    # don't worry about this, I use it to display the file list progress
    n=$((n + 1))
    echo -en "\rLooking $(echo "scale=0; $n * 100 / $len" | bc)% "
    if [ $(find / -name $fn | wc -l) -gt 0 ]
    then
        found=$(printf "$found\n\t$fn")
    else
        notfound=$(printf "$notfound\n\t$fn")
    fi
done < $fileslist
printf "\n$found\n$notfound\n"
The following test counts the lines that find prints; if the count is greater than 0, the file was found. This searches the whole disk. You could replace / with . to search just the current directory.
$(find / -name $fn | wc -l) -gt 0
Then I simply run it, with the names in the files list separated by newlines:
./ files.list
