Writing shell script to scan a list of folders - shell

I have a file folders.txt
one
two
three
four
...
that has a list of folder names. [one, two, three and four are names of folders].
Each of these folders has a number of files of different types (different extensions). I want a list of all the files in all the folders of one particular extension, say .txt.
What should my shell script look like?

One way:
while IFS= read -r folder
do
    # add -maxdepth 1 if recursive traversal is not required
    find "$folder" -type f -iname "*.txt" | while IFS= read -r FILE
    do
        echo "do something with $FILE"
    done
done < folders.txt
Or, reading the whole list at once (this relies on word splitting, so the folder names must not contain whitespace):
folders=$(<folders.txt)
# $folders is deliberately unquoted so each name becomes a separate argument
find $folders -type f -iname "*.txt" | while IFS= read -r FILE
do
    echo "do something with $FILE"
done
Bash 4.0+ (if recursive traversal is required):
shopt -s globstar
folders=$(<folders.txt)
for d in $folders
do
    for file in "$d"/**/*.txt
    do
        echo "do something with $file"
    done
done

Simply do it on the command line:
xargs ls -l < folders.txt | grep '\.txt$'

Given the post is simply asking for a list of files, it's quite simple:
tmp=$IFS
IFS=$'\n'
for i in $(cat folders.txt) ; do
    ls -l "$i"/*.txt
done
IFS=$tmp
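If GNU xargs is available, a compact alternative is to run one find per line of folders.txt (a sketch, assuming the folder names contain no embedded newlines):
# -d '\n' (GNU xargs) treats each input line as a single argument;
# -I{} runs one find invocation per folder
xargs -d '\n' -I{} find {} -maxdepth 1 -type f -iname '*.txt' < folders.txt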


Iterate over files in a subfolder

New here, learning bash for the first time.
I'm trying to iterate over files named "list.txt" placed in subfolders, manipulate them, and create new files under the same subfolder. The nesting could be like this:
inventory/product_names1/list.txt
inventory/product_names2/list.txt
As product_names is completely random, I would like to iterate over all list.txt files with unix commands like sed/grep/cut and create a new file under the same random product_names folders.
for f in $( find . -name 'list.txt'); do for list in $f; do cat $f | cut -d']' -f2- > "$f/new_file.txt" ; done ; done
I can access the files in the nest using the find command. How can I redirect output into the right subfolder if product_names is random?
inventory/product_names1/list.txt
inventory/product_names1/new_file.txt
inventory/product_names2/list.txt
inventory/product_names2/new_file.txt
This script is intended to run from the root folder, working with the entire "inventory" path. $f gives access to inventory/product_names1/list.txt, but I need the output in inventory/product_names1. How can I redirect correctly if I don't have the right value/variable?
You can either use parameter expansion to remove the file name from the path, or you can iterate over all the directories and only work on them if they contain the list.txt file.
#!/bin/bash
for list in inventory/*/list.txt ; do
    new=${list%/*}/new_list.txt
    echo "$list" "$new"
done
# OR
for dir in inventory/* ; do
    if [[ -f $dir/list.txt ]] ; then
        echo "$dir"/list.txt "$dir"/new_list.txt
    fi
done
find can not only find files but also execute commands when a file is found:
find . -type f -name 'list.txt' -execdir sh -c 'cut -d"]" -f2 list.txt > new_file.txt' \;
Explanations:
-type f condition added to skip directories named list.txt. If some of your list.txt files can be symbolic links and you want to consider them too, use -type f,l with GNU find. With other find implementations you may need to use \( -type f -o -type l \).
-execdir runs the command in the directory where the file was found.
By default find does not print when -execdir is used. If you need it, add the -print action:
find . -type f -name 'list.txt' -execdir sh -c 'cut -d"]" -f2 list.txt > new_file.txt' \; -print

How to use bash string formatting to reverse date format?

I have a lot of files that are named as: MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf I’m sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
    if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
        echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
    fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
    for name do
        if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
            echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
        fi
    done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
    if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
        echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}.pdf"
    fi
done
All three of these solutions use a regular expression to pick out the relevant parts of the filenames and then rearrange these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed; the p command prints the pattern space without modifications, in other words, the original name of the file, and s performs and outputs the conversion.
The ls + sed result is a combined output that consists of a sequence of old_file_name and new_file_name pairs.
Finally we pipe the resulting feed through xargs to perform the actual renames.
From the xargs man page:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
You can use the following command, very close to klashxx's:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\1-\2#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-12-01.pdf 2018-12-03.pdf
Also, if you have other pdf files in your folder that do not respect this format, you can select only the files that match the MM-DD-YYYY.pdf format with the following command:
for f in $(find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}\.pdf' | xargs -n1 basename); do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\1-\2#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}\.pdf' looks only for files in the current working directory that respect your syntax, and xargs -n1 basename extracts their basenames (removing the leading ./). Directories and other file types with a matching name are not taken into account, and other *.pdf files are ignored.
For each file, a mv is performed; the resulting file name is computed using sed with back-references to the three groups for MM, DD and YYYY.
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
    IFS=- read month day year <<< "${f%.pdf}"
    mv "$f" "$year-$month-$day.pdf"
done
This is basically @Kusalananda's answer, but without the verbosity of regular-expression matching.

Bash script to concatenate text files with specific substrings in filenames

Within a certain directory I have many directories containing a bunch of text files. I'm trying to write a script that, in each directory, concatenates only those files that have the string 'R1' in their filename into one file within that specific directory, and those that have 'R2' into another. This is what I wrote, but it's not working.
#!/bin/bash
for f in */*.fastq; do
if grep 'R1' $f ; then
cat "$f" >> R1.fastq
fi
if grep 'R2' $f ; then
cat "$f" >> R2.fastq
fi
done
I get no errors and the files are created as intended but they are empty files. Can anyone tell me what I’m doing wrong?
Thank you all for the fast and detailed responses! I think I wasn't very clear in my question, but I need the script to only concatenate the files within each specific directory so that each directory gets its own new files (R1 and R2). I tried doing
cat /*R1*.fastq >*/R1.fastq
but it gave me an ambiguous redirect error. I also tried Charles Duffy's for loop, but looping through the directories and doing a nested loop to run through each file within a directory, like so
for f in */; do
for d in "$f"/*.fastq;do
case "$d" in
*R1*) cat "$d" >&3
*R2*) cat "$d" >&4
esac
done 3>R1.fastq 4>R2.fastq
done
but it was giving an unexpected token error regarding ')'.
Sorry in advance if I'm missing something elementary, I'm still very new to bash.
A Note To The Reader
Please review edit history on the question in considering this answer; several parts have been made less relevant by question edits.
One cat Per Output File
For the purpose at hand, you can probably just let shell globbing do all the work (if R1 or R2 will be in the filenames, as opposed to the directory names):
set -x # log what's happening!
cat */*R1*.fastq >R1.fastq
cat */*R2*.fastq >R2.fastq
One find Per Output File
If it's a really large number of files, by contrast, you might need find:
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -exec cat '{}' + >R1.fastq
find . -mindepth 2 -maxdepth 2 -type f -name '*R2*.fastq' -exec cat '{}' + >R2.fastq
...this is because of the OS-dependent limit on command-line length; the find command given above will put as many arguments onto each cat command as possible for efficiency, but will still split them up into multiple invocations where otherwise the limit would be exceeded.
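A roughly equivalent sketch using xargs (assuming GNU find and xargs for -print0/-0) batches arguments the same way:
# xargs likewise splits the file list across as many cat invocations as needed
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -print0 | xargs -0 cat >R1.fastq
find . -mindepth 2 -maxdepth 2 -type f -name '*R2*.fastq' -print0 | xargs -0 cat >R2.fastq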
Iterate-And-Test
If you really do want to iterate over everything, and then test the names, consider a case statement for the job, which is much more efficient than using grep to check just one line:
for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3 ;;
        *R2*) cat "$f" >&4 ;;
    esac
done 3>R1.fastq 4>R2.fastq
Note the use of file descriptors 3 and 4 to write to R1.fastq and R2.fastq respectively -- that way we're only opening the output files once (and thus truncating them exactly once) when the for loop starts, and reusing those file descriptors rather than re-opening the output files at the beginning of each cat. (That said, running cat once per file -- which find -exec {} + avoids -- is probably more overhead on balance).
Operating Per-Directory
All of the above can be updated to work on a per-directory basis quite trivially. For example:
for d in */; do
find "$d" -name R1.fastq -prune -o -name '*R1*.fastq' -exec cat '{}' + >"$d/R1.fastq"
find "$d" -name R2.fastq -prune -o -name '*R2*.fastq' -exec cat '{}' + >"$d/R2.fastq"
done
There are only two significant changes:
We're no longer specifying -mindepth, to ensure that our input files only come from subdirectories.
We're excluding R1.fastq and R2.fastq from our input files, so we never try to use the same file as both input and output. This is a consequence of the prior change: Previously, our output files couldn't be considered as input because they didn't meet the minimum depth.
Your grep is searching the file contents instead of the file name. You could rewrite it this way:
for f in */*.fastq; do
    [[ -f $f ]] || continue
    if [[ $f = *R1* ]]; then
        cat "$f" >> R1.fastq
    elif [[ $f = *R2* ]]; then
        cat "$f" >> R2.fastq
    fi
done
find in a for loop might suit this:
for i in R1 R2
do
    find . -type f -name "*${i}*.fastq" ! -name "$i.fastq" -exec cat '{}' + >"$i.fastq"
done

Bash looping through files in Directory

I have a bash script, created by someone else, that I need to modify a little.
Since I'm new to Bash, I may need a little help with some common commands.
The script simply loops through a directory (recursively) for a specific file extension.
Here's the current script: (runme.sh)
#! /bin/bash
SRC=/docs/companies/
function report()
{
echo "-----------------------"
find $SRC -iname "*.aws" -type f -print
echo -e "\033[1mSOURCE FILES=\033[0m" `find $SRC -iname "*.aws" -type f -print |wc -l`
echo "-----------------------"
exit 0
}
report
I simply run ./runme.sh and I can see a list of all files with the .aws extension.
My primary goal is to limit the search. (some directories have way too many files)
I would like to run the script, limiting it to just 20 files.
Do I need to place the entire script into a loop method?
That's easy -- as long as you want the first 20 files, just pipe the first find command through head -n 20. But I can't resist a little cleanup while I'm at it: as written, it runs find twice, once to print the filenames and once to count them; if there are a lot of files to search, this is a waste of time. Second, wrapping the actual content of the script in a function (report) doesn't make much sense, and having the function exit (rather than returning) makes even less. Finally, I like to protect filenames with double-quotes and hate backquotes (use $() instead). So I took the liberty of a bit of cleanup:
#! /bin/bash
SRC=/docs/companies/
files="$(find "$SRC" -iname "*.aws" -type f -print)"
if [ -n "$files" ]; then
count="$(echo "$files" | wc -l)"
else # echo would print one line even if there are no files, so special-case the empty list
count=0
fi
echo "-----------------------"
echo "$files" | head -n 20
echo -e "\033[1mSOURCE FILES=\033[0m $count"
echo "-----------------------"
Use head -n 20 (as proposed by Peter). An additional remark: the script is very inefficient, as it runs find twice. You could instead use tee to generate a temporary file when the command runs for the first time, count the lines of that file afterwards, and delete the file.
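A minimal sketch of that idea (the temporary-file handling is illustrative):
tmp=$(mktemp)
# tee duplicates the find output into the temp file; sending tee's stdout to
# /dev/null (rather than piping it into head) avoids head cutting the pipe short
find "$SRC" -iname "*.aws" -type f | tee "$tmp" >/dev/null
head -n 20 "$tmp"
echo -e "\033[1mSOURCE FILES=\033[0m $(wc -l < "$tmp")"
rm -f "$tmp"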
I would personally prefer to do it like this:
files=0
echo "-----------------------"
while IFS= read -r file ; do
    files=$((files + 1))
    echo "$file"
done < <(find "$SRC" -iname "*.aws" -type f | head -n 20)
echo -e "\033[1mSOURCE FILES=\033[0m" $files
echo "-----------------------"
If you don't need the count, you can simply use find "$SRC" -iname "*.aws" -type f | head -n 20

bash: how to change the basename only of a list of files [duplicate]

Possible Duplicate:
makefile: how to add a prefix to the basename?
I have a list of files (which I get from find bla -name "*.so") such as:
/bla/a1.so
/bla/a2.so
/bla/blo/a3.so
/bla/blo/a4.so
/bla/blo/bli/a5.so
and I want to rename them such as it becomes:
/bla/liba1.so
/bla/liba2.so
/bla/blo/liba3.so
/bla/blo/liba4.so
/bla/blo/bli/liba5.so
... i.e. add the prefix 'lib' to the basename
Any idea how to do that in bash?
Something along the lines of:
for a in /bla/a1.so /bla/a2.so /bla/blo/a4.so
do
    dn=$(dirname "$a")
    fn=$(basename "$a")
    mv "$a" "${dn}/lib${fn}"
done
should do it. You might want to add code to read the list of filenames from a file, rather than listing them verbatim in the script, of course.
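For instance, a small sketch that reads the list from a file (the name so-files.txt is hypothetical; one path per line):
# so-files.txt could be produced by: find bla -name "*.so" > so-files.txt
while IFS= read -r a
do
    dn=$(dirname "$a")
    fn=$(basename "$a")
    mv "$a" "${dn}/lib${fn}"
done < so-files.txt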
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | bash
The code will rename files in the current directory and subdirectories, prepending "lib" to the .so filenames.
No looping needed, as find already does its recursive work to list the files. The code builds the "mv" commands one by one and executes them. To see the "mv" commands without executing them, simply remove the piping to shell part "| bash".
find's -printf format understands many directives, which makes it pretty flexible. I only needed to use two here:
%h: directory
%f: filename
How to test it:
Run this first (will perform nothing yet, only print lines on the screen):
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | less -S
This will show you all the commands that your script will execute. If you're satisfied with the result, simply execute it afterwards by piping it into bash instead of less.
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | bash
while multiliner
A slightly more robust and generalized solution based on @nfm's answer (maybe more than you really need) would be:
while IFS= read -r -u3 -d '' FILE; do
    DIR=$(dirname "$FILE")
    FILENAME=$(basename "$FILE")
    mv "$FILE" "${DIR}/lib${FILENAME}"
done 3< <(find bla -name '*.so' -print0 | sort -rz)
This is quite robust:
read -u3 and 3< do not interfere with stdin
-print0 + IFS= + -d '' allows for newlines in filenames
sort -rz renames deeper paths first, so that you can even rename directories and the files inside them at once
find -execdir + rename
This would be perfect if it weren't for the PATH annoyances, see: Find multiple files and rename them in Linux
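For reference, a hedged sketch assuming the Perl-based rename(1) is installed; -execdir substitutes {} with ./basename and runs the command in the file's own directory (GNU find refuses -execdir when PATH contains a relative entry, which is the annoyance mentioned above):
# each file is seen as ./name.so, so the lib prefix goes right after the ./
find . -name '*.so' -execdir rename 's|^\./|./lib|' '{}' \;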
Try mmv:
cd /bla/
mmv "*.so" "lib#1.so"
(mmv "*" "lib#1" would also work but it's less safe).
If you don't have mmv installed, get it.
basename and dirname are your friends :)
You want something like this (excuse my bash syntax - it's a little rusty):
for FILE in $(find bla -name '*.so'); do
    DIR=$(dirname "$FILE")
    FILENAME=$(basename "$FILE")
    mv "$FILE" "${DIR}/lib${FILENAME}"
done
Beaten to the punch!
Note I've commented out the mv command to prevent any accidental mayhem
for f in *
do
    dir=$(dirname "$f")
    fname=$(basename "$f")
    new="$dir/lib$fname"
    echo "new name is $new"
    # only uncomment this if you know what you are doing
    # mv "$f" "$new"
done
