How do I traverse through every file in a folder? - bash

I have a folder called exam. This folder has 3 folders called math, physics and english. All of these folders have some sub folders and files in them. I want to traverse through every folder and print the path of every folder and file on another file called files. I've done this:
#!/bin/bash
LOC="/home/school/exam/*"
{
for f in $LOC
do
echo $f
done
} > "files"
The output I get is:
/home/school/exam/math
/home/school/exam/physics
/home/school/exam/english
I can't figure out how to make the code visit and do the same thing to the sub folders of exam. Any suggestions?
PS I'm just a beginner in shell scripting.

find /home/school/exam -print > files

With the globstar option enabled, bash recurses into subdirectories when a pattern contains two adjacent stars (**). Use:
shopt -s globstar
for i in /home/school/exam/**; do printf '%s\n' "$i"; done > files
The reference here is man bash:
globstar
If set, the pattern ** used in a pathname expansion context
will match all files and zero or more directories and
subdirectories. If the pattern is followed by a /, only
directories and subdirectories match.
and info bash:
* Matches any string, including the null string. When the
globstar shell option is enabled, and * is used in a
pathname expansion context, two adjacent *s used as a
single pattern will match all files and zero or more
directories and subdirectories. If followed by a /, two
adjacent *s will match only directories and subdirectories.

You can use the find command: it lists every file and directory under a starting point, and you can then act on each result with -exec or by piping to xargs.
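As a rough sketch of what that might look like (the function names list_paths, show_sizes and show_sizes_xargs are made up for illustration, and wc -c is only a stand-in for whatever you want to run on each file):

```bash
#!/bin/bash

# list_paths DIR OUTFILE: write the path of every directory and file
# under DIR (including DIR itself) to OUTFILE, one per line.
list_paths() {
    find "$1" -print > "$2"
}

# show_sizes DIR: run a command (here wc -c) on every regular file.
# "{} +" hands many file names to each wc invocation.
show_sizes() {
    find "$1" -type f -exec wc -c {} +
}

# The same idea with xargs; -print0 and -0 keep names with spaces intact.
show_sizes_xargs() {
    find "$1" -type f -print0 | xargs -0 wc -c
}
```

For the question above, list_paths /home/school/exam files would produce the requested listing.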

You can also use the tree command, which is included in most *nix distributions. (Ubuntu is a notable exception, but it can be installed there via apt-get.)
LOC="/home/school/exam/"
tree -if "$LOC" > files

How about this to list all your files recursively?
for i in *; do ls -lR "$i"; done

Related

how list just one file from a (bash) shell directory listing

A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution, a full run, the code would be such
for f in *.mp3; do
#statements
done
This works fine but if I wanted to sample just one file of such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit(break) after one statement, or is there a neater way more tailored-for-the-job way?
for f in *.mp3; do
#statement
break
done
Ta (can not believe how dopey I feel asking this one, my forehead will hurt when I see the answers )
Since you are using Linux (Mint) you've got GNU find so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
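A minimal sketch of the find approach combined with that existence check. The function name pick_one_mp3 is hypothetical, and -print is used instead of -printf '%f' so that the check works regardless of the current directory. It still assumes a find that supports -quit (GNU find does).

```bash
#!/bin/bash

# pick_one_mp3 DIR: print the path of one .mp3 file directly inside DIR,
# or return a non-zero status if there is none.
pick_one_mp3() {
    local dir=$1 mp3file
    # -quit stops find after the first match has been printed
    mp3file=$(find "$dir" -maxdepth 1 -mindepth 1 -name '*.mp3' -print -quit)
    # An empty result (no match) fails the -e test
    [[ -e $mp3file ]] || return 1
    printf '%s\n' "$mp3file"
}
```

It could be used as: if sample=$(pick_one_mp3 .); then echo "Sampling $sample"; fi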
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
if [ -f "$f" ]; then
mp3file=$f
break
fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or empty string if there is no such a file.
The fact that you use
for f in *.mp3; do
suggests to me that the MP3s are named without too many strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3 | head -1)
statement
Or, if you want a different one every time:
f=$(ls *.mp3|sort -R | tail -1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
Assuming you don't have spaces in your filenames, (and I don't understand why the collective taboo is against using ls in scripts at all, rather than not having spaces in filenames, personally) then:-
ls *.mp3 | tr ' ' '\n' | sed -n '1p'

bash script to check for folders containing two specific files recursively and print their path

I want to check recursively for two specific files say "hem" and "haw" and print the folders containing both the files.
find <top_folder> \( -name hem -o -name haw \) -print
or
cd <top_folder>
ls **/hem **/haw
Try this Shellcheck-clean code:
shopt -s globstar
for hempath in ./**/hem ; do
dir=${hempath%/*}
[[ -e $dir/haw ]] && printf '%s\n' "$dir"
done
See glob - Greg's Wiki for information about globstar and the ** pattern.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${hempath%/*}.
The code uses ./**/hem instead of **/hem to ensure that all the matched paths start with ./ so it works even if the files are in the current directory (.).
See the accepted, and excellent, answer to Why is printf better than echo? to understand why printf is used instead of echo to print the directory path.
Note that support for globstar was introduced in Bash 4.0, and it was dangerous in versions prior to 4.3 because it used to follow symlinks, possibly leading to infinite recursion (and failure) or unwanted duplicates. See When does globstar descend into symlinked directories?.

Recursively search a directory for each file in the directory on IBMi IFS

I'm trying to write two (edit: shell) scripts and am having some difficulty. I'll explain the purpose and then provide the script and current output.
1: get a list of every file name in a directory recursively. Then search the contents of all files in that directory for each file name. Should return the path, filename, and line number of each occurrence of the particular file name.
2: get a list of every file name in a directory recursively. Then search the contents of all files in the directory for each file name. Should return the path and filename of each file which is NOT found in any of the files in the directories.
I ultimately want to use script 2 to find and delete (actually move them to another directory for archiving) unused files in a website. Then I would want to use script 1 to see each occurrence and filter through any duplicate filenames.
I know I can make script 2 move each file as it is running rather than as a second step, but I want to confirm the script functions correctly before I do any of that. I would modify it after I confirm it is functioning correctly.
I'm currently testing this on an IBMi system in strqsh.
My test folder structure is:
scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt
I have text in some of those files which contains existing file names.
This is my current script 1:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done
Right now it functions correctly, with the exception of providing the path to the file which had a match. Doesn't grep return the file path by default?
I'm a little further away with script 2:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
#split $i on '/' and store into an array
IFS='/' read -a array <<< "$i"
#get last element of the array
echo "${array[-1]}"
#perform a grep similar to script 2 and store it into a variable
filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"
#Check if the variable has anything in it
if [ $filename = "" ]
#if not then output $i for the full path of the current needle.
then echo $i;
fi
done
I don't know how to split the string $i into an array. I keep getting an error on line 6
001-0059 Syntax error on line 6: token redirection not expected.
I'm planning on trying this on an actual linux distro to see if I get different results.
I appreciate any insight in advance.
Introduction
This isn't really a full solution, as I'm not 100% sure I understand what you're trying to do. However, the following contain pieces of a solution that you may be able to stitch together to do what you want.
Create Test Harness
cd /tmp
mkdir -p scriptTest/subDir{1,2}
touch scriptTest/subDir1/file{4,5,6}.txt
touch scriptTest/subDir2/file{1,8}.txt
touch scriptTest/file{1,2,3}.txt
Finding and Deleting Duplicates
In the most general sense, you could use find's -exec flag or a Bash loop to run grep or other comparison on your files. However, if all you're trying to do is remove duplicates, then you might simply be better off using the fdupes or duff utilities to identify (and optionally remove) files with duplicate contents.
For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples:
duff
Duff has more options, but won't delete files for you directly. You'll likely need to use a command like duff -e0 * | xargs -0 rm to delete duplicates. To find duplicates using the default comparisons:
$ duff -r scriptTest/
8 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
fdupes
This utility offers the ability to delete duplicates directly in various ways. One such way is to invoke fdupes . --delete --noprompt once you're confident that you're ready to proceed. However, to find the list of duplicates:
$ fdupes -R scriptTest/
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
Get a List of All Files, Including Non-Duplicates
$ find scriptTest -name \*.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
You could then act on each file with find's -exec {} + feature, or simply use a grep that supports the --recursive --files-with-matches flags to find files with matching content.
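As one possible way to stitch these pieces together for the "unused files" case from the question, here is a hedged sketch. The function name list_unreferenced is made up, and it assumes a grep that supports -r, -q and -F and file names without newlines.

```bash
#!/bin/bash

# list_unreferenced DIR: print every file under DIR whose base name is
# not mentioned in the contents of any file under DIR.
list_unreferenced() {
    local root=$1 path name
    find "$root" -type f -print | while IFS= read -r path; do
        name=${path##*/}            # strip the directory part
        # -F: treat the name as a fixed string, -q: stop at the first hit
        if ! grep -rqF -- "$name" "$root"; then
            printf '%s\n' "$path"
        fi
    done
}
```

Note that two files sharing a base name (such as the two file1.txt files in the test tree) cannot be told apart this way: a single mention counts for both.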
Passing Find Results to a Bash Loop as an Array
Alternatively, if you know for sure that you won't have spaces in the file names, you can also use a Bash array to store the files into a variable you can iterate over in a Bash for-loop. For example:
files=( $(find scriptTest -name \*.txt) )
for file in "${files[@]}"; do
: # do something with each "$file"
done
Looping like this is often slower, but may provide you with the additional flexibility you need if you're doing something complicated. YMMV.
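If file names might contain spaces after all, a common alternative is to have find emit NUL-terminated names and read them one at a time. The sketch below assumes a find with -print0. The function name collect_txt and the global array files are illustrative only.

```bash
#!/bin/bash

# collect_txt DIR: fill the global array "files" with every *.txt path
# under DIR, one array element per file, whatever characters it contains.
collect_txt() {
    local f
    files=()
    # read -d '' splits on NUL bytes, matching find's -print0 output
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$1" -name '*.txt' -print0)
}
```

After collect_txt scriptTest, a loop such as for file in "${files[@]}" sees each path as a single word.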

Do not start loop if there is no files in directory?

All,
I am running BASH in Solaris 10
I have the following shell script that loops over the CSV files in a directory.
The problem with this piece of code is that it still runs the loop once even if there are no CSV files in that directory, and then calls SQL Loader.
SQL Loader then produces a log file because there is no file to process, and this is beginning to clutter my directory with log files.
for file in *.csv ;
do
echo "SQLLoader is reading : " $file
sqlldr <User>/<Password>@<DBURL>:<PORT>/<SID> control=sqlloader.ctl log=$inbox/$file.log data=$inbox/$file
done
How do I stop it from entering the loop if there are no CSV files in the $inbox directory?
Say:
shopt -s nullglob
before your for loop.
This is not the default: without it, for file in *.csv expands to the literal string *.csv when no files match.
Quoting from the documentation:
nullglob
If set, Bash allows filename patterns which match no files to expand to a null
string, rather than themselves.
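A small sketch of the effect, using a made-up helper count_csv that simply counts how often the loop body runs:

```bash
#!/bin/bash
shopt -s nullglob   # unmatched patterns now expand to nothing

# count_csv DIR: print how many times the loop body runs for DIR/*.csv
count_csv() {
    local n=0 file
    for file in "$1"/*.csv; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

With nullglob set, an empty directory gives 0 iterations instead of 1. Because the option also affects commands like ls *.csv (which would then run as a bare ls), it may be worth turning it off again with shopt -u nullglob after the loop.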
Use find to search files
for file in `find -name "*.csv"` ;
First off, using nullglob is the correct answer if it is available. However, a POSIX-compliant option is available.
The pattern will be treated as literal text if there are no matches. You can catch this with a small hack:
for file in *.csv; do
[ -f "$file" ] || break
...
done
When there are no matches, file will be set to the literal string *.csv, which is not the name of a file, so -f "$file" will fail. Otherwise, file will be set in turn to the name of each file matching the pattern, and -f "$file" will succeed every time. Note this will work even if there is a file named *.csv. The drawback is that you have to make a redundant test for each existing file.

Bash: Check all files in a location against another for existence

I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
echo "Usage is command sourcedir targetdir"
exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how it's best to obtain the filename without its path and then see if it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However I need to be able to look for files found within the total source hierarchy anywhere within the target hierarchy. If found I would like to compare checksums and last modified dates etc and comment or, if not found, I would like to note this. The purpose is to check whether files on external media have been uploaded to a file server.
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
cd "$1"
find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user@host so]$ ./dirdiff.sh
./address-book.dat              ./address-book.dat
./passwords.txt                 ./passwords.txt
./some-song.mp3               <
./the-holy-grail.info           ./the-holy-grail.info
                              > ./victory.wav
./zzz.wad                       ./zzz.wad
If it's not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user@host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -type f on the find commands, etc.
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding column, so the output of comm -23 is only the lines from the first file that don't appear in the second.
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
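Wrapped in a function, the comm approach might be sketched as follows. The name only_in_source is hypothetical, and -type f is added to restrict the comparison to regular files.

```bash
#!/bin/bash

# only_in_source SOURCE TARGET: print the relative paths of regular files
# that exist under SOURCE but not at the same relative path under TARGET.
only_in_source() {
    # Each <(...) runs in its own subshell, so the cd does not leak out
    comm -23 <(cd "$1" && find . -type f | sort) \
             <(cd "$2" && find . -type f | sort)
}
```

Called as only_in_source "$source_dir" "$target_dir", it prints one relative path per line.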
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":
Make that "$source". Always use double quotes around variable substitutions. Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contains spaces.
You can't iterate over the output of find that way. Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
-name '*' is a no-op, it matches everything.
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match to /internal-drive/different/path-to/somefile. So make a list of files on each side indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
find . -type f -print |
sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, and prepends the file name in front of the files, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the files.
The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little since we no longer need the filename at the beginning of the line.
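For the follow-up in the question's edit (comparing a file once a same-named one is found), a slower loop-based sketch could look like this. The function name check_files and the same/differs/missing labels are invented for illustration. It compares contents with cmp rather than checksums, looks only at the first same-named file it finds, and assumes a find that supports -print0 and -quit.

```bash
#!/bin/bash

# check_files SOURCE TARGET: for every regular file under SOURCE, look for
# a file with the same base name anywhere under TARGET and report whether
# it is missing, identical in content, or different.
check_files() {
    local source=$1 target=$2 path name match
    while IFS= read -r -d '' path; do
        name=${path##*/}
        # Caveat: -name takes a glob pattern, so names containing
        # * ? or [ may match the wrong files.
        match=$(find "$target" -type f -name "$name" -print -quit)
        if [[ -z $match ]]; then
            printf 'missing: %s\n' "$path"
        elif cmp -s -- "$path" "$match"; then
            printf 'same: %s\n' "$path"
        else
            printf 'differs: %s\n' "$path"
        fi
    done < <(find "$source" -type f -print0)
}
```

This runs one find per source file, so for large hierarchies the sort and join approach above scales much better.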
