How to loop over folders/directories using a bash script?

I'm trying to count all the .txt files in a set of folders. The problem is that the main folder contains more than one folder, and inside every one of them there are .txt files; I want to count the total number of .txt files. So far I've tried to build a solution like this, but of course it's wrong:
#!/bin/bash
counter=0
for i in $(ls /Da) ; do
    for j in $(ls i) ; do
        $counter=$counter+1
    done
done
echo $counter
The error I'm getting is: ls: cannot access i ...
The problem is that I don't know how I'm supposed to build the inner for loop, since it depends on the outer loop. What should that structure (schema) look like?

This can work for you
find . -name "*.txt" | wc -l
In the first part, find looks for the *.txt files from this folder (.) and its subfolders. In the second part, wc counts the returned lines (-l) of find.

You want to avoid parsing ls and you want to quote your variables.
There is no need for repeated loops, either.
printf 'x\n' /Da/* /Da/*/* | wc -l
depending also on whether you expect the entries in /Da to be all files (in which case /Da/* will suffice), all directories (in which case /Da/*/* alone is enough), or both. Additionally, if you don't want to count directories at all, maybe switch to find /Da -type f -printf 'x\n' or similar.
There is no need to print the file names at all; this avoids getting the wrong result if a file name should ever contain a line feed (touch $'/Da/ick\npoo' to see this in action.)
More generally, a correct nested loop looks like
for i in list of things; do
    for j in different items, perhaps involving "$i"; do
        things with "$j" and perhaps also "$i"
    done
done
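Applied to the original problem, a minimal corrected counting script might look like this (a sketch, assuming /Da contains one level of subdirectories with the .txt files inside them):
#!/bin/bash
shopt -s nullglob            # patterns that match nothing expand to nothing
counter=0
for dir in /Da/*/; do        # every first-level subdirectory of /Da
    for f in "$dir"*.txt; do
        counter=$((counter + 1))
    done
done
echo "$counter"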

i is a variable, so you need to reference it via $, i.e. the second loop should be
for j in $(ls "$i") ; do
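Even quoted, this still word-splits the output of ls; a glob avoids parsing ls entirely (a sketch, assuming i holds a directory name under /Da as in the question):
for j in /Da/"$i"/*; do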

Related

Renaming Subdirectories To Different Date Format

In Ubuntu Linux:
I have a directory with many sub-directories. Each of those sub-directories has sub-directories whose names are in the date format:
M_D_YYYY
The problem is that when sorted alphabetically, they don't list in chronological order.
For example, 7_25_2019 lists before 7_3_2019 and even before 7_3_2018.
I want to rename the sub-directories into YYYY_MM_DD format, so that alphabetical order matches chronological order. So, in previous example, it will rename to:
2018_07_03
2019_07_03
2019_07_25
How is this best accomplished within a shell command or script?
NOTES:
(1) I want to rename the sub-directories only, not the files in them.
(2) I don't have control over the file structure at the time the files are posted via FTP. But there is a break overnight. Ideally, after a first run "to fix the existing", I could run a command/script (either the same or a different one) nightly (after midnight) that would leave correctly named sub-directories alone and only rename the sub-directories (from the previous day) that are in the incorrect format.
(3) It is important that 1st level sub-directories that are not in a date format DO NOT get renamed.
You can find all directories that meet your criteria (and that haven't been renamed) with a find invocation like this:
find -E . -type d -regex '.*/[1-9][0-9]?_[1-9][0-9]?_[12][0-9][0-9][0-9]'
Having found the directories that need renaming, the next question is, how to rename them?
This is just tricky enough that it will be useful to encapsulate the functionality in an auxiliary script, which I'm going to call renamedir. And once we have that script, we'll be able to invoke it on each directory using find's -exec operator:
find -E . -type d -regex '.*/[1-9][0-9]?_[1-9][0-9]?_[12][0-9][0-9][0-9]' -exec renamedir {} \;
So what's inside the renamedir script? My first cut at it isn't terribly elegant, but it works. The best way to ensure that the month and day numbers are padded out to two digits is by using printf, I think. But that means we need to break out the individual month, day, and year numbers into separate shell variables. And I don't know of a nice, clean, portable way to do that, so what's here is rather brute force. Anyway, here's my script:
#!/bin/sh
path=`dirname "$1"`
dir=`basename "$1"`
# pull the month, day, and year fields out of M_D_YYYY
m=`echo "$dir" | sed 's/^\([0-9][0-9]*\)_\([0-9][0-9]*\)_\([0-9][0-9]*\)$/\1/'`
d=`echo "$dir" | sed 's/^\([0-9][0-9]*\)_\([0-9][0-9]*\)_\([0-9][0-9]*\)$/\2/'`
y=`echo "$dir" | sed 's/^\([0-9][0-9]*\)_\([0-9][0-9]*\)_\([0-9][0-9]*\)$/\3/'`
# sanity-check the ranges; bail out quietly on anything implausible
if test $m -lt 1 -o $m -gt 12; then exit; fi
if test $d -lt 1 -o $d -gt 31; then exit; fi
if test $y -lt 1900 -o $y -gt 2100; then exit; fi
newdir=`printf "%d_%02d_%02d" $y $m $d`
mv "$path"/"$dir" "$path"/"$newdir"
(Obviously that's a sh script. Perhaps the better approach would be to use Perl or Python.)
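The field splitting can also be done with the shell's own IFS mechanism instead of three sed calls; a sketch under the same assumptions:
#!/bin/sh
path=`dirname "$1"`
dir=`basename "$1"`
# split M_D_YYYY into positional parameters on the underscores
oldIFS=$IFS
IFS=_
set -- $dir
IFS=$oldIFS
m=$1 d=$2 y=$3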
When I run this, I do get error messages like
find: ./tmp/7_25_2019: No such file or directory
These are because we're renaming directories out from under find, after it has found them but before it has a chance to descend down into them. You can ignore those messages, I guess. (If your m_d_yyyy directories ever contain m_d_yyyy subdirectories, this means you'll miss them, but they'll get found on the next run, so maybe that's okay.)
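If the messages bother you, one option is to add -prune after the -regex test, which tells find not to descend into a directory it has just matched (with the same caveat that nested m_d_yyyy directories wait for the next run):
find -E . -type d -regex '.*/[1-9][0-9]?_[1-9][0-9]?_[12][0-9][0-9][0-9]' -prune -exec renamedir {} \;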

Iterating a group of folders and files while removing certain files that are contained in a list

I download sets of files that contain files I want to remove. I would like to create a list of some form; the script should support globbing so I can be pretty aggressive with file removal without getting into the complexities of using regex within the list of files.
I am also stumped in that I put a sleep command within the loop of my script, and it is not being run after each iteration, but only once at the end of the run.
Here is the script
# Get to the place where all the dirty work happens
cd /Volumes/Videos
FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
if [ "$(pwd)" == "/Volumes/Videos" ]; then
    echo "You are currently in $(pwd)"
    echo "You would not have read the above if this script were operating anywhere else"
    # Delete files from list above
    for f in "$FILES"
    do
        echo "Removing $f";
        rm -f "$f";
        echo "$f has been deleted";
        sleep 10;
        echo "";
        echo "";
    done
    # See if dir is empty, ask if we want to delete it or keep it
    # Iterate every movie file, see if we want to nuke contents. Maybe use part of last opened to help find those files fast
else
    # Not in the correct directory
    echo "This script is trying to alter files in a location that it should not be working in"
    echo "Script is currently trying to work in $(pwd)"
    exit 1
fi
The main thing that has me completely stumped is the sleep command. It is run once, not once per file iteration. If I have 100 files to go through, I get 10 seconds of sleep, not 100*10.
I will be adding in some other features, like deleting a file if it is smaller than x bytes. These files will have spaces and other odd characters in the filenames; am I creating my variables correctly to make this script handle those scenarios, as well as be as POSIX compliant as possible? I will change the shebang to sh over bash and try to add in set -o nounset and set -o errexit, though I tend to have a lot of trouble when I do that.
Is there a better form of list I should be using? I am not opposed to storing the pattern-match list in a separate file; I can include it, or read it in with any of a few commands.
These are also nested: a dir that contains files, or a dir that contains a dir that contains some files. Something like this:
/Volumes/Videos:
    The Great guy in a tree
        The Great guy in a tree S01e01
            sample.avi
            readme.txt
            The Great guy in a tree S01e01.mpg
        The Great guy in a tree S01e02
            The Great guy in a tree S01e02.mpg
        The Great guy in a tree S01e03
            The Great guy in a tree S01e03.mpg
        The Great guy in a tree S01e04
            The Great guy in a tree S01e04.mpg
Thank you.
The reason that your script is not working as you expect is that your for loop is written incorrectly. This example shows what is going on:
$ i=0
$ FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
$ for f in "$FILES"; do echo $((++i)) "$f"; done
1 .DS_Store
*.txt
*.sample
*.sample.*
*.samples
Note that only one number is output, indicating that the loop is only going around once. Also, no pathname expansion has occurred.
In order to make your script work as you expect, you can remove the quotes around "$FILES". This means that each word in your string will be evaluated separately, rather than all at once. It also means that pathname expansion of the wildcards that you are using will occur, so all files ending in .txt will be removed, which I guess is what you meant.
Instead of using a string to store your list of expressions, you might prefer to make use of an array:
FILES=( '.DS_Store' '*.txt' '*.sample' '*.sample.*' '*.samples' )
The quotes around each element prevent expansion (so the array has only 5 elements, not the fully expanded list). You could then change your loop to for f in ${FILES[@]} (again, no double quotes, so each element of the list gets expanded).
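Putting the array together with the unquoted expansion, the deletion loop might be rewritten like this (a sketch; the -e test skips patterns that matched nothing):
cd /Volumes/Videos || exit 1
patterns=( '.DS_Store' '*.txt' '*.sample' '*.sample.*' '*.samples' )
for p in "${patterns[@]}"; do
    # unquoted $p lets the shell expand the pattern here
    for f in $p; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        echo "Removing $f"
        rm -f -- "$f"
        sleep 10
    done
done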
Although removing the quotes fixes your script, I would agree with @hek2mgl's suggestion of using find. It allows you to find files by name, size, date modified and a lot more in one line. If you want to pause between the deletion of each file, you could use something like this:
find . \( -name "*.sample" -o -name "*.txt" \) -delete -exec sleep 10 \;
You can use find:
find . -type f \( -name '.DS_Store' -o -name '*.txt' -o -name '*.sample.*' -o -name '*.samples' \) -delete
(The parentheses are needed so that -delete applies to every pattern, not just the last one.)

Working with a long list in KSH88

I am working with a directory that holds files marked for processing and deletion. What I need to do is get the names of all the files, put them into an array, and then go through the array and do the work. The problem is that KSH88 only handles arrays up to 1024 elements, and there can be more file names than that in the directory!
I just need to be able to get the total count of the current file names in the directory, as looping through and doing everything else is easy. The current part of the script is:
# This is getting the result set and attempting to get the total number of file names as initialNumber.
integer initialNumber=${#`find $source -path "$source/*" -prune -type f -name "$regex" | sed 's!.*/!!'`[#]}
This is giving me a "Bad Substitution Error" currently. This is my first time working with KSH88 so I am not sure if using the result set as an array is even possible. Any help would be awesome, thanks.
Can't you simply get the number of files without such a complicated approach? e.g.
integer initialNumber=$(ls -l | grep -c '^-')
(Counting lines that begin with - counts only the regular files, and avoids counting the total line that ls -l prints.)
Are you using the array for other purposes? There are better ways for iterating through a list of files. To iterate through a list of files in the current directory, it seems much easier to do this more directly.
e.g.
cd "$dirname"
for filename in ABC*DEF??.gz; do
# some action here...
done
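If the array is only a staging area, you can sidestep the 1024-element limit entirely by streaming the find output through a while loop (a sketch, reusing the find invocation and variables from the question):
integer initialNumber=$(find "$source" -path "$source/*" -prune -type f -name "$regex" | wc -l)
find "$source" -path "$source/*" -prune -type f -name "$regex" |
while read -r pathname; do
    # process one file at a time; nothing is ever held in an array
    print "processing $pathname"
done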

What's the correct way to loop this?

I have a script where inotifywait is piped into a while loop that executes the following logic.
cp "$S3"/2/post2.png "$S3";
mv "$S3"/1/post1.png "$S3"/2/post2.png;
cp "$S3"/3/post3.png "$S3";
mv "S3"/post2.png "$S3"/3/post3.png;
and so forth and so on... then at the end of the script:
mv "$dir"/$file "$S3"/1/post1.png
That line represents a fresh post; the lines above rotate the older posts.
I could hand-code the iterations all the way down to 100+, but I would like to program more efficiently and save time.
So, what are some correct ways to loop this?
I think a better mechanism would list the directories in "$S3" in reverse numeric order, and arrange to process them like that. It isn't clear if the 100 directories are all present or whether they need to be created. We'll assume that directories 1..100 might exist, and directory N will always and only contain postN.png.
I'm assuming that there are no spaces, newlines or other awkward characters in the file paths; this means that ls can be used without too much risk.
for dirnum in $(cd "$S3"; ls */*.png | sed 's%/.*%%' | sort -nr)
do
    next=$(($dirnum + 1))
    mv "$S3/$dirnum/post$dirnum.png" "$S3/$next/post$next.png"
done
The cd "$S3" means I don't get a possibly long pathname included in the output; the ls */*.png lists the files that exist; the sed removes the file name and slash, leaving just a list of directory numbers containing files; and the sort puts the directories in reverse numeric order.
The rest is straightforward, given the assumption that the necessary directories already exist. It would not be hard to add [ -d "$S3/$next" ] || mkdir -p "$S3/$next" before moving the file. Clearly, after the loop you can use your final command:
mv "$dir/$file" "$S3/1/post1.png"
Note that I've enclosed complete names in double quotes; it generally leads to fewer nasty surprises if something acquires spaces unexpectedly.
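Putting those pieces together, the whole rotation might look like this (a sketch, keeping the assumptions above):
for dirnum in $(cd "$S3"; ls */*.png | sed 's%/.*%%' | sort -nr)
do
    next=$(($dirnum + 1))
    [ -d "$S3/$next" ] || mkdir -p "$S3/$next"
    mv "$S3/$dirnum/post$dirnum.png" "$S3/$next/post$next.png"
done
mv "$dir/$file" "$S3/1/post1.png"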
Try this:
for i in $(ls "$S3" | sort -nr); do
    mkdir -p "$S3/$((i+1))"
    mv "$S3/$i/post$i.png" "$S3/$((i+1))/post$((i+1)).png"
done
mv "$dir/$file" "$S3"/1/post1.png
The loop will iterate through all the numbered directories in reverse numeric order and move the files.

Bash: Check all files in a location against another for existence

I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
    echo "Usage is command sourcedir targetdir"
    exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how best to obtain the filename without its path and then see whether it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However, I need to be able to look for files found anywhere within the source hierarchy, anywhere within the target hierarchy. If a file is found I would like to compare checksums and last-modified dates etc. and report; if not found, I would like to note this. The purpose is to check whether files on external media have been uploaded to a file server.
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
    cd "$1"
    find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user@host so]$ ./dirdiff.sh
./address-book.dat      ./address-book.dat
./passwords.txt         ./passwords.txt
./some-song.mp3       <
./the-holy-grail.info   ./the-holy-grail.info
                      > ./victory.wav
./zzz.wad               ./zzz.wad
If it's not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user@host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -type f on the find commands, etc.
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding column, so the output of comm -23 is only the lines from the first file that don't appear in the second.
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
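For example, restricted to regular files (a sketch; $source_dir and $target_dir are assumed to be set as above):
comm -23 <(cd "$source_dir" && find . -type f | sort) \
         <(cd "$target_dir" && find . -type f | sort)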
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":
Make that "$source". Always use double quotes around variable substitutions. Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contains spaces.
You can't iterate over the output of find that way. Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
-name '*' is a no-op, it matches everything.
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match for /internal-drive/different/path-to/somefile. So make a list of files on each side indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
    find . -type f -print |
    sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
    sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, and prepends the file name in front of the files, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the files.
The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little since we no longer need the filename at the beginning of the line.
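Saved as, say, compare-by-name.sh (a hypothetical name), the script would be invoked with the two hierarchies as its arguments:
bash compare-by-name.sh /mnt/dvd /internal-drive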
