I am working with a directory that has names of files that are marked for processing and deletion. What I need to do is get the names of all the files and put them into an array and then go through the array and do the work. Problem is, is that KSH88 only handles arrays up to 1024 in size and that there can be more file names in the directory!
I just need to be able to get the total current file names in the directory as looping through and doing everything else is easy, the current part of the script is:
#This is getting the result set and attempting to get the total file names as initalNumber.
integer initialNumber=${#`find $source -path "$source/*" -prune -type f -name "$regex" | sed 's!.*/!!'`[#]}
This is giving me a "Bad Substitution Error" currently. This is my first time working with KSH88 so I am not sure if using the result set as an array is even possible. Any help would be awesome, thanks.
Can't you simply get the number of files without such a complicated approach? e.g.
integer initialNumber=$(ls -l | grep -v ^d | wc -l)
Are you using the array for other purposes? There are better ways for iterating through a list of files. To iterate through a list of files in the current directory, it seems much easier to do this more directly.
e.g.
cd $dirname;
for filename in ABC*DEF??.gz; do
# some action here...
done
Related
A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution, a full run, the code would be such
for f in *.mp3 do
#statements
done
This works fine but if I wanted to sample just one file of such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit(break) after one statement, or is there a neater way more tailored-for-the-job way?
for f in *.mp3 do
#statement
break
done
Ta (can not believe how dopey I feel asking this one, my forehead will hurt when I see the answers )
Since you are using Linux (Mint) you've got GNU find so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
if [ -f "$f" ]; then
mp3file=$f
break
fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or empty string if there is no such a file.
The fact that you use
for f in *.mp3 do
suggests to me, that the MP3s are named without to much strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3|head)
statement
Or, if you want a different one every time:
f=$(ls *.mp3|sort -R | tail -1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
Assuming you don't have spaces in your filenames, (and I don't understand why the collective taboo is against using ls in scripts at all, rather than not having spaces in filenames, personally) then:-
ls *.mp3 | tr ' ' '\n' | sed -n '1p'
i'm trying to count all the .txt files in the folders, the problem is that the main folder has more than one folder and inside everyone of them there are txt files , so in total i want to count the number of txt files . till now i've tried to build such a solution,but of course it's wrong:
#!/bin/bash
counter=0
for i in $(ls /Da) ; do
for j in $(ls i) ; do
$counter=$counter+1
done
done
echo $counter
the error i'm getting is :ls cannot access i ...
the problem is that i don't know how i'm supposed to build the inner for loop as it depends on the external for loop(schema) ?
This can work for you
find . -name "*.txt" | wc -l
In the first part find looks for the *.txt from this folder (.) and its subfolders. In the second part wc counts the returnes lines (-l) of find.
You want to avoid parsing ls and you want to quote your variables.
There is no need for repeated loops, either.
printf 'x\n' /Da/* /Da/*/* | wc -l
depending also on whether you expect the entries in /Da to be all files (in which case /Da/* will suffice), all directories (in which case /Da/*/* alone is enough), or both. Additionally, if you don't want to count directories at all, maybe switch to find /Da -type f -printf 'x\n' or similar.
There is no need to print the file names at all; this avoids getting the wrong result if a file name should ever contain a line feed (touch $'/Da/ick\npoo' to see this in action.)
More generally, a correct nested loop looks like
for i in list of things; do
for j in different items, perhaps involving "$i"; do
things with "$j" and perhaps also "$i"
done
done
i is a variable, so you need to reference it via $, i.e. the second loop should be
for j in $(ls "$i") ; do
I'm trying to write two (edit: shell) scripts and am having some difficulty. I'll explain the purpose and then provide the script and current output.
1: get a list of every file name in a directory recursively. Then search the contents of all files in that directory for each file name. Should return the path, filename, and line number of each occurrence of the particular file name.
2: get a list of every file name in a directory recursively. Then search the contents of all files in the directory for each file name. Should return the path and filename of each file which is NOT found in any of the files in the directories.
I ultimately want to use script 2 to find and delete (actually move them to another directory for archiving) unused files in a website. Then I would want to use script 1 to see each occurrence and filter through any duplicate filenames.
I know I can make script 2 move each file as it is running rather than as a second step, but I want to confirm the script functions correctly before I do any of that. I would modify it after I confirm it is functioning correctly.
I'm currently testing this on an IMBi system in strqsh.
My test folder structure is:
scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt
I have text in some of those files which contains existing file names.
This is my current script 1:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done
Right now it functions correctly with exception to providing the path to the file which had a match. Doesn't grep return the file path by default?
I'm a little further away with script 2:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
#split $i on '/' and store into an array
IFS='/' read -a array <<< "$i"
#get last element of the array
echo "${array[-1]}"
#perform a grep similar to script 2 and store it into a variable
filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"
#Check if the variable has anything in it
if [ $filename = "" ]
#if not then output $i for the full path of the current needle.
then echo $i;
fi
done
I don't know how to split the string $i into an array. I keep getting an error on line 6
001-0059 Syntax error on line 6: token redirection not expected.
I'm planning on trying this on an actual linux distro to see if I get different results.
I appreciate any insight in advanced.
Introduction
This isn't really a full solution, as I'm not 100% sure I understand what you're trying to do. However, the following contain pieces of a solution that you may be able to stitch together to do what you want.
Create Test Harness
cd /tmp
mkdir -p scriptTest/subDir{1,2}
mkdir -p scriptTest/subDir1/file{4,5,6}.txt
mkdir -p scriptTest/subDir2/file{1,8,8}.txt
touch scriptTest/file{1,2,3}.txt
Finding and Deleting Duplicates
In the most general sense, you could use find's -exec flag or a Bash loop to run grep or other comparison on your files. However, if all you're trying to do is remove duplicates, then you might simply be better of using the fdupes or duff utilities to identify (and optionally remove) files with duplicate contents.
For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples
duff
Duff has more options, but won't delete files for you directly. You'll likely need to use a command like duff -e0 * | xargs -0 rm to delete duplicates. To find duplicates using the default comparisons:
$ duff -r scriptTest/
8 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
fdupes
This utility offers the ability to delete duplicates directly in various ways. One such way is to invoke fdupes . --delete --noprompt once you're confident that you're ready to proceed. However, to find the list of duplicates:
$ fdupes -R scriptTest/
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
Get a List of All Files, Including Non-Duplicates
$ find scriptTest -name \*.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
You could then act on each file with the find's -exec {} + feature, or simply use a grep that supports the --recursive --files-with-matches flags to find files with matching content.
Passing Find Results to a Bash Loop as an Array
Alternatively, if you know for sure that you won't have spaces in the file names, you can also use a Bash array to store the files into a variable you can iterate over in a Bash for-loop. For example:
files=$(find scriptTest -name \*.txt)
for file in "${files[#]}"; do
: # do something with each "$file"
done
Looping like this is often slower, but may provide you with the additional flexibility you need if you're doing something complicated. YMMV.
I have a set of files that I download that contain files that I want to remove. I would like to create a list of some form, the script should support blobbing so I can be pretty aggressive with file removal without getting into the complexities of using regex within the list of files.
I am also stumped in that I put a sleep command within the loop of my script, and that is not getting run after each iteration, but only once at the end of run.
Here is the script
# Get to the place where all the durty work happens
cd /Volumes/Videos
FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
if [ "$(pwd)" == "/Volumes/Videos" ]; then
echo "You are currently in $(pwd)"
echo "You would not have read the above if this script were operating anywhere else"
# Dekete fikes from list above
for f in "$FILES"
do
echo "Removing $f";
rm -f "$f";
echo "$f has been deleted";
sleep 10;
echo "";
echo "";
done
# See if dir is empty, ask if we want to delete it or keep it
# Iterate evert movie file, see if we want to nuke contents. Maybe use part of last openned to help find those files fast
else
# Not in the correct directory
echo "This script is trying to alter files in a location that it should not be working"
echo "Script is currently trying to work in $(pwd)"
exit 1
fi
The main thing that has be completely stumped is the sleep command. It is run once, not once per file iteration. If I have 100 files to go through I get 10 seconds of sleep, not 100*10.
I will be adding in some other features, like if a file is smaller than x bytes, go ahead and delete it too. These files will have spaces and other odd characters in the filenames, am I creating my variables correctly to make this script handle those scenarios as well as be as POSIX compliant as possible. I will change the shebang to sh over bash and try to add in set -o noun set and set -o err exit though I tend to have a lot of trouble when I do that.
Is there a better form of list I should be using? I am not objectionable to storing the pattern match list in a separate file. I can include it, or read it in with any of a few commands.
These are also nested files, a dir, that contains files, or a dir that contains a dir that contains some files. Something like this:
/Volumes/Videos:
The Great guy in a tree
The Great guy in a tree S01e01
sample.avi
readme.txt
The Great guy in a tree S01e01.mpg
The Great guy in a tree S01e02
The Great guy in a tree S01e02.mpg
The Great guy in a tree S01e03
The Great guy in a tree S01e03.mpg
The Great guy in a tree S01e04
The Great guy in a tree S01e04.mpg
Thank you.
The reason that your script is not working as you expect is because your for loop is written incorrectly. This example shows what is going on:
$ i=0
$ FILES=".DS_Store
*.txt
*.sample
*.sample.*
*.samples"
$ for f in "$FILES"; do echo $((++i)) "$f"; done
1 .DS_Store
*.txt
*.sample
*.sample.*
*.samples
Note that only one number is output, indicating that the loop is only going around once. Also, no pathname expansion has occurred.
In order to make your script work as you expect, you can remove the quotes around "$FILES". This means that each word in your string will be evaluated separately, rather than all at once. It also means that pathname expansion of the wildcards that you are using will occur, so all files ending in .txt will be removed, which I guess is what you meant.
Instead of using a string to store your list of expressions, you might prefer to make use of an array:
FILES=( '.DS_Store' '*.txt' '*.sample' '*.sample.*' '*.samples' )
The quotes around each element prevent expansion (so the array only has 5 elements, not the fully expanded list). You could then change your loop to for f in ${FILES[#]} (again, no double quotes results in each element of the list being expanded).
Although removing the quotes fixes your script, I would agree with #hek2mgl's suggestion of using find. It allows you to find files by name, size, date modified and a lot more in one line. If you want to pause between the deletion of each file, you could use something like this:
find \( -name "*.sample" -o -name "*.txt" \) -delete -exec sleep 10 \;
You can use find:
find -type f -name '.DS_Store' -o -name '*.txt' -o -name '*.sample.*' -o -name '*.samples' -delete
I'm after a little help with some Bash scripting (on OSX). I want to create a script that takes two parameters - source folder and target folder - and checks all files in the source hierarchy to see whether or not they exist in the target hierarchy. i.e. Given a data DVD check whether the files contained on it are already on the internal drive.
What I've come up with so far is
#!/bin/bash
if [ $# -ne 2 ]
then
echo "Usage is command sourcedir targetdir"
exit 0
fi
source="$1"
target="$2"
for f in "$( find $source -type f -name '*' -print )"
do
I'm now not sure how it's best to obtain the filename without its path and then see if it exists. I am really a beginner at scripting.
Edit: The answers given so far are all very efficient in terms of compact code. However I need to be able to look for files found within the total source hierarchy anywhere within the target hierarchy. If found I would like to compare checksums and last modified dates etc and comment or, if not found, I would like to note this. The purpose is to check whether files on external media have been uploaded to a file server.
This should give you some ideas:
#!/bin/bash
DIR1="tmpa"
DIR2="tmpb"
function sorted_contents
{
cd "$1"
find . -type f | sort
}
DIR1_CONTENTS=$(sorted_contents "$DIR1")
DIR2_CONTENTS=$(sorted_contents "$DIR2")
diff -y <(echo "$DIR1_CONTENTS") <(echo "$DIR2_CONTENTS")
In my test directories, the output was:
[user#host so]$ ./dirdiff.sh
./address-book.dat ./address-book.dat
./passwords.txt ./passwords.txt
./some-song.mp3 <
./the-holy-grail.info ./the-holy-grail.info
> ./victory.wav
./zzz.wad ./zzz.wad
If its not clear, "some-song.mp3" was only in the first directory while "victory.wav" was only in the second. The rest of the files were common.
Note that this only compares the file names, not the contents. If you like where this is headed, you could play with the diff options (maybe --suppress-common-lines if you want cleaner output).
But this is probably how I'd approach it -- offload a lot of the work onto diff.
EDIT: I should also point out that something as simple as:
[user#host so]$ diff tmpa tmpb
would also work:
Only in tmpa: some-song.mp3
Only in tmpb: victory.wav
... but not feel as satisfying as writing a script yourself. :-)
To list only files in $source_dir that do not exist in $target_dir:
comm -23 <(cd "$source_dir" && find .|sort) <(cd "$target_dir" && find .|sort)
You can limit it to just regular files with -f on the find commands, etc.
The comm command (short for "common") finds lines in common between two text files and outputs three columns: lines only in the first file, lines only in the second file, and lines common to both. The numbers suppress the corresponding column, so the output of comm -23 is only the lines from the first file that don't appear in the second.
The process substitution syntax <(command) is replaced by the pathname to a named pipe connected to the output of the given command, which lets you use a "pipe" anywhere you could put a filename, instead of only stdin and stdout.
The commands in this case generate lists of files under the two directories - the cd makes the output relative to the directories being compared, so that corresponding files come out as identical strings, and the sort ensures that comm won't be confused by the same files listed in different order in the two folders.
A few remarks about the line for f in "$( find $source -type f -name '*' -print )":
Make that "$source". Always use double quotes around variable substitutions. Otherwise the result is split into words that are treated as wildcard patterns (a historical oddity in the shell parsing rules); in particular, this would fail if the value of the variable contain spaces.
You can't iterate over the output of find that way. Because of the double quotes, there would be a single iteration through the loop, with $f containing the complete output from find. Without double quotes, file names containing spaces and other special characters would trip the script.
-name '*' is a no-op, it matches everything.
As far as I understand, you want to look for files by name independently of their location, i.e. you consider /dvd/path/to/somefile to be a match to /internal-drive/different/path-to/somefile. So make an list of files on each side indexed by name. You can do this by massaging the output of find a little. The code below can cope with any character in file names except newlines.
list_files () {
find . -type f -print |
sed 's:^\(.*\)/\(.*\)$:\2/\1/\2:' |
sort
}
source_files="$(cd "$1" && list_files)"
dest_files="$(cd "$2" && list_files)"
join -t / -v 1 <(echo "$source_files") <(echo "$dest_files") |
sed 's:^[^/]*/::'
The list_files function generates a list of file names with paths, and prepends the file name in front of the files, so e.g. /mnt/dvd/some/dir/filename.txt will appear as filename.txt/./some/dir/filename.txt. It then sorts the files.
The join command prints out lines like filename.txt/./some/dir/filename.txt when there is a file called filename.txt in the source hierarchy but not in the destination hierarchy. We finally massage its output a little since we no longer need the filename at the beginning of the line.