How can I create an array that contains the names of all the files in a folder? - bash

Given a folder (my script gets the path of this folder as an argument), how can I create an array that contains the names of all the files in this folder, including the files in any of its subfolders, recursively?
I tried to do it like this:
#!/bin/bash
function get_all_the_files {
for i in "${1}"/*; do
if [ -d "$i" ]; then
get_all_the_files ${1}
else
if [ -f "${i}" ]; then
arrayNamesOfAllTheFiles=(${arrayNamesOfAllTheFiles[@]} "${i}")
fi
fi
done
}
arrayNamesOfAllTheFiles=()
get_all_the_files folder
declare -p arrayNamesOfAllTheFiles
But it's not working. What is the problem and how can I fix it?

To stick with your design (looping on the files and inserting only the regular files), populating the array at each step, but have Bash perform the recursion via the glob, you can use the following:
# the globstar shell option enables the ** glob pattern for recursion
shopt -s globstar
# the nullglob shell option makes non-matching globs expand to nothing (recommended)
shopt -s nullglob
array=()
for file in /path/to/folder/**; do
if [[ ! -h $file && -f $file ]]; then
array+=( "$file" )
fi
done
The test [[ ! -h $file && -f $file ]] checks that the entry is a regular file and not a symbolic link. Without the ! -h part, symbolic links that resolve to a regular file would also be included, because -f follows symlinks.
Note also the array+=( "stuff" ) pattern for appending to an array, instead of array=( "${array[@]}" "stuff" ).
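To see which entries the loop keeps, here is a minimal sketch that runs it against a throwaway tree. The scratch directory and the file names in it are made up for the demo, and it assumes a Bash with globstar (4.0 or later):

```bash
#!/usr/bin/env bash
# Sketch only: the tree below is hypothetical demo data.
shopt -s globstar nullglob

demo=$(mktemp -d)                       # scratch directory for the demo
mkdir -p "$demo/sub dir"
touch "$demo/a.txt" "$demo/sub dir/b c.txt"
ln -s "$demo/a.txt" "$demo/link-to-a"   # symlink that resolves to a regular file

array=()
for file in "$demo"/**; do
    # keep regular files; skip symlinks and directories
    if [[ ! -h $file && -f $file ]]; then
        array+=( "$file" )
    fi
done

printf '%s\n' "${array[@]}"
rm -rf "$demo"
```

Only a.txt and "sub dir/b c.txt" end up in the array; the symlink and the directories are filtered out by the test.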
Another possibility (with Bash ≥ 4.4 where the -d option of mapfile is implemented) and with GNU find (that supports the -print0 predicate):
mapfile -d '' array < <(find /path/to/folder -type f -print0)
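A quick way to convince yourself that the NUL-delimited version copes with awkward names is a sketch like the following. It assumes Bash 4.4 or later and a find with -print0; the scratch directory and file names are invented for the demo:

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical demo files, including one with a newline in its name.
demo=$(mktemp -d)
touch "$demo/plain" "$demo/with space" "$demo/with"$'\n'"newline"

# Each NUL-terminated path from find becomes exactly one array element.
mapfile -d '' files < <(find "$demo" -type f -print0)

echo "found ${#files[@]} files"
rm -rf "$demo"
```

Because the delimiter is NUL, the name containing a newline is stored as a single element rather than being split in two.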

You almost had it right. There is a small typo in the recursive call:
if [ -d "$i" ]; then
get_all_the_files ${1}
else
should be
if [ -d "$i" ]; then
get_all_the_files ${i}
else
I will add that using arrays like this in bash is very unidiomatic. If you are trying to work with recursive trees of files, it's more usual to use tools like find and xargs.
find . -type f -print0 | xargs -0 command-or-script-to-run-on-each-file
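For completeness, here is a sketch of the whole recursive function with the one-variable fix applied, plus quoting and the += append. The names collect_files and all_files and the scratch tree are made up for the demo, not taken from the question:

```bash
#!/usr/bin/env bash
shopt -s nullglob   # an empty directory expands to nothing instead of a literal '*'

all_files=()

collect_files() {
    local entry
    for entry in "$1"/*; do
        if [[ -d $entry ]]; then
            collect_files "$entry"      # recurse into the entry, not into "$1"
        elif [[ -f $entry ]]; then
            all_files+=( "$entry" )
        fi
    done
}

# Hypothetical demo tree
demo=$(mktemp -d)
mkdir -p "$demo/x/y"
touch "$demo/top" "$demo/x/mid" "$demo/x/y/deep file"

collect_files "$demo"
declare -p all_files
rm -rf "$demo"
```

Like any glob-based loop, this skips hidden files unless dotglob is set.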


glob operator with for loop is stuck

I am trying to traverse all files in the /home directory recursively. I want to run some Linux command on each file, so I am using a for loop as below:
for i in /home/**/*
I have also put the statements below at the start of the script:
shopt -s globstar
shopt -s nullglob
But it gets stuck in the for loop. The problem might be the large number of files: if I give the loop another directory (with fewer files), it traverses properly.
What else can I try?
Complete code:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
echo "ggg"
for i in /home/**/*
do
NAME=${i}
echo "It's there." $NAME
if [ -f "$i" ]; then
echo "It's there." $NAME
printf "\n\n"
fi
done
Your code isn't getting stuck. It will just be very, very slow since it needs to build up the list of all files before entering the for loop. The standard alternative is to use find, but you need to be careful about what exactly you want to do. If you want it to behave exactly like your for loop, which means i) ignore hidden files (those whose name starts with .) and ii) follow symlinks, you can do this (assuming GNU find since you are on Linux):
find -L . -type f -not -name '.*' -printf '.\n' | wc -l
That will print a . for each file found, so wc -l will give you the number of files. The -L makes find dereference symlinks and the -not -name '.*' will exclude hidden files.
If you want to iterate over the output and do something to each file, you would need to use this:
find -L . -type f -not -name '.*' -print0 |
while IFS= read -r -d '' file; do
printf -- "FILE: %s\n" "$file"
done
Perhaps this approach may help:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
echo "ggg"
for homedir in /home/*/
do
for i in "$homedir"**
do
NAME=${i}
echo "It's there." "$NAME"
if [ -f "$i" ]; then
echo "It's there." "$NAME"
printf "\n\n"
fi
done
done
Update: Another approach in pure bash might be
#!/bin/bash
shopt -s nullglob
walktree() {
local file
for file in *; do
[[ -L $file ]] && continue
if [[ -f $file ]]; then
# Do something with the file "$PWD/$file"
echo "$PWD/$file"
elif [[ -d $file ]]; then
cd "./$file" || exit
walktree
cd ..
fi
done
}
cd /home || exit
walktree

How to iterate over a directory and display only filename

I want to iterate over the contents of a directory and list only ordinary files.
The path of the directory is given as user input. The script works if the input is the current directory, but not with others.
I am aware that this can be done using ls, but I need to use a for .. in control structure.
#!/bin/bash
echo "Enter the path:"
read path
contents=$(ls $path)
for content in $contents
do
if [ -f $content ];
then
echo $content
fi
done
ls is only returning the file names, not including the path. You need to either:
Change your working directory to the path in question, or
Combine the path with the names for your -f test
Option #2 would just change:
if [ -f $content ];
to:
if [ -f "$path/$content" ];
Note that there are other issues here; ls may make changes to the output that break this, depending on wrapping. If you insist on using ls, you can at least make it (somewhat) safer with:
contents="$(command ls -1F "$path")"
You have two ways of doing this properly:
Either loop through the * pattern and test file type:
#!/usr/bin/env bash
echo "Enter the path:"
read -r path
for file in "$path/"*; do
if [ -f "$file" ]; then
echo "$file"
fi
done
Or using find to iterate a null delimited list of file-names:
#!/usr/bin/env bash
echo "Enter the path:"
read -r path
while IFS= read -r -d '' file; do
echo "$file"
done < <(
find "$path" -maxdepth 1 -type f -print0
)
The second way is preferred since it will properly handle files with special characters and offload the file-type check to the find command.
Use find, set to search for files (-type f) starting from the $path directory:
find "$path" -type f
Here is what you could write:
#!/usr/bin/env bash
path=
while [[ ! $path ]]; do
read -p "Enter path: " path
done
for file in "$path"/*; do
[[ -f $file ]] && printf '%s\n' "$file"
done
If you want to traverse all the subdirectories recursively looking for files, you can use globstar:
shopt -s globstar
for file in "$path"/**; do
printf '%s\n' "$file"
done
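Note that the globstar loop also yields directories. If you want only regular files, and only their names rather than full paths, a sketch along these lines may help. The scratch directory stands in for the path read from the user, and the file names are invented:

```bash
#!/usr/bin/env bash
shopt -s globstar nullglob

path=$(mktemp -d)               # hypothetical stand-in for the user-supplied path
mkdir -p "$path/docs"
touch "$path/notes.txt" "$path/docs/read me.md"

names=()
for file in "$path"/**; do
    [[ -f $file ]] || continue      # skip directories and other non-regular entries
    names+=( "${file##*/}" )        # strip everything up to the last slash
done

printf '%s\n' "${names[@]}"
rm -rf "$path"
```

The ${file##*/} expansion does the same job as basename without starting a new process for every file.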
In case you are looking for specific files based on one or more patterns or some other condition, you could use the find command to pick those files. See this post:
How to loop through file names returned by find?
Related
When to wrap quotes around a shell variable?
Why you shouldn't parse the output of ls
Is double square brackets [[ ]] preferable over single square brackets [ ] in Bash?

How to change extension of multiple files using bash script

I need a bash script to recursively rename files with blank extensions to append .txt at the end. I found the following script, but I can't figure out how to make it recursive:
#!/bin/sh
for file in *; do
test "${file%.*}" = "$file" && mv "$file" "$file".txt;
done
Thanks.
You can delegate the heavy lifting to find
$ find . -type f ! -name "*.*" -print0 | xargs -0 -I file mv file file.txt
This assumes that "without an extension" means "without a period in the name".
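If you would rather not depend on xargs -I doing textual substitution, a variant is to let find hand the paths to a small shell loop. This is a sketch under the same "no period in the name" assumption; the scratch tree is made up for the demo:

```bash
#!/usr/bin/env bash
# Hypothetical demo tree: two files without an extension, one with.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/README" "$demo/sub/notes" "$demo/keep.md"

# -exec ... {} + passes many paths at once to the inline sh script,
# where they arrive as "$@"; mv -- guards against names starting with '-'.
find "$demo" -type f ! -name '*.*' -exec sh -c '
    for f in "$@"; do
        mv -- "$f" "$f.txt"
    done
' sh {} +

renamed=$(find "$demo" -type f -name '*.txt' | wc -l)
echo "renamed: $renamed"
rm -rf "$demo"
```

The second sh is only a placeholder that fills $0 of the inline script, so that the found paths start at $1.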
If you don't mind using a recursive function, then you can do it in older Bash versions with:
shopt -s nullglob
function add_extension
{
local -r dir=$1
local path base
for path in "$dir"/* ; do
base=${path##*/}
if [[ -f $path && $base != *.* ]] ; then
mv -- "$path" "$path.txt"
elif [[ -d $path && ! -L $path ]] ; then
add_extension "$path"
fi
done
return 0
}
add_extension .
The mv -- is to protect against paths that begin with a hyphen.

find command with filename coming from bash printf builtin not working

I'm trying to write a script which lists the files in one directory and then searches for each file, one by one, in another directory. To deal with spaces and special characters like "[" or "]" I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except in one case: when there are multibyte characters (UTF-8). In that case the output of printf is an externally quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded without the $'' quoting, so find searches for a file whose name includes the quoting: «$'filename'».
Is there an alternative solution that lets me pass any kind of filename to find?
My script is as follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
IFSS=$IFS
IFS=$'\n'
FILES=$(find $1 -type f )
for FILE in $FILES; do
BASEFILE=$(printf '%q' "$(basename "$FILE")")
RES=$(find $2 -type f -name "$BASEFILE" -print )
if [ ${#RES} -gt 1 ]; then
RESNAME=$(printf '%q' "$(basename "$RES")")
else
RESNAME=
fi
if [ "$RESNAME" != "$BASEFILE" ]; then
echo "FILE NOT FOUND: $FILE"
fi
done
else
echo "Directories do not exist"
fi
IFS=$IFSS
As one answer suggested, I tried associative arrays, but with no luck. Maybe I'm not using the arrays correctly, but echoing the array (array[@]) prints nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
declare -A files
find "$2" -type f -print0 | while read -r -d $'\0' FILE;
do
BN2="$(basename "$FILE")"
files["$BN2"]="$BN2"
done
echo "${files[@]}"
find "$1" -type f -print0 | while read -r -d $'\0' FILE;
do
BN1="$(basename "$FILE")"
if [ "${files["$BN1"]}" != "$BN1" ]; then
echo "File not found: "$BN1""
fi
done
fi
Don't use for loops. First, it is slower: your find has to complete before the rest of your program can run. Second, it is possible to overload the command line. The entire for command must fit in the command line buffer.
Most importantly of all, for sucks at handling funky file names. You're running conniptions trying to get around this. However:
find $1 -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names -- even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0' FILE will read each file name (separated by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those files names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both arrays.
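One caveat when combining find | while read with arrays: every part of a pipeline runs in a subshell, so anything assigned inside the loop is gone once the loop ends. Feeding the loop through process substitution keeps it in the current shell. Here is a minimal sketch of the approach, assuming Bash 4 or later and a find with -print0. The two scratch directories, the file names, and the variable names seen and missing are made up for the demo, and it records the names from the second directory before checking the first:

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the two directories being compared.
dir1=$(mktemp -d); dir2=$(mktemp -d)
touch "$dir1/same" "$dir1/only in one" "$dir2/same"

declare -A seen
# < <(...) keeps the loop in the current shell, so assignments to
# "seen" survive. With "find ... | while ...", they would be lost.
while IFS= read -r -d '' file; do
    seen["${file##*/}"]=1
done < <(find "$dir2" -type f -print0)

missing=()
while IFS= read -r -d '' file; do
    name=${file##*/}
    [[ -n ${seen[$name]:-} ]] || missing+=( "$name" )
done < <(find "$dir1" -type f -print0)

printf 'File not found: %s\n' "${missing[@]}"
rm -rf "$dir1" "$dir2"
```

Each directory is scanned once, and the lookup is an exact string match on the key, so no glob characters in a file name are interpreted as a pattern.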
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from using pattern matching except through a lot of work (like you were doing with printf). I've tried using the -regex matching and using \Q and \E to remove the special meaning of pattern characters. I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of the cases where you can spend a lot more time trying to figure out how to get shell to do what you want than it's worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;
use constant DIRECTORIES => qw( dir1 dir2 );
my %files;
#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
sub {
return unless -f;
$files{$_} = [] if not exists $files{$_};
push @{ $files{$_} }, $File::Find::dir;
}, DIRECTORIES
);
#
# All files are found and in %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# IF there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
if ( @{ $files{$file} } > 1 ) {
say "File: $file: " . join ", ", @{ $files{$file} };
}
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/}" -type f)
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you agree on losing a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
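Keep in mind that plain diff dir1 dir2 compares file contents and does not descend into subdirectories. A minimal sketch of a recursive, names-only report, assuming a diff that supports -r (recursive) and -q (only say whether files differ); the two scratch directories stand in for dir1 and dir2:

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for dir1 and dir2.
dir1=$(mktemp -d); dir2=$(mktemp -d)
mkdir "$dir1/sub" "$dir2/sub"
echo same > "$dir1/sub/a"; echo same > "$dir2/sub/a"
touch "$dir1/sub/only in dir1"

# diff exits with status 1 when the trees differ; that is not an error here.
# LC_ALL=C pins the message language so the output is predictable.
report=$(LC_ALL=C diff -rq "$dir1" "$dir2")
printf '%s\n' "$report"
rm -rf "$dir1" "$dir2"
```

Unlike the find-based answers, this pairs files by their relative path, so a file that exists in both trees but in different subdirectories is reported as missing on each side.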
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
for f in "$2"/**/*; do
[[ -f "$f" ]] || continue
BN2=$(basename "$f")
files["$BN2"]=$BN2
done
echo "${files[@]}"
for f in "$1"/**/*; do
[[ -f "$f" ]] || continue
BN1=$(basename "$f")
if [[ ${files[$BN1]} != "$BN1" ]]; then
echo "File not found: $BN1"
fi
done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script has too much to just show the point, but it is usable as is – just remove the parts you don't want and adapt to your needs):
#!/bin/bash
die() {
printf "%s\n" "$@"
exit 1
}
[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"
dir1=$1
dir2=$2
[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"
declare -A dir1files
declare -A dir2files
while IFS=$'\0' read -r -d '' file; do
dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)
while IFS=$'\0' read -r -d '' file; do
dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)
# Which files in dir1 are in dir2?
for i in "${!dir1files[@]}"; do
if [[ -n ${dir2files[$i]} ]]; then
printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
# Remove it from the dir2 hash
unset dir2files["$i"]
else
printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
fi
done
# Which files in dir2 are not in dir1?
# Since I unset them from dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
for i in "${!dir2files[@]}"; do
printf "File %s is in %s but not in %s\n" "$i" "$dir2" "$dir1"
done
Remark. The identification of files is only based on their filenames, not their contents.

Recursive Shell Script and file extensions issue

I have a problem with this script. The script is supposed to go through all the files and all sub-directories and their files (recursively). If a file ends with the extension .txt, I need to replace a char/word in the text with a new char/word and then copy the file into an existing directory. The first argument is the directory where the search starts, the second is the old char/word, the third is the new char/word, and the fourth is the directory to copy the files to. The script goes through the files, but it only does the replacement and copies the files from the original directory. Here is the script:
#!/bin/bash
funk(){
for file in `ls $1`
do
if [ -f $file ]
then
ext=${file##*.}
if [ "$ext" = "txt" ]
then
sed -i "s/$2/$3/g" $file
cp $file $4
fi
elif [ -d $file ]
then
funk $file $2 $3 $4
fi
done
}
if [ $# -lt 4 ]
then
echo "Need more arg"
exit 2;
fi
cw=$1
a=$2
b=$3
od=$4
funk $cw $a $b $od
You're using a lot of bad practices here: lack of quoting, parsing the output of ls... all this will break as soon as a filename contains a space or other funny symbol.
You don't need recursion if you either use bash's globstar optional behavior, or find.
Here's a possibility with the former, that will hopefully show you better practices:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
funk() {
local search=${2//\//\\/}
local replace=${3//\//\\/}
for f in "$1"/**/*.txt; do
sed -i "s/$search/$replace/g" -- "$f"
cp -nvt "$4" -- "$f"
done
}
if (($#!=4)); then
echo >&2 "Need 4 arguments"
exit 1
fi
funk "$@"
The same function funk using find:
#!/bin/bash
funk() {
local search=${2//\//\\/}
local replace=${3//\//\\/}
find "$1" -name '*.txt' -type f -exec sed -i "s/$search/$replace/g" -- {} \; -exec cp -nvt "$4" -- {} \;
}
if (($#!=4)); then
echo >&2 "Need 4 arguments"
exit 1
fi
funk "$@"
In cp I'm using
the -n switch: no clobber, so as to not overwrite an existing file. Use it if your version of cp supports it, unless you actually want to overwrite files.
the -v switch: verbose, will show you the copied files (optional).
the -t switch: -t followed by a directory tells cp to copy into this directory. It's a very good thing to use cp this way: imagine that instead of giving an existing directory, you give an existing file. Without this feature, this file would get overwritten several times (well, this will be the case if you omit the -n option)! With this feature the existing file will remain safe.
Also notice the use of --. If your cp and sed support it (it's the case for GNU sed and cp), use it always! It means end of options now. If you don't use it and a filename starts with a hyphen, it would confuse the command, which would try to interpret it as an option. With this --, we're safe to pass a filename that may start with a hyphen.
Notice that in the search and replace patterns I replaced all slashes / by their escaped form \/ so as not to clash with the separator in sed if a slash happens to appear in search or replace.
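To make the slash escaping concrete, here is a small sketch with made-up search and replace strings. It only handles the / separator; other characters that are special to sed (such as &, ., * or a backslash) are not escaped:

```bash
#!/usr/bin/env bash
# Hypothetical inputs that both contain the sed separator.
search='a/b'
replace='x/y'

# ${var//\//\\/} rewrites every / as \/ so it cannot end the s/// command early.
search_esc=${search//\//\\/}
replace_esc=${replace//\//\\/}

result=$(printf 'path a/b here\n' | sed "s/$search_esc/$replace_esc/g")
echo "$result"
```

Without the escaping, sed would receive s/a/b/x/y/g and reject it as a malformed command.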
Enjoy!
As pointed out, looping over find output is not a good idea. It also doesn't support slashes in search&replace.
Check gniourf_gniourf's answer.
How about using find for that?
#!/bin/bash
funk () {
local dir=$1; shift
local search=$1; shift
local replace=$1; shift
local dest=$1; shift
mkdir -p "$dest"
for file in `find $dir -name *.txt`; do
sed -i "s/$search/$replace/g" "$file"
cp "$file" "$dest"
done
}
if [[ $# -lt 4 ]] ; then
echo "Need 4 arguments"
exit 2;
fi
funk "$@"
Though you might have files with the same names in the subdirectories, then those will be overwritten. Is that an issue in your case?
