I'm studying the bash shell and recently realized I don't properly understand recursive calls involving file searching. I know find is made for this, but I've been asked to implement such a search myself one way or another.
I wrote the following script:
#!/bin/bash
function rec_search {
for file in `ls $1`; do
echo ${1}/${item}
if[[ -d $item ]]; then
rec ${1}/${item}
fi
done
}
rec $1
The script takes a directory as an argument and lists its contents recursively.
I find it a poor solution of mine, and have a few improvement questions:
How do I handle files that contain spaces in their names?
Can I use the pwd command to print the absolute path of each file? (I tried, but unsuccessfully.)
Any other reasonable improvement to the code?
Your script currently cannot work:
The function is defined as rec_search, but then it seems you mistakenly call rec
You need to put a space after the "if" in if[[
There are some other serious issues with it too:
for file in `ls $1` goes against the recommendation to "never parse the output of ls"; it won't work for paths with spaces or other whitespace characters
You should indent the body of if and for to make it easier to read
The script could be fixed like this:
rec() {
    for path; do
        echo "$path"
        if [[ -d "$path" ]]; then
            rec "$path"/*
        fi
    done
}
But it's best to not reinvent the wheel and use the find command instead.
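For reference, a rough find equivalent of the function above (a minimal sketch; note that it prints the starting directory itself along with everything beneath it):
find "$1" -print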
If you are using bash 4 or later (which is likely, unless you're running this under Mac OS X), you can use the ** operator:
rec () {
    shopt -s globstar
    for file in "$1"/**/*; do
        echo "$file"
    done
}
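As for the question about using pwd to print absolute paths: both functions just print whatever prefix you hand them, so a minimal sketch (assuming the argument is a directory) is to resolve it to an absolute path before recursing:
rec "$(cd "$1" && pwd)"
Every path printed then starts with the absolute prefix; no pwd calls inside the loop are needed.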
From the following link, I tried to use the solution given there to compare a group of source files (here Fortran 90, i.e. *.f90). To do this, and to see which sources differ, I have put into my ~/.bashrc:
function diffm { for file in "$1"/"$2"; do diff -qs "$file" "$3"/"${file##*/}"; done ;}
But unfortunately, if I use the current directory for argument $1, i.e. execute:
$ diffm . '*.f90' ../../dir2
the result is: cannot access './*.f90'. The *.f90 sources do exist, but the wildcard is not expanded.
Is this a problem with the double quotes around the arguments of my function ($1, $2, $3)?
More generally, this function doesn't work well.
How can I modify this function so that it works in all cases, even with the current directory "." as the first argument $1 or the third $3?
If I understood what you are trying to do, this should work:
diffm () {
    dir="$1"; shift
    for file in "$@"; do
        filename=$(basename "$file")
        diff -qs "$file" "$dir/$filename"
    done
}
Usage
diffm ../../dir2 ./*.f90
Filename generation does not occur within quotes. Hence, you pass the literal string *.f90 to the function, and that string is used literally there too. If you know for sure that there is exactly one .f90 file in your directory, don't use quotes and write
diffm . *.f90 ../../dir2
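To see the difference, a quick illustration (with hypothetical file names):
$ echo "*.f90"
*.f90
$ echo *.f90
mod1.f90 mod2.f90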
Things get ugly if the file name has a space embedded (which, BTW, is one reason why I prefer Zsh over bash - you don't have to care about this in Zsh). To deal with this case, you could do a
myfile=*.f90
diffm . "$myfile" ../../dir2
But sooner or later you will be bitten by the fact that, for whatever reason, you have more than one .f90 file, and your strategy will break. Therefore, a better solution would be to use a loop, which also works for the border cases of having only one file or none at all:
iterating=0
for myfile in *.f90
do
    if [[ ! -f $myfile ]]
    then
        echo "No files matching $myfile in this directory"
    elif ((iterating == 0))
    then
        ((iterating+=1))
        diffm . "$myfile" ../../dir2
    else
        echo "ERROR: More than one f90-file. Don't know which one to diff" 1>&2
    fi
done
The [[ ! -f $myfile ]] branch just covers the case that you don't have any .f90 files at all: without nullglob set, the loop body is then executed exactly once and myfile contains the unexpanded wildcard pattern, which is why that check must come first.
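Alternatively, a minimal sketch assuming bash's nullglob option: with it set, a non-matching pattern expands to nothing, so the loop body simply never runs and the existence check can be dropped (this sketch omits the more-than-one-file guard for brevity):
shopt -s nullglob
for myfile in *.f90; do
    diffm . "$myfile" ../../dir2   # runs zero times if nothing matches
done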
I have a three-machine OS X setup that was using Syncthing to keep shared drives synchronized remotely. Someone made some mistakes, and a lot of files ended up getting renamed.
So all throughout this drive I have situations where there's a file of size 0 KB named, for example, file.jpg, and another file with real size named file.sync-confilct201705-4528.jpg. I need to search the entire drive recursively, and whenever I find a file with the sync-conflict string in its name, check whether the same file exists without the 'sync-conflict' string and with a size of 0 KB. If it does, I need to rename the sync-conflict file to overwrite the 0 KB file.
I have considered tackling this with a bash script or a Perl script. Using bash, I think the find command with -regex would get me started, but I don't really know how to process the results and run the next test. I am studying and working on it.
Same problem with Perl. I can get through the first step using File::Find::find and select what I need using a regex to filter the files, but there again I am stuck at the next step: finding the original file in the same directory and performing the file move.
In both of these cases I am willing to put in the time to figure it out, but I wonder what the caveats will be? Can both of these scenarios handle recursing a large number of files without exception? Is there perhaps a better approach anyone can recommend?
One good tool in Perl for this is File::Find::Rule.
Find all sync-conflict files, then test whether corresponding files exist and are zero size
use warnings;
use strict;
use FindBin qw($RealBin);
use File::Copy qw(move);
use File::Find::Rule;

my $dir = shift || '.';   # top of hierarchy to search (from command line, or ./)

my @conflict_files = File::Find::Rule
    ->file->name('*sync-conflict*.jpg')->in($dir);

foreach my $conflict (@conflict_files)
{
    my ($file) = $conflict =~ m|(.*)\.sync-conflict|;
    $file .= '.jpg';

    if (-z "$RealBin/$file") {
        print "Rename $conflict to $file\n";
        #move($conflict, $file) or warn "Can't move $conflict to $file: $!";
    }
}
This builds the original file's name $file for each file.sync-conflict file and applies the -z file test (see -X in perldoc), which tests for both existence and zero size. Then it renames the file using move from the core File::Copy.
Note that file-test operators need the full path while File::Find::Rule returns the path relative to the $dir it searches. I use $RealBin provided by FindBin, which is the path to the directory where the script was started with all links resolved, to build the full path for -z.
Uncomment the move line after sufficient testing (and with having made a backup first).
The code makes some assumptions about file names, please adjust as needed.
The $dir supplied on the command line is expected to be relative to the script's directory.
find is great. But as you've noted, you need more.
What find gets you in this scenario is the ability to search recursively and match certain patterns. As it happens, as of bash version 4 you can do that right in the shell.
(Note that macOS ships with bash version 3, so for this solution, you'll need to install bash 4 from Macports, Homebrew or Fink.)
$ shopt -s globstar nullglob
$ for file in **/*sync-confilct2017*.*; do echo mv -v "$file" "${file%sync-conf*}${file##*.}"; done
mv -v file.sync-confilct201705-4528.jpg file.jpg
mv -v foo/bar.sync-confilct201705-4528.ext foo/bar.ext
You can remove the echo to actually run the mv command.
The way this works is that the double asterisk, **, is treated by bash like a * that recurses. We're using parameter expansion to strip the parts of the filename we want in order to construct the "target" filename.
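Concretely (hypothetical name, showing each expansion on its own):
$ f=file.sync-confilct201705-4528.jpg
$ echo "${f%sync-conf*}"    # shortest suffix matching 'sync-conf*' removed
file.
$ echo "${f##*.}"           # longest prefix up to the last '.' removed
jpg
$ echo "${f%sync-conf*}${f##*.}"
file.jpg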
Create a function to fix the name:
$ function fixname() { file="$1"; newname=$( echo "$file" | sed "s/\.sync-conflict.*\.jpg$/.jpg/" ); if [ -f "$newname" -a ! -s "$newname" ]; then mv "$file" "$newname"; fi; }
Or, spread out a bit:
function fixname() {
    file="$1"
    newname=$( echo "$file" | sed "s/\.sync-conflict.*\.jpg$/.jpg/" )
    # If the empty (0 KB) file exists, overwrite it
    if [ -f "$newname" -a ! -s "$newname" ]; then
        mv "$file" "$newname"
    fi
}
Export the function:
$ export -f fixname
Run find to execute the function:
$ find . -type f -name \*sync-conflict\*.jpg -exec bash -c 'fixname {}' bash \;
Caveat: It will not work with spaces or funky characters in the filenames.
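One hedged way around that caveat is to pass the filename to the inner shell as a positional argument instead of splicing {} into the command string:
$ find . -type f -name \*sync-conflict\*.jpg -exec bash -c 'fixname "$1"' bash {} \;
Inside the inner shell, "$1" is properly quoted, so spaces and most odd characters in filenames survive.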
I have a source code tree whose root is at something like /home/me/workspace. There are many subdirectories many levels deep. In particular there is a path containing some tools:
/home/me/workspace/tools/scripts
I am writing a bash function, callable from any place in the tree, to which I pass the string tools/scripts. The function should iterate from the present working directory up to /, looking for the path fragment tools/scripts; if it finds it, it should print the absolute path in which it was found. In this example, /home/me/workspace would be printed. If the path fragment is not found at all, nothing is printed.
I already have the following bash function which does this for me:
search_up ()
(
    while [ "$PWD" != "/" ]; do
        if [ -e "$1" ]; then
            pwd
            break
        fi
        cd ..
    done
)
but this seems a bit long-winded. I am wondering if there are any other ways to do this either in bash itself, or in perhaps a single find command, or any other common utility. I'm particularly looking for readability and brevity.
Note I am not looking for a full recursive search of the entire tree.
Also my bash is not the latest, so please no tricks using the latest, greatest:
$ bash --version
GNU bash, version 3.00.15(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2004 Free Software Foundation, Inc.
$
This should work, but tell me if you need POSIX compatibility. The advantage of this is that you don't need to change to a higher directory just to do the search, and there's also no need for a subshell.
#!/bin/bash

search_up() {
    local look=${PWD%/}
    while [[ -n $look ]]; do
        [[ -e $look/$1 ]] && {
            printf '%s\n' "$look"
            return
        }
        look=${look%/*}
    done
    [[ -e /$1 ]] && echo /
}

search_up "$1"
Example:
bash script.sh /usr/local/bin
Output:
/
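To see why the loop terminates, note how ${look%/*} peels off one path component per iteration (a sketch with a hypothetical path):
look=/home/me/workspace/tools
look=${look%/*}   # /home/me/workspace
look=${look%/*}   # /home/me
look=${look%/*}   # /home
look=${look%/*}   # '' (empty: loop ends, and /$1 is then checked separately at the root)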
For the record, here is what I ended up using:
while [ "$PWD" != "/" ]; do test -e tools/bin && { pwd; break; }; cd .. ; done
Similar to my OP, but in the end I was able to drop the subshell parentheses () completely, because this line is itself invoked using the "shell" command from another program. Hence also stuffing it all onto one line.
I still liked KonsoleBox's well-reasoned answer as a possibly more general solution, so I'm accepting that.
I have spent a lot of time the past few weeks learning bash and posting on here. I finally think I am much closer, but I have one problem with my code that I cannot for the life of me figure out: it will not run. I can run each line in the terminal and get a result, but when I execute the script it does nothing. I get a syntax error: word unexpected (expecting "do").
#!/bin/bash
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
    fsize=$(stat --printf='%s' "$f")
    if [ "$fsize" -eq "40318" ]; then
        echo "$(basename $f)" >> results.txt
    fi
done
What am I missing???
The problem might be in line endings. Make sure your script file has unix line endings, not the Windows ones.
Also, do not iterate over output of ls. Use globbing right in the shell:
for f in "$file"/*.jpg ; do
Your for loop appears to be missing a list of values to iterate over:
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
Because $image does not end with a /, your ls command expands to
for f in $(ls /Home/Desktop/epubs/images*.jpg); do
which probably results in
for f in ; do
causing the syntax error. The simplest fix is
for f in $(ls "$image"/*.jpg); do
but you should take the advice in the other answers and skip ls:
for f in "$image"/*.jpg; do
Here's how I would do that.
#!/bin/bash -e

image="/Home/Desktop/epubs/images"

(cd "$image"
for f in *.jpg; do
    let fsize=$(stat -c %s "$f")
    if (( fsize == 40318 )); then
        echo "$f"
    fi
done) >results.txt
The -e means the script will exit if anything goes wrong (can't cd into the directory, for instance). Saves a lot of error checking when you're happy with that behavior.
The parentheses mean that the cd command is in a subshell; the surrounding script (including the redirection into results.txt) is still in whatever directory you started in.
Now that we're in the directory, we can just look for *.jpg, no directory prefix, and no need to call basename on anything.
Using let and (( == )) treats the size value as a number instead of a string, so we won't get tripped up by any wonkiness in the way stat chooses to format the value.
We just redirect the output of the entire loop into the result file instead of appending every time through; it's more efficient. If you have existing contents in results.txt that you want to keep, you can change the > back to a >>, but keeping the redirection around the whole loop is still more efficient than opening the file and appending to it on every iteration.
Sometimes I need to rename a number of files, for example to add a prefix or to remove something.
At first I wrote a python script. It works well, and I wanted a shell version. So I wrote something like this:
$1 - the directory to list,
$2 - the pattern to be replaced,
$3 - the replacement.
echo "usage: dir pattern replacement"
for fname in `ls $1`
do
newName=$(echo $fname | sed "s/^$2/$3/")
echo 'mv' "$1/$fname" "$1/$newName&&"
mv "$1/$fname" "$1/$newName"
done
It works but very slowly, probably because it needs to create a process (here sed and mv), destroy it, and create the same process again just to run it with a different argument. Is that true? If so, how can I avoid this and get a faster version?
I thought of computing all of the new names at once (using a single sed pass over the whole listing), but that still needs mv in the loop.
Please tell me how you would do this. Thanks. If you find my question hard to understand, please be patient; my English is not very good.
--- update ---
I am sorry for my description. My core question is: if we have to run some command inside a loop, will that lower performance? For example, in for i in {1..100000}; do ls 1>/dev/null; done, creating and destroying a process takes most of the time. So what I want to know is: is there any way to reduce that cost?
Thanks to kev and S.R.I for giving me a rename solution to rename files.
Every time you call an external binary (ls, sed, mv), bash has to fork itself and exec the command, and that takes a big performance hit.
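You can see the cost directly with a rough, hedged experiment (/bin/echo stands in for any external binary; exact numbers vary by machine):
time for i in {1..1000}; do /bin/echo hi >/dev/null; done   # forks an external process each pass
time for i in {1..1000}; do echo hi >/dev/null; done        # shell builtin: no fork, much faster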
You can do everything you want to do in pure bash 4.x, calling only mv:
pat_rename(){
    if [[ ! -d "$1" ]]; then
        echo "Error: '$1' is not a valid directory"
        return
    fi
    shopt -s globstar
    cd "$1"
    for file in **; do
        echo "mv $file ${file//$2/$3}"
    done
}
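As written, the loop only echoes the mv commands as a dry run. Once the output looks right, change the body to mv "$file" "${file//$2/$3}"; a hypothetical invocation would be pat_rename ./src old new.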
Simplest first. What's wrong with rename?
mkdir tstbin
for i in `seq 1 20`
do
    touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt
Or are you using an older *nix machine?
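(Note: that is the util-linux rename syntax. On Debian/Ubuntu systems, rename is usually the Perl script instead, where the equivalent would be rename 's/\.txt$/.html/' tstbin/*.txt.)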
To avoid re-executing sed on each file, you could instead set up two name streams, one original and one transformed, then sip from the ends:
exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')
IFS=`echo`
while read -u3 orig && read -u4 to; do
    mv "${orig}" "${to}"
done
I think you can store all of the file names in a file or a string, and use awk and sed to do it once instead of one by one.
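A rough sketch of that idea (hypothetical prefixes; it assumes file names without spaces or quotes, and sed runs only once, though mv still runs once per file):
ls | sed -n "s/^oldprefix\(.*\)/mv 'oldprefix\1' 'newprefix\1'/p" | sh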