Using Bash to replace DOTS or characters in DIRECTORY / SUBDIRECTORY names

I have searched, looking for the right solution, and found some close examples, e.g. Bash script to replace spaces in file names.
But what I'm looking for is how to replace multiple .dots in current DIRECTORY/SUBDIRECTORY names, then replace multiple .dots in FILENAMES, excluding the *.extension, "recursively".
This example is close but not right:
find . -maxdepth 2 -name "*.*" -execdir rename 's/./-/g' "{}" \;
another example but not right either:
for f in *; do mv "$f" "${f//./-}"; done
So
dir.dir.dir/dir.dir.dir/file.file.file.ext
Would become
dir-dir-dir/dir-dir-dir/file-file-file.ext

You can assign the path to a variable and combine parameter expansion with sed like this (using - as the replacement character, as in your example):
x="dir.dir.dir/dir.dir.dir/file.file.file.ext"
echo "$(echo "${x%.*}" | sed 's/\./-/g').${x##*.}"
Result: dir-dir-dir/dir-dir-dir/file-file-file.ext

You have to escape . in regular expressions (such as the ones used by rename), because by default it has the special meaning of "any single character". So the replacement statement should be at least s/\./-/g.
You don't have to quote {} in find commands (it is harmless in most shells, but unnecessary).
You will need two find commands, since you want to replace all dots in directory names, but keep the last dot in filenames.
You are searching for filenames which contain dots (*.*). Is that intentional?
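Following that advice, here is a minimal two-pass sketch. It builds a throwaway demo tree matching the question's example, renames directories depth-first, then files; it uses plain mv with bash parameter expansion instead of rename (which isn't available everywhere) and assumes no filenames contain newlines:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"

# Hypothetical demo tree matching the question's example.
mkdir -p demo/dir.dir.dir/dir.dir.dir
touch demo/dir.dir.dir/dir.dir.dir/file.file.file.ext

# Pass 1: directories, depth-first (-depth) so children are renamed
# before their parents' paths change.
find demo -depth -type d -name '*.*' | while IFS= read -r d; do
    parent=${d%/*} base=${d##*/}
    mv -- "$d" "$parent/${base//./-}"
done

# Pass 2: files. Split off the last extension, replace dots in the rest.
find demo -type f -name '*.*.*' | while IFS= read -r f; do
    parent=${f%/*} base=${f##*/}
    stem=${base%.*} ext=${base##*.}
    mv -- "$f" "$parent/${stem//./-}.$ext"
done

find demo
```

After both passes, demo/dir.dir.dir/dir.dir.dir/file.file.file.ext has become demo/dir-dir-dir/dir-dir-dir/file-file-file.ext.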

How to sort output of the find command in a script

I have the following simple script to find all directories (at a depth of 2) that were added in the last N days...
#!/bin/bash
DAYS_PRIOR=180
DIR='/mydir'
FILES=`find $DIR -mindepth 2 -maxdepth 2 -type d -mtime -$DAYS_PRIOR -printf '%f\\\n'`
echo
echo "Files added in the last $DAYS_PRIOR days:"
echo
echo -e $FILES
echo
To get it to add newlines I had to double-escape the printf and use echo -e. That seems odd to me but it was the only way I could get it to print one directory per line on the output.
Everything works up to this point and I get a list of directories as expected. Now I want to sort the list alphabetically. I tried changing the printf in the find command to...
FILES=`find <xxx> -printf '%f\\\n' | sort`
however this doesn't sort the directory names. Based on other posts I tried the following..
FILES=`find <xxx> -printf %f\\\n | sort -t '\0' | awk -F '\0' '{print $0; print "\\\n"}'`
This is very close but leaves an extra space at the start of each line and seems horribly awkward.
Is there a simple method to add a sort to the original find command?
First: double-quote your variable references! When you use echo -e $FILES, the value of FILES gets split into "words" based on whitespace (spaces, tabs, and newlines), and then echo sticks those words back together with spaces between them. This has the effect of converting newlines into spaces. To wind up with newlines in the output, you're having to use \n instead of a true newline, plus echo -e to convert it. Just use real newlines, and put double-quotes around the variable reference to avoid all this mess:
FILES=$(find "$DIR" -mindepth 2 -maxdepth 2 -type d -mtime "-$DAYS_PRIOR" -printf '%f\n')
# ...
echo "$FILES"
Note that I put double-quotes around all variable references, since this is almost always a good idea. I also used $( ) instead of backticks -- it's easier to read, and avoids some parsing oddities that backticks have.
Anyway, with this format you're using proper newlines throughout, so piping through sort should work as expected.
BTW, I'd also recommend switching from uppercase variable names to lower- or mixed-case names, since there are a bunch of all-caps names that have special meanings, and if you accidentally use one of them bad things can happen.
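Putting the pieces together, a runnable sketch of the cleaned-up script follows; a throwaway fixture directory stands in for the question's /mydir, and GNU find is assumed for -printf:

```shell
#!/usr/bin/env bash
# Fixture at depth 2, standing in for the question's /mydir.
dir=$(mktemp -d)
mkdir -p "$dir/parent/zeta" "$dir/parent/alpha"

days_prior=180
# Real newlines from printf '%f\n', piped straight into sort,
# captured with $( ) rather than backticks.
files=$(find "$dir" -mindepth 2 -maxdepth 2 -type d -mtime "-$days_prior" -printf '%f\n' | sort)

echo "Directories added in the last $days_prior days:"
echo "$files"    # quoted, so the real newlines survive
```

This prints "alpha" and "zeta" on separate lines, already sorted, with no echo -e tricks.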

Recursively looking for a list of file types

I want to use bash to remove all the files in a directory that aren't in an associative array of file extensions. (i.e. delete all the files in a directory that aren't image files, for example)
This question very clearly answers how to do this for a single file extension, but I'm not sure how to do it for a whole list.
currently I'm doing this
for f in $(find . -type f ! -name '*.png' -and ! -name '*.jpg' ); do rm "$f"; done
but it seems ugly to just add a massive list of "-and -name '*.aaa'" inside the parenthesis for every file type.
Is there a way to pass find an associative array like
declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)
or will I just need to add a lot of "-and ! -name ___"?
Thanks!
Using find in the first place is not needed here. The shell globbing support in bash is sufficient for this requirement. The bash shell provides an extended glob option with which you can match, under recursive paths, the file names that don't end with the extensions you want to ignore.
The extended option is extglob, which needs to be set with shopt as below. Additionally, a couple more options are useful: nullglob, with which an unmatched glob is swept away entirely (replaced with a set of zero words), and globstar, which allows recursing through all the directories:
shopt -s extglob nullglob globstar
Now all you need to do is form the glob expression to exclude files of type *.png, *.jpg and *.gif, which you can do as below. We use an array to hold the glob results because, when the expansion is properly quoted, filenames with special characters remain intact:
fileList=(**/!(*.jpg|*.gif|*.png))
The option ** is to recurse through the sub-folders and !() is a negate operation to not include any of the file extensions listed inside. Now for printing the actual files, just do
printf '%s\n' "${fileList[@]}"
If your intention is, for example, to remove all the files identified, you don't need to store the glob results in an array; the array approach is useful in scripts that need to reuse the results, but for a one-off deletion you can use rm directly.
First check that the files returned are as expected, using ls:
ls -1 -- **/!(*.jpg|*.gif|*.png)
and now after confirming the files to delete, do rm at your own risk.
rm -- **/!(*.jpg|*.gif|*.png)
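A runnable sketch of the whole approach on a throwaway directory (note one caveat: the negated glob also matches directories, such as sub below, so a real cleanup should filter those out or rely on plain rm refusing to delete directories):

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob globstar
cd "$(mktemp -d)"
mkdir -p sub
touch a.png d.doc sub/b.jpg sub/c.txt

# Everything, recursively, that does not end in .jpg/.gif/.png.
fileList=(**/!(*.jpg|*.gif|*.png))
printf '%s\n' "${fileList[@]}"
```

Here the expansion yields d.doc, sub, and sub/c.txt, while a.png and sub/b.jpg are excluded.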
Assumption: allowedTypes contains only trusted input and only valid suffixes.
The first snippet supports multi-level suffixes like tar.gz. It uses find, a regular expression and a list of allowed suffixes allowedTypes.
allowedTypes=(png gif jpg)
# keepTypes='png|gif|jpg'
keepTypes="$(echo "${allowedTypes[@]}" | tr ' ' '|')"
find . -type f -regextype awk ! -iregex '.*\.('"$keepTypes"')' -exec echo rm {} \;
If you want to keep your associative array, then you could use the following snippet.
It needs additional work to support multi-level file suffixes.
declare -A allowedTypes=([*.png]=1 [*.jpg]=1 [*.gif]=1)
keepTypes="$(echo "${!allowedTypes[@]}" | tr ' ' '|' | tr -d '.*')"
It would be nice if there were a way to replace the separators with a shell built-in instead of tr, but I found none: ${allowedTypes[@]// /|} does not replace the whitespace between the items, because the substitution applies to each array element separately.
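For the record, one pure-bash join is possible (a sketch, not from the original answer): "${arr[*]}" joins array elements with the first character of IFS, and a subshell keeps the IFS change local:

```shell
#!/usr/bin/env bash
allowedTypes=(png gif jpg)

# "${allowedTypes[*]}" joins the elements with the first character of IFS;
# running it inside $( ) keeps the IFS change out of the parent shell.
keepTypes=$(IFS='|'; printf '%s' "${allowedTypes[*]}")
echo "$keepTypes"    # png|gif|jpg
```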

Why does the glob `*[!t]*` return files whose names contain `t`s?

I have really no idea after reading glob (programming) of the results printed by following command in shell, I'm using (bash) as my shell. Given this directory hierarchy:
/sub2
s.py
t2.txt
nametname.txt
bees.txt
/sub22
$ echo *[!t]*
bees.txt nametname.txt s.py sub22 t2.txt
In my understanding, the glob should expand to the filenames that don't contain the letter t, but the result was quite the opposite. Why?
This command outputs all filenames that contain the letter t:
$ echo *[t]*
nametname.txt t2.txt
In the first command I just negated [t] to [!t], so I expected it to do the opposite of the second command.
This glob:
echo *[!t]*
will match any filename that has at least one non-t character in it.
So, if you have filenames such as t, tt or ttt, those won't be listed by this glob.
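This is easy to check in a scratch directory (the file names below are made up for the demonstration):

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
touch t tt ttt t2.txt bees.txt

# Only the all-t names fail to match *[!t]*: t2.txt and bees.txt
# each contain at least one non-t character.
matches=$(echo *[!t]*)
echo "$matches"    # bees.txt t2.txt
```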
Solution:
If you want to list filenames that don't have letter t in it then you can use this find command:
find . -maxdepth 1 -mindepth 1 -not -name '*t*'
You may also add -type f for listing files only or -type d for listing directories only.
As other answers have given, *[!t]* returns files with any non-t character.
What they haven't yet provided is a workaround:
shopt -s extglob ## enable extglobs
echo !(*t*) ## list files with names not containing t
See http://wiki.bash-hackers.org/syntax/pattern#extended_pattern_language
! is the standard character for negating a bracket expression. *[!t]* means: match zero or more arbitrary characters, followed by anything except a t, followed by zero or more arbitrary characters. In other words, match any file name that contains a character other than t. What it won't match are file names consisting only of ts: t, tt, ttt, etc.
If you only want to match filenames that don't contain any t, see Charles Duffy's answer, as he beat me to it.

What can I do with new lines in file name?

If a file has a newline symbol in its name, find or ls shows it as a "?" in the file name. But when I use | to do anything more with it, the name gets split and everything is messed up. How do I deal with it?
Don't pipe the results of ls at all. It is, as you see, unpredictable. Use find instead:
find . -maxdepth 1 -exec command {} \;
{} represents the file name.
Alternatively you can also use glob expressions. The results of a glob expression are not subject to word splitting, meaning they are safe even if they contain newlines or spaces:
for file in ./* ; do
    command "$file"
done
The output of those commands can then be used in another pipe.
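When the names must flow through further processing, a NUL-delimited find pipeline is the robust sketch (it assumes -print0, a GNU/BSD find extension, plus bash's read -d '' and process substitution):

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
touch "bad
name.txt" "plain.txt"

count=0
while IFS= read -r -d '' f; do
    # "$f" arrives intact here, embedded newline and all.
    count=$((count + 1))
done < <(find . -maxdepth 1 -type f -print0)
echo "$count files"    # 2 files
```

NUL can never appear in a file name, so it is the only delimiter that is always safe.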

How to avoid using spaces as separators in zsh for-loop?

I'm trying to make a little script to convert some files from a music library.
But If I do something like :
#!/usr/bin/zsh
for x in $(find -name "*.m4a");
do
echo $x;
done
When interpreting in a folder containing :
foo\ bar.m4a
it will return :
foo
bar.m4a
How could I prevent the for loop from interpreting space characters as separators?
I could replace $(find -name "*.m4a") with $(find -name "*.m4a" | sed "s/ /_/g") and then using sed the other way inside the loop, but what if file names/paths already contain underscores (Or other characters I may use instead of underscore)?
Any idea?
You can prevent word splitting of the command substitution by double-quoting it:
#!/usr/bin/zsh
for x in "$(find -name "*.m4a")"
do
echo "$x"
done
Notice that the double quotes inside the command substitution don't conflict with the double quotes outside of it (I'd actually never noticed this before now). You could just as easily use single quotes if you find it more readable, as in "$(find -name '*.m4a')". I usually use single quotes with find primaries anyway.
Quoting inside the loop body is important for the same reason: it ensures that the value of x is passed as a single argument.
But this is definitely a hybrid, Frankensteinian solution; with the quotes, the loop body in fact runs only once, with find's entire output in x (which happens to look right for echo, but is not a real per-file loop). You'd be better off with either globbing or using find as follows:
find . -name '*.m4a' -exec echo {} \;
but this form is limiting. You can add additional -exec primaries, which will be executed like shell commands that are separated by &&, but you can't pipe from one -exec to another and you can't interact with the shell (e.g. by assigning or expanding parameters).
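When the per-file action does need shell features (pipes, parameter expansion), a common pattern is to hand each name to a small inline script, where "$1" keeps awkward names intact. A sketch on a throwaway directory (shown with bash; the find command itself is shell-agnostic, and the "converting" message is just a placeholder for real work):

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
touch "foo bar.m4a"

# sh -c receives each file as "$1"; the _ argument fills $0.
find . -name '*.m4a' -exec sh -c 'printf "converting %s\n" "$1"' _ {} \;
```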
Use zsh's globbing facilities here instead of find.
for x in **/*.m4a; do
echo "$x"
done
Quoting $x in the body of the loop is optional under the default settings of zsh, but it's not a bad idea to do so anyway.
I found out.
As suggested here : https://askubuntu.com/questions/344407/how-to-read-complete-line-in-for-loop-with-spaces
I may set the IFS (Internal Field Separator) to $'\n'.
So this works :
#!/usr/bin/zsh
IFS=$'\n'
for x in $(find -name "*.m4a");
do
echo $x;
done
I hope this could help someone else!
