Replace special characters recursively in bash

I'm trying to find all files and folders in a certain folder and all sub-folders and replace all special characters. All spaces should be replaced with dots and everything else should just be deleted. I've tried a few different ways but when I use "mv" it doesn't seem to preserve the directory structure and when I use "rename" along with "find" it doesn't want to go recursively.
The closest I've gotten is this:
for f in **/; do mv "$f" `echo $f | tr " " . | tr -dc '[:alnum:].'`; done
But I think the loop is broken somewhere as it adds filenames together and places the result in the parent directory.

You could do:
find . -depth -execdir rename 's/\s/./g; s/[^[:alnum:]./]//g' {} +
A couple of points here:
-depth -- process each directory's contents before the directory itself. This ensures that you rename the files in a folder before you rename the folder
-execdir -- executes the command in the subdirectory -- {} will now be ./filename instead of ./dir1/dir2/filename
this is the Perl-flavoured rename (not the util-linux one, which takes different arguments); check your man page.
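If the Perl rename is not installed, the same idea can be approximated with find and mv alone. The following is a minimal sketch, not the answer's method: the function name sanitize_tree and the demo paths are made up for illustration, and it assumes bash plus a find that supports -mindepth and -print0 (GNU and BSD find both do).

```bash
#!/usr/bin/env bash
# Sketch: rename every entry below a root, deepest entries first, so that a
# directory is renamed only after everything inside it has been handled.
# Spaces become dots; anything else that is not alphanumeric or a dot is dropped.
# sanitize_tree is a hypothetical helper name, not an existing tool.
sanitize_tree() {
    local root=$1 path dir base new
    find "$root" -mindepth 1 -depth -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}      # parent directory (not renamed yet, so still valid)
        base=${path##*/}    # last path component, the only part we change
        new=$(printf '%s' "$base" | tr ' ' '.' | tr -dc '[:alnum:].')
        [ "$new" = "$base" ] && continue   # nothing to do
        [ -z "$new" ] && continue          # name would become empty
        [ -e "$dir/$new" ] && continue     # do not overwrite an existing entry
        mv -- "$path" "$dir/$new"
    done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/my dir/sub (1)"
touch "$demo/my dir/sub (1)/a b#c.txt"
sanitize_tree "$demo"
find "$demo" -mindepth 1
```

The sketch silently skips names that would become empty or collide with an existing entry; whether that is the right policy depends on your data.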

Related

Can't rename, No such file or directory

I have a root folder (03_COMPLETE), inside which are 40 subfolders two levels down (all called CHILD_PNG) that contain .png files I want to rename. There are 6 complete folders I have to go through, with tens of thousands of files. All files are currently named like this: 123456_lifestyle.png, I want them named to lifestyle_123456.png.
My code:
find . -mindepth 2 -type f -iname '*.png' -print0 | xargs -0 /usr/local/bin/rename -v 's/\/([0-9]+)_([A-Za-z]+[0-9])/\/$2_$1/'\;
If I run this on an individual folder of .png files (without using -mindepth) it renames them. However if I run it on the root 03_COMPLETE directory to try and do all the renaming at once, I get lines of errors like this:
Can't rename
'/Volumes/COMMON-LIC-PHOTO/RETOUCHING/04_DELIVERY_PNG/Computer1/03_COMPLETE/06052017_NYS5_W_1263_Output/CHILD_PNG/123456_lifestyle.png'
to
'/Volumes/COMMON-LIC-PHOTO/RETOUCHING/04_DELIVERY_PNG/Computer1/03_COMPLETE/NYS5_06052017_W_1263_Output/CHILD_PNG/123456_lifestyle.png':
No such file or directory
I think it might have something to do with the names of the folder 1 level down (eg. here NYS5_06052017_W_1263_Output) because it did rename on a couple of folders named Bustform_000. Most of the folders though start with a number like 06052017.
I can't figure out why this will work at the .png folder level but won't work on the root folder, and why it will rename in a few folders but most of them it won't.
Also what is weird is that in the error it says it is trying to rename 123456_lifestyle.png to the same filename. Why would it do that? Any ideas?
This might help:
find 03_COMPLETE -type f | xargs -n 1 rename -n 's|/([^_/]*)_([^_/]*)\.png$|/$2_$1.png|'
Remove -n if output is okay.
You could change directory into each of the CHILD_PNG directories and run a single rename in there on all the files so you don't exec a new rename for every single file:
find 03_COMPLETE -type d -name CHILD_PNG -execdir bash -c "cd {}; rename -n '...' *.png" \;
The issue with your original regex is that it also matches directory names of the form "xxxxx_yyyyy" and tries to convert them into "yyyyy_xxxxx", which, of course, doesn't exist. Since the substitution is applied only to the first match, the directory component is rewritten and the file name is left alone, which is why the error shows an unchanged file name. Since you're interested in changing only the file names, and all of them end with .png, you can use the regex below. Additionally, as you're trying to match a literal '/', you can choose a different character such as '|' as the delimiter to make the regex easier to read:
's|/([0-9]+)_([A-Za-z]+[0-9]*)(\.[Pp][Nn][Gg])|/$2_$1$3|'
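To see why restricting the match to the last path component matters, here is a small bash sketch that computes the new path without calling rename at all. The function name new_png_path is hypothetical, and the pattern mirrors the one above; it assumes the path contains at least one slash, as paths printed by find 03_COMPLETE do.

```bash
#!/usr/bin/env bash
# Sketch: swap "digits_name.png" to "name_digits.png" in the file name only.
# Directory components such as 06052017_NYS5_W_1263_Output are never touched,
# because the regex is applied to the basename, not to the whole path.
new_png_path() {
    local path=$1 dir base
    local re='^([0-9]+)_([A-Za-z]+[0-9]*)(\.[Pp][Nn][Gg])$'
    dir=${path%/*}
    base=${path##*/}
    if [[ $base =~ $re ]]; then
        printf '%s/%s_%s%s\n' "$dir" "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$path"   # no match: leave the path unchanged
    fi
}

new_png_path "03_COMPLETE/06052017_NYS5_W_1263_Output/CHILD_PNG/123456_lifestyle.png"
# prints 03_COMPLETE/06052017_NYS5_W_1263_Output/CHILD_PNG/lifestyle_123456.png
```

A loop could pass each result to mv once the printed names look right.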

Mv files contained in directories to directories/new path

I'm working with macOS Sierra.
I have 1000+ directories, each containing lots of files (Word, Excel and zipped documents), with only one sub-level. Important: there are spaces in the file names and in the folder names.
We decided to change the directory layout: all the files in each directory need to be moved to a subdirectory in it called "Word & Excel" before merging with another directory tree.
I managed to create the Word & Excel directory with this command :
for dir in */; do mkdir -- "$dir/Word & Excel"; done
Basically, I just want to do
for dir in */; do mv $dir/* "./Word & Excel"; done
It is not going to work. I do not even understand whether the problem is with the $dir (I need the double quotes to avoid the space problem, but the asterisk will not expand inside double quotes) or with the asterisk.
I tried to get a cleaner version by following a previous answer found on the web to a similar problem, clearing the subfolder of the results (and trying basically to avoid my wildcard problem) :
for dir in */; do mv `ls -A "$dir" | grep -v "Word & Excel"` ./"Word & Excel" | cd ../ ; done
I am completely stuck.
Any idea how to handle this?
This should make it, even on Mac OS X. And yes, find sometimes needs the anchor directory.
while IFS= read -r dir; do
mkdir -p "$dir/Word & Excel"
find "$dir" -maxdepth 1 -type f -exec mv {} "$dir/Word & Excel" \;
done < <(find . -mindepth 1 -maxdepth 1 -type d)
This loops over the sub-directories of the current directory (one sub-level only) and, for each of them (dir), creates the dir/Word & Excel sub-sub-directory if it does not already exist, finds all regular files immediately inside dir and moves them into dir/Word & Excel. It copes with spaces in directory and file names.
This being said, if you could convince your boss not to use unusual file or directory names, your life with bash and the Command Line Interface (CLI) would probably be much easier.
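On the quoting question itself: the variable can be quoted while the asterisk stays outside the quotes, so the glob still expands. Here is a minimal sketch of that approach, assuming bash; the function name move_into_subdir and the demo names are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: for each directory in the current directory, move its regular files
# into a subdirectory. "$dir"* quotes the variable (safe with spaces) but leaves
# the * unquoted so the shell still expands it.
move_into_subdir() {
    local sub=$1 dir f
    for dir in */; do
        [ -d "$dir" ] || continue          # no directories at all: glob stayed literal
        mkdir -p -- "$dir$sub"
        for f in "$dir"*; do
            [ -f "$f" ] || continue        # skip the target and other subdirectories
            mv -- "$f" "$dir$sub/"
        done
    done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir "$demo/Client A"
touch "$demo/Client A/report 1.docx" "$demo/Client A/data.xlsx"
(cd "$demo" && move_into_subdir "Word & Excel")
ls "$demo/Client A/Word & Excel"
```

Like the find version, this ignores dotfiles unless the dotglob shell option is set.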
Okay, I will use "subfolder" as my subfolder name.
First, creating subfolder within all the dirs
for dir in $(find -type d | grep "/");do mkdir $dir/subfolder; done
In each one of those, I created a file. In order to move all files within the dirs to the subfolder, I will do something like:
for dir in $(find -type d | grep -vE 'subfolder' | grep '/');do for file in $(find $dir -type f);do mv $file $dir/subfolder;done ;done
You might want to experiment with -exec in find, but just creating a nested loop was the fastest solution for me. Note that these loops rely on word splitting, so they only work when the names contain no spaces.
Let me break it down for you. Here, I try to find all the directories in my path, excluding the subfolder directory and the current one. I could've restricted find with -mindepth 1 -maxdepth 1, but since I only had these dirs, it wasn't necessary
for dir in $(find -type d | grep -vE 'subfolder' | grep '/')
Now, in each of those dirs, we try to find all the files (in your case, the zip files and whatnot).
do for file in $(find $dir -type f)
Now, we just move the found files into the directories from the first loop with the name of the subfolder appended.
do mv $file $dir/subfolder;done ;done
Keep in mind that since the first loop is closed at the very end, it will do the move operation for 1 directory at a time, and for all files in only that directory. Nested loops can be a bit trickier to understand, especially when someone else does them their own way, I know :(

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
cd "$d"
for f in `find . "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
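The subshell alternative to pushd / popd mentioned above could look like the following sketch. It assumes bash and the file layout from the question; the function name strip_prefix_in_subdirs is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run the inner loop in a subshell, so the cd is undone automatically
# when the subshell exits and the outer loop always resumes from the same
# working directory.
strip_prefix_in_subdirs() {
    local d f
    for d in */; do
        [ -d "$d" ] || continue
        (
            cd "$d" || exit 1
            for f in *?myfile*.txt; do
                [ -e "$f" ] || continue    # glob matched nothing
                mv -- "$f" "${f/#*myfile/myfile}"
            done
        )
    done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir "$demo/subdir1" "$demo/subdir2"
touch "$demo/subdir1/001myfile001A.txt" "$demo/subdir2/002myfile002B.txt"
(cd "$demo" && strip_prefix_in_subdirs)
ls "$demo/subdir1" "$demo/subdir2"
```

Using for d in */ instead of find also sidesteps word splitting on directory names that contain spaces.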
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
Unfortunately, this was only the reason for the files landing in the wrong place, but not the only problem. Since you already accepted one of the answers, there is no need to list them all.
a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here is with the rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),$1,'
here are tests on 1000 files:
for `find`; do mv 9.176s
rename 0.099s
that's roughly 90x as fast.
John Bollinger's accepted answer is about twice as fast as the OP's, but still more than 40x slower than this rename solution:
for|for|mv "$f" "${f//}" 4.316s
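also, it won't work if there is a directory with too many items: answers that pass a glob to an external command (find * or rename ... subdir*/*) can fail with "Argument list too long". plain loops such as for f in *.txt or for f in */*.txt are not subject to that limit, but they still expand the whole list in memory before the loop starts. answers that begin with find ., on the other hand, will also work on directories with any number of items.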
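if rename is not available, one way to keep find's streaming behaviour without starting a process per file is to let find hand batches of paths to a single bash. this is only a sketch under that assumption (no timings were measured for it), and the function name strip_prefix_with_find is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: find collects matching paths and passes them in batches ({} +) to one
# bash process, which renames them with parameter expansion instead of sed.
strip_prefix_with_find() {
    find "$1" -type f -name '*?myfile*.txt' -exec bash -c '
        for f; do
            dir=${f%/*}
            base=${f##*/}
            # drop everything up to and including the last "myfile", then put it back
            mv -- "$f" "$dir/myfile${base##*myfile}"
        done
    ' _ {} +
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/subdir1/subdir3" "$demo/subdir2"
touch "$demo/subdir1/001myfile001A.txt" \
      "$demo/subdir1/subdir3/002myfile002A.txt" \
      "$demo/subdir2/001myfile001B.txt"
strip_prefix_with_find "$demo"
find "$demo" -type f
```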

Bash: How to control iteration flow/loops?

For going over some recovered data, I am working on a script that recursively goes through folders & files and finally runs file on them, to check if they are likely fully recovered from a certain backup or not. (recovered files play, and are identified as mp3 or other audio, non-working files as ASCII-Text)
For now I would just be satisfied with having it go over my test folder structure, print all folders & corresponding files. (printing them mainly for testing, but also because I would like to log where the script currently is and how far along it is in the end, to verify what has been processed)
I tried using 2 for loops, one for the folders, then one for the files. (so that ideally it would take 1 folder, then list the files in there (or potentially delve into subfolders) and below each folder only give the files in that subfolders, then moving on to the next.
Such as:
Folder1
- File 1
- File 2
-- Subfolder
-- File3
-- File4
Folder2
- File5
However this doesn't seem to work in the ways (such with for loops) that are normally proposed. I got as far as using "find . -type d" for the directories and "find . -type f" or "find * -type f" (so that it doesn't go in to subdirectories) However, when just printing the paths/files in order to check if it ran as I wanted it to, it became obvious that that didn't work.
It always seemed to first print all the directories (first loop) and then all the files (second loop). For keeping track of what it is doing and for making it easier to know what was checked/recovered I would like to do this in a more orderly fashion as explained above.
So is it that I just did something wrong, or is this maybe a general limitation of the for loop in bash?
Another problem that could be related: Although assigning the output of find to an array seemed to work, it wasn't accessible as an array ...
Example for loop:
for folder in '$(find . -type d)' ; do
echo $folder
let foldercounter++
done
Arrays:
folders=("$(find . -type d)")
#As far as I know this should assign the output as an array
#However, it is not really assigned properly somehow as
echo "$folders[1]"
# does not work (quotes necessary for spaces)
A find ... -exec ... solution of the kind @H.-Dirk Schmitt was referring to might look something like:
find . -type f -exec sh -c '
case $(file "$1") in
*Audio file*)
echo "$1 is an audio file"
;;
*ASCII text*)
echo "$1 is an ascii text file"
;;
esac
' _ {} ';'
If you want to run file on every file and directory in the current directory, including its subdirectories and so on, you don't need to use a Bash for-loop, because you can just tell find to run file:
find . -exec file '{}' ';'
(The -exec ... ';' option runs the command ... on every matched file or directory, replacing the argument {} with the path to the file. The starting directory . is spelled out because only GNU find lets you omit it.)
If you only want to run file on regular files (not directories), you can specify -type f:
find . -type f -exec file '{}' ';'
If you (say) want to just print the names of directories, but run the above on regular files, you can use the -or operator to connect one directive that uses -type d and one that uses -type f:
find . -type d -print -or -type f -exec file '{}' ';'
Edited to add: If desired, the effect of the above commands can be achieved in pure Bash (plus the file command, of course), by writing a recursive shell function. For example:
function foo () {
local file
for file in "$1"/* ; do
if [[ -d "$file" ]] ; then
echo "$file"
foo "$file"
else
file "$file"
fi
done
}
foo .
This differs from the find command in that it will sort the files more consistently, and perhaps in gritty details such as handling of dot-files and symbolic links, but is broadly the same, so may be used as a starting-point for further adjustments.
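On the array part of the question: folders=("$(find . -type d)") creates an array with a single element holding the whole output, and "$folders[1]" expands $folders followed by a literal [1]; an element is written "${folders[1]}". A minimal sketch of filling an array one path per element, assuming bash and a find with -print0 (the demo directory names are invented):

```bash
#!/usr/bin/env bash
# Sketch: read NUL-delimited paths from find into a bash array, so names with
# spaces (or even newlines) each end up in exactly one element.
demo=$(mktemp -d)
mkdir -p "$demo/Folder 1/Subfolder" "$demo/Folder2"

folders=()
while IFS= read -r -d '' d; do
    folders+=("$d")
done < <(find "$demo" -type d -print0)

printf '%s directories found\n' "${#folders[@]}"
for d in "${folders[@]}"; do
    printf '[%s]\n' "$d"
done
```

The order of the elements is whatever order find reports, which is not guaranteed to be sorted.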

How can I use terminal to copy and rename files from multiple folders?

I have a folder called "week1", and in that folder there are about ten other folders that all contain multiple files, including one called "submit.pdf". I would like to be able to copy all of the "submit.pdf" files into one folder, ideally using Terminal to expedite the process. I've tried cp week1/*/submit.pdf week1/ as well as cp week1/*/*.pdf week1/, but it had only been ending up copying one file. I just realized that it has been writing over each file every time which is why I'm stuck with one...is there anyway I can prevent that from happening?
You don't indicate your OS, but if you're using GNU cp, you can use cp week1/*/submit.pdf --backup=t week1/ to have it (arbitrarily) number files that already exist; but that won't give you any real way to identify which is which.
You could, perhaps, do something like this:
for file in week1/*/submit.pdf; do cp "$file" "${file//\//-}"; done
… which will produce files named something like "week1-subdir-submit.pdf"
For what it's worth, the "${var/s/r}" notation means to take var, but before inserting its value, search for s (\/, meaning /, escaped because of the other special / in that expression), and replace it with r (-), to make the unique filenames.
Edit: There's actually one more / in there, to make it match multiple times, making the syntax:
"${ var / / \/ / - }"
That is: take var and replace every instance of / with -.
find to the rescue! Rule of thumb: If you can list the files you want with find, you can copy them. So try first this:
$ cd your_folder
$ find . -type f -iname 'submit.pdf'
Some notes:
find . means "start finding from the current directory"
-type f means "only find regular files" (i.e., not directories)
-iname 'submit.pdf' "... with case-insensitive name 'submit.pdf'". You don't need to use 'quotation', but if you want to search using wildcards, you need to. E.g.:
~ foo$ find /usr/lib -iname '*.So*'
/usr/lib/pam/pam_deny.so.2
/usr/lib/pam/pam_env.so.2
/usr/lib/pam/pam_group.so.2
...
If you want to search case-sensitive, just use -name instead of -iname.
When this works, you can copy each file by using the -exec command. exec works by letting you specify a command to use on hits. It will run the command for each file find finds, and put the name of the file in {}. You end the sequence of commands by specifying \;.
So to echo all the files, do this:
$ find . -type f -iname submit.pdf -exec echo Found file {} \;
To copy them one by one:
$ find . -type f -iname submit.pdf -exec cp {} /destination/folder \;
Hope this helps!
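Note that copying every hit to one destination under its own name runs into the overwriting problem from the question, because all the files are called submit.pdf. A possible way around it, sketched here under the assumption that bash is available, is to build the target name from the relative path. The function name collect_submissions and the demo paths are hypothetical, and the source directory is expected without a trailing slash.

```bash
#!/usr/bin/env bash
# Sketch: copy every submit.pdf found under a source directory into one
# destination, naming each copy after its relative path with / turned into -
# so that copies from different folders cannot overwrite each other.
collect_submissions() {
    local src=$1 dest=$2 path rel
    mkdir -p -- "$dest"
    find "$src" -type f -iname 'submit.pdf' -print0 |
    while IFS= read -r -d '' path; do
        rel=${path#"$src"/}                    # e.g. alice/submit.pdf
        cp -- "$path" "$dest/${rel//\//-}"     # e.g. alice-submit.pdf
    done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/week1/alice" "$demo/week1/bob"
echo A > "$demo/week1/alice/submit.pdf"
echo B > "$demo/week1/bob/submit.pdf"
collect_submissions "$demo/week1" "$demo/collected"
ls "$demo/collected"
```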
