How can I use terminal to copy and rename files from multiple folders? - shell

I have a folder called "week1", and in that folder there are about ten other folders that all contain multiple files, including one called "submit.pdf". I would like to copy all of the "submit.pdf" files into one folder, ideally using Terminal to speed up the process. I've tried cp week1/*/submit.pdf week1/ as well as cp week1/*/*.pdf week1/, but each time only one file ended up being copied. I just realized that each copy has been overwriting the previous one, which is why I'm stuck with one... is there any way I can prevent that from happening?

You don't indicate your OS, but if you're using GNU cp, you can use cp week1/*/submit.pdf --backup=t week1/ to have it (arbitrarily) number files that already exist (submit.pdf.~1~, submit.pdf.~2~, and so on); but that won't give you any real way to identify which is which.
You could, perhaps, do something like this:
for file in week1/*/submit.pdf; do cp "$file" "${file//\//-}"; done
… which will produce files named something like "week1-subdir-submit.pdf"
For what it's worth, the "${var/s/r}" notation means: take var, but before inserting its value, search for s (here \/, meaning /, escaped because of the other special / characters in the expression) and replace it with r (here -), to make the unique filenames.
Edit: There's actually one more / in there, which makes it replace every match rather than just the first, so the syntax is:
"${var//\//-}"
i.e. take "var" and replace every instance of / with -.
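A quick way to see what the expansion does is to run it on a single path. flatten_name and the sample path are made up for illustration; the function only wraps the same "${file//\//-}" expansion used in the loop above.

```shell
#!/usr/bin/env bash
# Sketch: turn a nested path into a flat, unique file name.
# flatten_name and the sample path are hypothetical, for illustration only.
flatten_name() {
  local path=$1
  # ${path//\//-}: the double slash means "replace every match";
  # \/ is the (escaped) slash to search for, and - is its replacement.
  printf '%s\n' "${path//\//-}"
}

flatten_name 'week1/lab3/submit.pdf'   # prints week1-lab3-submit.pdf
```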

find to the rescue! Rule of thumb: if you can list the files you want with find, you can copy them. So first try this:
$ cd your_folder
$ find . -type f -iname 'submit.pdf'
Some notes:
find . means "start finding from the current directory"
-type f means "only find regular files" (i.e., not directories)
-iname 'submit.pdf' means "... with the case-insensitive name 'submit.pdf'". You don't need the quotes for a plain name, but you do if you search using wildcards, so that the shell doesn't expand them before find sees them. E.g.:
~ foo$ find /usr/lib -iname '*.So*'
/usr/lib/pam/pam_deny.so.2
/usr/lib/pam/pam_env.so.2
/usr/lib/pam/pam_group.so.2
...
If you want to search case-sensitive, just use -name instead of -iname.
When this works, you can copy each file by using the -exec action. -exec lets you specify a command to run on each hit: find runs the command once for each file it finds, putting the name of the file in place of {}. You end the command with \;.
So to echo all the files, do this:
$ find . -type f -iname submit.pdf -exec echo Found file {} \;
To copy them one by one:
$ find . -type f -iname submit.pdf -exec cp {} /destination/folder \;
Hope this helps!
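One caveat with the last command: every match is named submit.pdf, so each cp overwrites the previous copy in /destination/folder, which is exactly the problem in the question. Below is a sketch that gives each copy a unique name built from its parent folder, assuming the files sit exactly one folder below week1 and the folder names are unique; collect_submissions is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: copy every week1/<folder>/submit.pdf into one destination folder,
# naming each copy "<folder>-submit.pdf" so no copy overwrites another.
# collect_submissions is a hypothetical helper name.
collect_submissions() {
  local src=$1 dest=$2
  mkdir -p "$dest"
  # -mindepth/-maxdepth 2: only files exactly one folder below $src.
  # -exec sh -c '...' sh "$dest" {} + passes a batch of matches to one
  # inline script, where $1 is the destination and the rest are the files.
  find "$src" -mindepth 2 -maxdepth 2 -type f -name 'submit.pdf' -exec sh -c '
    dest=$1
    shift
    for f in "$@"; do
      parent=$(basename "$(dirname "$f")")   # "lab3" for week1/lab3/submit.pdf
      cp "$f" "$dest/$parent-submit.pdf"
    done
  ' sh "$dest" {} +
}

# Usage (hypothetical paths): collect_submissions week1 week1/collected
```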

Related

Check if file is in a folder with a certain name before proceeding

So, I have this simple script which converts videos in a folder into a format which the R4DS can play.
#!/bin/bash
scr='/home/user/dpgv4/dpgv4.py';mkdir -p 'DPG_DS'
find '../Exports' -name "*1080pnornmain.mp4" -exec python3 "$scr" {} \;
The problem is, some of the videos are invalid and won't play, and I've moved those videos to a different directory inside the Exports folder. What I want to do is check to make sure the files are in a folder called new before running the python script on them, preferably within the find command. The path should look something like this:
../Exports/(anything here)/new/*1080pnornmain.mp4
Please note that (anything here) text does not indicate a single directory, it could be something like foo/bar, foo/b/ar, f/o/o/b/a/r, etc.
You cannot use -name because the search is on the path now. My first solution was:
find ./Exports -path '**/new/*1080pnornmain.mp4' -exec python3 "$scr" {} \;
But, as @dan pointed out in the comments, it is wrong: in a -path pattern, * also matches /, so ** behaves like a plain * and the trailing *1080pnornmain.mp4 can span further subdirectories:
This checks if /new/ is somewhere in the preceding path, it doesn't have to be a direct parent.
So a plain glob is not enough here. Another possibility, using find only, could be this one:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -exec python3 "$scr" {} \;
This regex matches:
any number of nested folders before new, with .*/new
any run of characters other than / (to leave out further subpaths), followed by your filename, with [^/]*1080pnornmain\.mp4
Performance may degrade slightly because it uses regular expressions.
You can also pass find's output to xargs instead of using -exec. Note that xargs -0 expects NUL-separated names, so find needs -print0, and that with -I '{}' xargs still starts one python3 process per file, so this is not faster than -exec ... \;:
find ./Exports -regex '.*/new/[^/]*1080pnornmain\.mp4' -print0 | xargs -0 -I '{}' python3 "$scr" '{}'
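A minimal sketch of the -regex approach, with the dot before mp4 escaped and the slash inside the bracket expression left unescaped (it needs no escaping there). find_new_videos is a hypothetical wrapper; in the question's script you would append -exec python3 "$scr" {} \; to the find command.

```shell
#!/usr/bin/env bash
# Sketch: list videos whose *direct* parent directory is named "new".
# find_new_videos is a hypothetical helper; the file suffix comes from the question.
find_new_videos() {
  local root=$1
  # .*/new/   any number of leading directories, then a folder named exactly new
  # [^/]*     the start of the file name, which may not contain another slash
  # \.mp4     a literal dot; -regex must match the whole path, so no anchors
  find "$root" -type f -regex '.*/new/[^/]*1080pnornmain\.mp4'
}
```

Because [^/]* cannot cross a slash, a file in new/some/deeper/folder is skipped, which is the difference from the -path attempt.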

Renaming multiple files in a nested structure

I have a directory with this structure:
root
|-dir1
| |-pred_20181231.csv
|
|-dir2
| |-pred_20181234.csv
...
|-dir84
  |-pred_2018123256.csv
I want to run a command that will rename all the pred_XXX.csv files to pred.csv.
How can I easily achieve that?
I have looked into the rename facility but I do not understand the perl expression syntax.
EDIT: I tried with this code: rename -n 's/\training_*.csv$/\training_history.csv/' *.csv but it did not work
Try with this command:
find root -type f -name "*.csv" -exec perl-rename 's/_\d+(\.csv)/$1/g' '{}' \;
Options used:
-type f to match only regular files (not directories).
-name "*.csv" to only match files with the extension csv
-exec (or -execdir) to execute a command, in this case perl-rename
's/_\d+(\.csv)/$1/g' searches for a string like _20181234.csv and replaces it with .csv; $1 is the first capture group, (\.csv).
NOTE
Depending on your OS, you may be able to use just rename instead of perl-rename.
Use some shell looping:
shopt -s globstar
for file in **/pred_*.csv
do
echo mv "$file" "$(dirname "$file")/pred.csv"
done
On modern shells, ** is a wildcard that matches any number of directories in a hierarchy (bash needs shopt -s globstar first; zsh supports it by default). It's an alternative to find, which is a fine solution too. Based on the tree you provided, I'm not sure whether you need to run this from inside root or use root/**/pred_*.csv instead, so I've put echo before the mv to show what it's about to do. After making sure it does what you expect, remove the echo.
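If neither perl-rename nor globstar is available, here is a sketch that uses only find, sh and mv. It assumes at most one pred_*.csv per directory (otherwise the new names would collide) and uses mv -n, which GNU and BSD mv support, so an existing pred.csv is never overwritten; rename_preds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: rename every pred_<something>.csv below a root to pred.csv,
# keeping each file in its own directory. rename_preds is hypothetical.
rename_preds() {
  local root=$1
  find "$root" -type f -name 'pred_*.csv' -exec sh -c '
    for f in "$@"; do
      # -n: do not overwrite a pred.csv that already exists in that directory
      mv -n -- "$f" "$(dirname "$f")/pred.csv"
    done
  ' sh {} +
}

# Usage: rename_preds root
```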

Alias for a combination of grep and find is needed

Many times I need to search from a directory and below for a pattern in all files with a specific type. For example, I need to ask grep not to look into files other than *.h, *.cpp or *.c. But if I enter:
grep -r pattern .
it looks into all files. If I enter:
grep -r pattern *.c
it tries *.c files in the current folder (there are none in my case) and files in *.c folders (none in my case either). I want it to look into all folders, but only into files of the given type. I think grep alone is not enough for this purpose, so I get help from find too, like this:
grep pattern `find . -name '*c'`
First, let me know whether I'm right about getting help from find. Can grep be enough? Second, I prefer to write an alias for bash to be used like this:
mygrep pattern c
so that it translates to the same command, but without having to type ` and ' and is simpler. I tried:
alias mygrep="grep $1 `find . -name '*$2'`"
But it doesn't work and issues an error:
grep: c: No such file or directory
I tried changing it, but I couldn't get a working alias.
Any idea?
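This would be better done as a function than an alias: aliases can't take arguments, and because your alias is defined in double quotes, $1, $2 and the backticks are expanded once, when the alias is defined, not when you use it. It is also better to use -exec instead of passing the output of find to grep; that output would be subject to word splitting and globbing, so it could produce surprising results. Instead try: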
mygrep () {
find . -name "*$2" -exec grep "$1" {} +
}
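Here is a variant of the function with two small tweaks, offered as suggestions rather than the answer's exact code: -type f skips directories, and grep -H prints the file name even when find happens to hand grep a single file. It matches "*.$2", so mygrep pattern c picks up only .c files. As for whether grep alone is enough: GNU grep (and the BSD grep on macOS) can restrict a recursive search itself with --include, e.g. grep -r --include='*.c' pattern .

```shell
#!/usr/bin/env bash
# Sketch: mygrep PATTERN EXT searches regular files named *.EXT below ".".
mygrep() {
  # -type f: skip directories that happen to match the name
  # -H: always print the file name, even if a batch holds a single file
  # {} +: pass many file names to each grep call instead of one per file
  find . -type f -name "*.$2" -exec grep -H -- "$1" {} +
}

# Usage: mygrep pattern c
```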

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
cd "$d"
for f in `find . "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not put a test such as -name before it, find interprets "*.txt" as designating a second search root, in addition to . (the current directory). You presumably wanted -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the resulting error messages.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
You will also find that your working directory gets messed up. The working directory is a property of the overall shell environment, so it is not automatically reset on each loop iteration. You'll need to handle that manually in some way; pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
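The subshell alternative mentioned above might look like the following sketch. It also loops over */ instead of parsing find output, so directory names containing spaces survive. strip_prefixes is a hypothetical name, and like the answer it assumes one level of subdirectories.

```shell
#!/usr/bin/env bash
# Sketch: strip everything before "myfile" in each subdirectory's .txt files.
# The ( ... ) subshell means each cd is undone automatically at the end
# of the iteration. strip_prefixes is a hypothetical name.
strip_prefixes() {
  local d f
  for d in */; do
    (
      cd "$d" || exit              # exits only this subshell on failure
      for f in *?myfile*.txt; do   # skips names that already start with myfile
        [ -e "$f" ] || continue    # no match: the pattern stays literal
        mv -- "$f" "${f/#*myfile/myfile}"
      done
    )
  done
}
```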
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
Unfortunately, this is only the reason the files land in the wrong place; it is not the only problem. Since you have already accepted one of the answers, there is no need to list them all.
a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here it is with the perl rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),$1,'
here are tests on 1000 files:
for `find`; do mv: 9.176s
rename: 0.099s
that's roughly 100x as fast.
John Bollinger's accepted answer is about twice as fast as the OP's, but more than 40x as slow as this rename solution:
nested for loops with mv and parameter expansion: 4.316s
also, it can fail with "Argument list too long" if a directory has too many items, because find * passes the whole expanded glob to an external command; likewise rename ... subdir*/*. a glob used only inside the shell, as in for f in *.txt or for f in */*.txt, is not subject to that limit. answers that begin with find ., on the other hand, work on directories with any number of items.
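Following on from the point about find ., here is a sketch that keeps the whole job inside find: it batches the matched names into a few sh processes with -exec ... {} +, so it neither parses find's output nor puts a huge glob on a command line. It assumes the two-level layout from the question and uses only POSIX parameter expansion; strip_prefixes_find is a hypothetical name, and it has not been benchmarked against the timings above.

```shell
#!/usr/bin/env bash
# Sketch: find-driven rename that strips everything before "myfile".
# strip_prefixes_find is hypothetical; the root defaults to ".".
strip_prefixes_find() {
  find "${1:-.}" -mindepth 2 -maxdepth 2 -type f -name '*?myfile*.txt' -exec sh -c '
    for f in "$@"; do
      dir=${f%/*}            # ./subdir1
      base=${f##*/}          # 001myfile001A.txt
      rest=${base#*myfile}   # 001A.txt (text after the first "myfile")
      mv -- "$f" "$dir/myfile$rest"
    done
  ' sh {} +
}
```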

dealing filenames with shell regex

138.096.000.015.00111-138.096.201.072.38717
138.096.000.015.01008-138.096.201.072.00790
138.096.201.072.00790-138.096.000.015.01008
138.096.201.072.33853-173.194.020.147.00080
138.096.201.072.34293-173.194.034.009.00080
138.096.201.072.38717-138.096.000.015.00111
138.096.201.072.41741-173.194.034.025.00080
138.096.201.072.50612-173.194.034.007.00080
173.194.020.147.00080-138.096.201.072.33853
173.194.034.007.00080-138.096.201.072.50612
173.194.034.009.00080-138.096.201.072.34293
173.194.034.025.00080-138.096.201.072.41741
I have many folders, each containing many files whose names look like the above.
I want to remove the files whose names contain the substring "138.096.000",
and sometimes I want to list the files whose names contain the substring "00080".
To delete files with name containing "138.096.000":
find /root/of/files -type f -name '*138.096.000*' -exec rm {} \;
To list files with names containing "00080":
find /root/of/files -type f -name '*00080*'
rm $(find . -name \*138.096.000\*)
This uses the find command to find the appropriate files. It is executed in a subshell (command substitution), and its output (the list of files) is passed to rm as arguments. Note the escaping of *, since otherwise the shell would try to expand it itself.
This assumes you don't have filenames with spaces etc. (the same assumption applies to the two variants below). You may prefer to do something like:
for i in $(find . -name \*138.096.000\*); do
rm $i
done
in this scenario, or even
find . -name \*138.096.000\* | xargs rm
Note that the loop above executes rm once per file, while the xargs variant batches the names and executes rm as few times as possible (depending on the number of files you have, possibly only once).
However, if you're using zsh then you can simply do:
rm **/*138.096.000*
(I'm assuming your directories aren't named like your files. If they are, note the -type f test used in Kamil's answer.)
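None of the find-based variants in this answer survive names with spaces. Here is a sketch that does, assuming a find and xargs that support -print0 / -0 (GNU and BSD both do); delete_matching is a hypothetical name. GNU and BSD find can also use -delete in place of the xargs pipe.

```shell
#!/usr/bin/env bash
# Sketch: delete regular files whose names contain a substring, safely for
# names with spaces or newlines. delete_matching is a hypothetical name.
delete_matching() {
  local root=$1 needle=$2
  # -print0 / xargs -0 separate names with NUL bytes instead of whitespace.
  # rm -f exits quietly if xargs ends up calling it with no names at all.
  find "$root" -type f -name "*$needle*" -print0 | xargs -0 rm -f --
}

# Usage: delete_matching /root/of/files 138.096.000
```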

Resources