How to copy recursively files with multiple specific extensions in bash - bash

I want to copy all files with specific extensions recursively in bash.
****editing****
I've written the full script. I have list of names in a csv file, I'm iterating through each name in that list, then creating a directory with that same name somewhere else, then I'm searching in my source directory for the directory with that name, inside it there are few files with endings of xlsx,tsv,html,gz and I'm trying to copy all of them into the newly created directory.
sample_list_filepath=/home/lists/papers
destination_path=/home/ds/samples
source_directories_path=/home/papers_final/new
cat $sample_list_filepath/sample_list.csv | while read line
do
echo $line
cd $source_directories_path/$line
cp -r *.{tsv,xlsx,html,gz} $source_directories_path/$line $destination_path
done
This works, but it copies all the files there, with no discrimination for specific extension.
What is the problem?

An easy way to solve your problem is to use find and regex :
find src/ -regex '.*\.\(tsv\|xlsx\|gz\|html\)$' -exec cp {} dest/ \;
find look recursively in the directory you specify (in my example it's src/), allows you to filter with -regex and to apply a command for matching results with -exec
For the regex part :
.*\.
will take the name of the file and the dot before extension,
\(tsv\|xlsx\|gz\|html\)$
verify the extension with those you want.
The exec block is what you do with files you got from regex
-exec cp {} dest/ \;
In this case, you copy what you got ({} meaning) to the destination directory.

Related

How to write a bash script to copy files from one base to another base location

I have a bash script I'm trying to write
I have 2 base directories:
./tmp/serve/
./src/
I want to go through all the directories in ./tmp and copy the *.html files into the same folder path in ./src
i.e
if I have a html file in ./tmp/serve/app/components/help/ help.html -->
copy to ./src/app/components/help/ And recursively do this for all subdirectories in ./tmp/
NOTE: the folder structures should exist so just need to copy them only. If it doesn't then hopefully it could create the folder for me (not what I want) but with GIT I can track these folders to manually handle those loose html files.
I got as far as
echo $(find . -name "*.html")\n
But not sure how to actually extract the file path with pwd and do what I need to, maybe it's not a one liner and needs to be done with some vars.
something like
for i in `echo $(find /tmp/ -name "*.html")\n
do
cp -r $i /src/app/components/help/
done
going so far to create the directories would take some more time for me.
I'll try to do it on my own and see if I come up with something
but for argument sake if you do run pwd and get a response the pseudo code for that:
pwd
get response
if that directory does not exist in src create that directory
copy all the original directories contents into the new folder at /src/$newfolder
(possibly running two for loops, one to check the directory tree, and then one to go through each original directory, copying all the html files)
You process substitution to loop the output from your find command and create the destination directory(ies) and then copy the file(s):
#!/bin/bash
# accept first parameters to script as src_dir and dest values or
# simply use default values if no parameter(s) passed
src_dir=${1:-/tmp/serve}
dest=${2-src}
while read -r orig_path ; do
# To replace the first occurrence of a pattern with a given string,
# use ${parameter/pattern/string}
dest_path="${orig_path/tmp\/serve/${dest}}"
# Use dirname to remove the filename from the destination path
# and create the destination directory.
dest_dir=$(dirname "${dest_path}")
mkdir -p "${dest_dir}"
cp "${orig_path}" "${dest_path}"
done < <(find "${src_dir}" -name '*.html')
This script copy .html files from src directory to des directory (create the subdirectory if they do not exist)
Find the files, then remove the src directory name and copy them into the destination directory.
#!/bin/bash
for i in `echo $(find src/ -name "*.html")`
do
file=$(echo $i | sed 's/src\///g')
cp -r --parents $i des
done
Not sure if you must use bash constructs or not, but here is a GNU tar solution (if you use GNU tar), which IMHO is the best way to handle this situation because all the metadata for the files (permissions, etc.) are preserved:
find ./tmp/serve -name '*.html' -type f -print0 | tar --null -T - -c | tar -x -v -C ./src --strip-components=3
This finds all the .html files (-type f) in the ./tmp/serve directory and prints them nul-terminated (-print0), then sends these filenames via stdin to tar as nul-terminated literals (--null) for inclusion (-T -), creating (-c) an archive which is then sent to another tar instance which extracts (-x) the archive printing its contents along the way (optional: -v), changing directory to the destination (-C ./src) before commencing and stripping (--strip-components=3) the ./tmp/serve/ prefix from the files. (You could also cd ./tmp/serve beforehand, using find . instead, and change -C to ../../src.)

Loop through and unzip directories and then unzip items in subdirectories

I have a folder designed in the following way:
-parentDirectory
---folder1.zip
----item1
-----item1.zip
-----item2.zip
-----item3.zip
---folder2.zip
----item1
-----item1.zip
-----item2.zip
-----item3.zip
---folder3.zip
----item1
-----item1.zip
-----item2.zip
-----item3.zip
I would like to write a bash script that will loop through and unzip the folders and then go into each subdirectory of those folders and unzip the files and name those files a certain way.
I have tried the following
cd parentDirectory
find ./ -name \*.zip -exec unzip {} \;
count=1
for fname in *
do
unzip
mv $fname $attempt{count}.cpp
count=$(($count + 1))
done
I thought the first two lines would go into the parentDirectory folder and unzip all zips in that folder and then the for loop would handle the unzipping and renaming. But instead, it unzipped everything it could and placed it in the parentDirectory. I would like to maintain the same directory structure I have.
Any help would be appreciated
excerpt from man unzip
[-d exdir]
An optional directory to which to extract files. By default, all files and subdirectories are recreated in the current directory; the -d option allows extraction in an arbitrary directory (always assuming one has permission to write to the directory).
It's doing exactly what you told it, and what would happen if you had done the same on the command line. Just tell it where to extract, since you want it to extract there.
see Ubuntu bash script: how to split path by last slash? for an example of splitting the path out of fname.
putting it all together, your command executed in the parentDirectory is
find ./ -name \*.zip -exec unzip {} \;
But you want unzip to extract to the directory where it found the file. I was going to just use backticks on dirname {} but I can't get it to work right, as it either executes on the "{}" literal before find, or never executes.
The easiest workaround was to write my own script for unzip which does it in place.
> cat unzip_in_place
unzip $1 -d `dirname $1`
> find . -name "*.zip" -exec ./unzip_in_place {} \;
You could probably alias unzip to do that automatically, but that is unwise in case you ever use other tools that expect unzip to work as documented.

Can't rename, No such file or directory

I have a root folder (03_COMPLETE), inside which are 40 subfolders two levels down (all called CHILD_PNG) that contain .png files I want to rename. There are 6 complete folders I have to go through, with tens of thousands of files. All files are currently named like this: 123456_lifestyle.png, I want them named to lifestyle_123456.png.
My code:
find . -mindepth 2 -type f -iname '*.png' -print0 | xargs -0 /usr/local/bin/rename -v 's/\/([0-9]+)_([A-Za-z]+[0-9])/\/$2_$1/'\;
If I run this on an individual folder of .png files (without using -mindepth) it renames them. However if I run it on the root 03_COMPLETE directory to try and do all the renaming at once, I get lines of errors like this:
Can't rename
'/Volumes/COMMON-LIC-PHOTO/RETOUCHING/04_DELIVERY_PNG/Computer1/03_COMPLETE/06052017_NYS5_W_1263_Output/CHILD_PNG/123456_lifestyle.png'
to
'/Volumes/COMMON-LIC-PHOTO/RETOUCHING/04_DELIVERY_PNG/Computer1/03_COMPLETE/NYS5_06052017_W_1263_Output/CHILD_PNG/123456_lifestyle.png':
No such file or directory
I think it might have something to do with the names of the folder 1 level down (eg. here NYS5_06052017_W_1263_Output) because it did rename on a couple of folders named Bustform_000. Most of the folders though start with a number like 06052017.
I can't figure out why this will work at the .png folder level but won't work on the root folder, and why it will rename in a few folders but most of them it won't.
Also what is weird is that in the error it says it is trying to rename 123456_lifestyle.png to the same filename. Why would it do that? Any ideas?
This might help:
find 03_COMPLETE -type f | xargs -n 1 rename -n 's|/([^_/]*)_([^_/]*).png$|/$2_$1.png|'
Remove -n if output is okay.
You could change directory into each of the CHILD_PNG directories and run a single rename in there on all the files so you don't exec a new rename for every single file:
find 03_COMPLETE -type d -name CHILD_PNG -execdir bash -c "cd {}; rename -n '...' *.png" \;
The issue with your original Regex is, it matches the directory names of the form "xxxxx_yyyyy" and tries to convert them into "yyyyy_xxxxx", which, of course, doesn't exist. Since you're interested in changing only the filenames, and all of them end with .png, you can use the below Regex. Additionally, as you're trying to match a literal '/', you can choose a different character like '|' as delimiter to make the Regex easier to read
's|/([0-9]+)_([A-Za-z]+[0-9]*)(\.[Pp][Nn][Gg])|/$2_$1$3|'

Copying files in multiple directories in terminal

I am new to using the command line for operations, so forgive me if this question is obvious. I would like to move all files of a certain type (.aiff) from one directory that contains many sub-directories. The file structure looks like this:
directory
- subdir1
-- sound.aiff
-- other.txt
- subdir2
-- sound2.aiff
-- other2.txt
I've tried using something like cp -R /Users/me/directory/*.aiff /Users/me/newdirectory but I get a "no such file or directory" error. I don't know how to specify that the files I want copied in the subdirectories must be .aiff files.
Try this:
cp -R /Users/me/directory/*/*.aiff /Users/me/newdirectory
But probably the destination /Users/me/newdirectory is missing.
You could verify this by doing:
file /Users/me/newdirectory
If the directory doesn't exist will print an error like:
Users/me/newdirectory: cannot open `/Users/me/newdirectory' (No such file or directory)
Create the directory with:
mkdir /Users/me/newdirectory
Next, try to copy the files again, if you want to move them use mv instead of cp
Another way is to use the command find, for example:
find /Users/me/directory -type f -iname "*.aiff" -exec mv {} /Users/me/newdirectory \;
In this example, the command find is going to search for in directory /Users/me/directory/ only for files -type f that end (case insensitive) in *.aiff for each file found it will execute the command mv exec mv {} /Users/me/newdirectory. The {} is a placeholder.
Before moving you could test the command by just finding the desired types:
find . -iname "*.aiff"
This will search for files within the directory the command is executed, notice the . instead of a /Users/me/directory/

How to change filename extensions in subdirectories on Mac OS X

I have been trying to iterate though a directory structure where I want to change the file extension from .mv4 to .mp4.
The problem is that there are spaces in many of the file names, and I have not been successful in iterating the directory structure.
I am doing this in the terminal.
There are examples for changing extensions in a single directory, but not for subdirectories and where the file names have spaces in them.
You can use the -exec argument of "find" to do this:
find . -type f -name "*.mv4" -exec sh -c 'mv "$1" "${1%.mv4}.mp4"' _ {} \;
Issue the "find" command from the base directory of your directory structure containing the mv4 files, or specify that directory in place of the "." right after "find" in the above line of code.
I tested this on my Mac running Yosemite. It works for me with filenames that contain spaces.

Resources