How to remove intermediate folders containing only one folder each? - bash

I had been playing around with mv, and now I have a situation.
Earlier, say
Folder1 had file1,2,3.
Now Folder1 has Folder2 which has Folder3 which has Folder4 which contains file1,2,3.
I am trying to write a bash script such that it identifies intermediate folders containing only 1 directory and moves all its contents up one level, ultimately giving back only Folder1->file1,2,3, and rest folders deleted.
I tried to write something like the script below, but I am:
1. unable to distinguish between a file and a folder
2. unable to find the file/directory name stored inside the current folder
3. not sure how to do it recursively.
#!/bin/bash
echo "Directory Name?"
read dir_name
no_files=`ls -A| wc -l`
if [ $no_file==1 ] && [ itisaDirectory()];
then `mv folder_name/* dir_name`
fi

When you do not care about error messages and want to move all files in subdirs to the current dir and remove the remaining empty dirs, do something like:
find . -type f -exec mv {} "${dir_name}" \; 2>/dev/null
rm -r */
You ask for something else: only move files where an intermediate directory is unique. That is the case if exactly one subdir has that dir as a parent. The parent of a dir can be found with dirname.
When a dir has one subdir, only one subdir will have it as a parent. You can list all dirs, look for the parent and select the unique paths.
find . -type d -exec dirname {} \; | sort | uniq -u | while read dir; do
echo "${dir} has exactly one subdir"
done
The problem is that the dir can have files as well. We try to improve the above solution:
find . -exec dirname {} \; | sort | uniq -u | while read dir; do
echo "${dir} has exactly one subdir or one file"
done
You could test the content of the dir with something like if [ -d "${dir}/*" ], but here we do not need to know:
find . -exec dirname {} \; | sort | uniq -u | while read dir; do
echo "${dir} has exactly one subdir or one file"
find "${dir}"/*/ -type f -exec mv {} "${dir_name}" \; 2>/dev/null
done
The path ${dir}/*/ will only expand when ${dir} has a subdirectory in it, and the find will move the files beneath it. When ${dir} only has one file, the find command will find nothing.
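For the recursive part of the original question (collapsing a whole chain of directories that each contain only a single subdirectory), here is a minimal sketch of one way to do it in bash. It is not part of the answer above; the function name flatten_single_child is made up, and it assumes no hidden files and no name clashes between the promoted contents and existing siblings:
#!/bin/bash
# Sketch: collapse a chain of directories that each contain exactly one entry,
# where that entry is itself a directory.
flatten_single_child() {
    local dir=$1
    local entries=("$dir"/*)
    while [ "${#entries[@]}" -eq 1 ] && [ -d "${entries[0]}" ]; do
        local child=${entries[0]}
        mv "$child"/* "$dir"/    # move the child's contents up one level
        rmdir "$child"           # remove the now-empty intermediate directory
        entries=("$dir"/*)       # the promoted entry may again be a lone directory
    done
}

flatten_single_child "Folder1"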

Related

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files, all with a random number prefix (e.g. 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one because they are duplicates.
To find duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
*Note that you are dealing with the rm command, so make sure that you have a backup of the existing directory in case something goes south.
Create a backup directory and take a backup of the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name, so all unique file names will be here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
*An explanation of the command is at the end. Once executed, check in the tmp directory that you got unique instances of the files.
Remove all *.jpg files from the main directory. Again, please verify that all files have been backed up before executing the rm command.
rm *.jpg
Bring the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of the command in step 2. The full command is:
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names, like file1.jpg, file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
shift
rm "$#"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
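If any of the file names contain spaces, the xargs stage above will split them apart. A bash variant of the same idea - a sketch only, assuming bash 4 (for the associative array) and the 12345--name.jpg layout from the question - reads the find output null-delimited instead:
declare -A seen
while IFS= read -r -d '' path; do
    name=${path##*/}      # strip the leading ./ and any directories
    base=${name#*--}      # strip the numeric prefix up to the first --
    if [[ -n ${seen[$base]} ]]; then
        echo rm -- "$path"    # duplicate: drop the echo to really delete it
    else
        seen[$base]=1
    fi
done < <(find . -type f -name '*.jpg' -print0)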
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
shift
echo rm "$#"
}
cat namelist | while read n; do removebutone *--"$n"; done
Shift takes the first parameter from $* off.
Note that the parens around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you of course have to remove the 'echo' in front of 'rm'.

Copy multiple files from one directory to multiple other directories

I have a directory structure:
Dir_1
Dir_2
Dir_3
Source
The directory Source contains the files File_1.txt and File_2.txt.
I want to copy all the files from the directory Source to all the remaining directories, in this case Dir_1, Dir_2 and Dir_3.
For this, I used
for i in $(ls -d */ | grep -v 'Source'); do echo $i | xargs -n 1 cp ./Source/*; done
However, I keep getting the message
cp: target ‘5’ is not a directory
It seems cp has problems with the directory names which have spaces in them. How do I resolve this (keeping the spaces in the directory names, obviously)?
Using find you could do something like this:
find . -mindepth 1 -maxdepth 1 -type d ! -name Source -exec cp Source/*.txt {} \;
This command searches the current directory for all subdirectories one level deep, excluding Source and then copies the text files into each.
Hope this helps :)
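For completeness, a plain bash loop avoids parsing ls and also survives spaces in the directory names; a minimal sketch, assuming the same Source/Dir_* layout as in the question and run from the directory that contains them:
for d in */; do
    [ "$d" = "Source/" ] && continue   # skip the source directory itself
    cp Source/*.txt "$d"
done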

command line find first file in a directory

My directory structure is as follows
Directory1\file1.jpg
          \file2.jpg
          \file3.jpg
Directory2\anotherfile1.jpg
          \anotherfile2.jpg
          \anotherfile3.jpg
Directory3\yetanotherfile1.jpg
          \yetanotherfile2.jpg
          \yetanotherfile3.jpg
I'm trying to use the command line in a bash shell on ubuntu to take the first file from each directory and rename it to the directory name and move it up one level so it sits alongside the directory.
In the above example:
file1.jpg would be renamed to Directory1.jpg and placed alongside the folder Directory1
anotherfile1.jpg would be renamed to Directory2.jpg and placed alongside the folder Directory2
yetanotherfile1.jpg would be renamed to Directory3.jpg and placed alongside the folder Directory3
I've tried using:
find . -name "*.jpg"
but it does not list the files in sequential order (I need the first file).
This line:
find . -name "*.jpg" -type f -exec ls "{}" +;
lists the files in the correct order but how do I pick just the first file in each directory and move it up one level?
Any help would be appreciated!
Edit: When I refer to the first file what I mean is each jpg is numbered from 0 to however many files in that folder - for example: file1, file2...... file34, file35 etc... Another thing to mention is the format of the files is random, so the numbering might start at 0 or 1a or 1b etc...
You can go inside each dir and run:
$ mv `ls | head -n 1` ..
If first means whatever the shell glob finds first (lexical, but probably affected by LC_COLLATE), then this should work:
for dir in */; do
for file in "$dir"*.jpg; do
echo mv "$file" "${file%/*}.jpg" # If it does what you want, remove the echo
break 1
done
done
Proof of concept:
$ mkdir dir{1,2,3} && touch dir{1,2,3}/file{1,2,3}.jpg
$ for dir in */; do for file in "$dir"*.jpg; do echo mv "$file" "${file%/*}.jpg"; break 1; done; done
mv dir1/file1.jpg dir1.jpg
mv dir2/file1.jpg dir2.jpg
mv dir3/file1.jpg dir3.jpg
Look for all first-level directories, identify the first file in each directory, and then move it one level up:
find . -type d \! -name . -prune | while read d; do
f=$(ls "$d" | head -1)
mv "$d/$f" .
done
Building on the top answer, here is a general use bash function that simply returns the first path that resolves to a file within the given directory:
getFirstFile() {
for dir in "$1"; do
for file in "$dir"*; do
if [ -f "$file" ]; then
echo "$file"
break 1
fi
done
done
}
Usage:
# don't forget the trailing slash
getFirstFile ~/documents/
NOTE: it will silently return nothing if you pass it an invalid path.
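Per the edit to the question, the jpg names carry numeric suffixes, so plain lexical ordering (what ls and shell globs give you) would put file10.jpg before file2.jpg. A minimal sketch using GNU ls -v for natural numeric ordering, assuming no newlines in the file names and run from the directory that holds Directory1, Directory2, ...:
for d in */; do
    first=$(ls -v "$d" | head -n 1)    # -v: natural sort, so file2 precedes file10
    [ -n "$first" ] && mv "$d$first" "${d%/}.jpg"
done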

Recursively move files of certain type and keep their directory structure

I have a directory which contains multiple sub-directories with mov and jpg files.
/dir/
/subdir-a/ # contains a-1.jpg, a-2.jpg, a-1.mov
/subdir-b/ # contains b-1.mov
/subdir-c/ # contains c-1.jpg
/subdir-d/ # contains d-1.mov
... # more directories with the same pattern
I need to find a way using command-line tools (on Mac OSX, ideally) to move all the mov files to a new location. However, one requirement is to keep directory structure i.e.:
/dir/
/subdir-a/ # contains a-1.mov
/subdir-b/ # contains b-1.mov
# NOTE: subdir-c isn't copied because it doesn't have mov files
/subdir-d/ # contains d-1.mov
...
I am familiar with find, grep, and xargs but wasn't sure how to solve this issue. Thank you very much beforehand!
It depends slightly on your O/S and, more particularly, on the facilities in your version of tar and whether you have the command cpio. It also depends a bit on whether you have newlines (in particular) in your file names; most people don't.
Option #1
cd /old-dir
find . -name '*.mov' -print | cpio -pvdumB /new-dir
Option #2
find . -name '*.mov' -print | tar -c -f - -T - |
(cd /new-dir; tar -xf -)
The cpio command has a pass-through (copy) mode which does exactly what you want given a list of file names, one per line, on its standard input.
Some versions of the tar command have an option to read the list of file names, one per line, from standard input; on MacOS X, that option is -T - (where the lone - means 'standard input'). For the first tar command, the -f - option means, in the context of writing an archive with -c, 'write to standard output'; in the second tar command, with -x, the -f - means 'read from standard input'.
There may be other options; look at the manual page or help output of tar rather carefully.
This process copies the files rather than moving them. The second half of the operation would be:
find . -name '*.mov' -exec rm -f {} +
ASSERT: No files have newline characters in them. Spaces, however, are AOK.
# TEST FIRST: CREATION OF FOLDERS
find . -type f -iname \*.mov -printf '%h\n' | sort | uniq | xargs -n 1 -d '\n' -I '{}' echo mkdir -vp "/TARGET_FOLDER_ROOT/{}"
# EXECUTE CREATION OF EMPTY TARGET FOLDERS
find . -type f -iname \*.mov -printf '%h\n' | sort | uniq | xargs -n 1 -d '\n' -I '{}' mkdir -vp "/TARGET_FOLDER_ROOT/{}"
# TEST FIRST: REVIEW FILES TO BE MOVED
find . -type f -iname \*.mov -exec echo mv {} /TARGET_FOLDER_ROOT/{} \;
# EXECUTE MOVE FILES
find . -type f -iname \*.mov -exec mv {} /TARGET_FOLDER_ROOT/{} \;
If the files are large and on the same file system, you don't want to copy them, just replicate their directory structure while moving them.
You can use this function:
# moves a file (or folder) preserving its folder structure (relative to source path)
# usage: move_keep_path source destination
move_keep_path () {
# create directories up to one level up
mkdir -p "`dirname "$2"`"
mv "$1" "$2"
}
Or, adding support to merging existing directories:
# moves a file (or folder) preserving its folder structure (relative to source path)
# usage: move_keep_path source destination
move_keep_path () {
# create directories up to one level up
mkdir -p "`dirname "$2"`"
if [[ -d "$1" && -d "$2" ]]; then
# merge existing folder
find "$1" -depth 1 | while read file; do
# call recursively for all files inside
mv_merge "$file" "$2/`basename "$file"`"
done
# remove after merge
rmdir "$1"
else
# either file or non-existing folder
mv "$1" "$2"
fi
}
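To drive either version of the function from find, it can be exported and called once per file; a sketch, where /dir and /new-root are placeholders for the real source and destination roots:
export -f move_keep_path
find /dir -type f -name '*.mov' -exec bash -c '
    for src in "$@"; do
        move_keep_path "$src" "/new-root/${src#/dir/}"
    done
' _ {} +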
It is easier to just copy the files like:
cp --parents some/folder/*/*.mov new_folder/
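Note that --parents is a GNU cp option, and the stock cp on Mac OS X does not have it. If rsync is available (it ships with OS X), a filter-based sketch can move the files and keep the structure in one pass; dir/ and newdir/ stand in for the real paths:
# Move the .mov files and recreate their directory structure in one pass.
# --remove-source-files deletes the transferred files; the emptied source
# directories are left behind and can be cleaned up separately.
rsync -am --include='*/' --include='*.mov' --exclude='*' \
      --remove-source-files dir/ newdir/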
From the parent directory of "dir", execute this:
find ./dir -name "*.mov" | xargs tar cif mov.tar
Then cd to the directory you want to move the files to and execute this:
tar xvf /path/to/parent/directory/of/"dir"/mov.tar
This should work if you want to move all the mov files to a directory called newlocation:
find ./dir -iname '*.mov' -exec mv '{}' ./newlocation \;
However, if you wish to move the mov files along with their sub-dirs then you can do something like this -
Step 1: Copy entire structure of /dir to a new location using cp
cp -iprv dir/ newdir
Step 2: Find jpg files from newdir and delete them.
find ./newdir -iname "*.jpg" -delete
Test:
[jaypal:~/Temp] ls -R a
a.mov aa b.mov
a/aa:
aaa c.mov d.mov
a/aa/aaa:
e.mov f.mov
[jaypal:~/Temp] mkdir d
[jaypal:~/Temp] find ./a -iname '*.mov' -exec mv '{}' ./d \;
[jaypal:~/Temp] ls -R d
a.mov b.mov c.mov d.mov e.mov f.mov
I amended the function from @djjeck, because it didn't work as I needed. The function below moves a source file to a destination directory, also creating the needed levels of hierarchy in the source file path (see the example below):
# moves a file, creates needed levels of hierarchy in destination
# usage: move_with_hierarchy source_file destination top_level_directory
move_with_hierarchy () {
path_tail=$(dirname "$(realpath --relative-to="$3" "$1")")
cd "$2"
mkdir -p "$path_tail"
cd - > /dev/null
mv "$1" "${2}/${path_tail}"
}
example:
$ ls /home/sergei/tmp/dir1/dir2/bla.txt
/home/sergei/tmp/dir1/dir2/bla.txt
$ rm -rf tmp2
$ mkdir tmp2
$ move_with_hierarchy /home/sergei/tmp/dir1/dir2/bla.txt /home/sergei/tmp2 /home/sergei/tmp
$ tree ~/tmp2
/home/sergei/tmp2
└── dir1
└── dir2
└── bla.txt
2 directories, 1 file

Shell script to create directories

I'm trying to create a simple shell script for recursively creating directories inside a list of directories.
I have the next file structure:
A directory called v_79 containing a list of "dirs" (from dir_0 to dir_210), and inside each of them there are several directories called ENSG00000??????, where each '?' stands for a digit [0-9].
I would like to create a directory called "my_dir" inside every one of the ENSG00000?????? dirs.
I know how to create a directory once being inside each of the dir_XX 's,
for i in ENSG00000??????; do mkdir $i/my_dir; done
but I don't know how to create the directories I need starting from the v_79 directory.
If current dir is v_79, you can use a combination of find and xargs:
find . -name 'ENSG00000??????' -type d | xargs -I DIR mkdir DIR/my_dir
If your current directory contains the directory "v_79", then
for dir in v_79/dir_{0..210}/ENSG00000??????; do mkdir $dir/my_dir; done
I wonder if that might give you an "argument list too long" error, in which case find is the way to go.
mkdir -p v_79/dir{0,1}{1,2,3}
will create the directories v_79/dir01, v_79/dir02, v_79/dir03, v_79/dir11, v_79/dir12 and v_79/dir13, even if v_79 does not exist.
The -p option will create all required parent directories as needed.
You can do so from your v_79 directory:
for i in `find . -type d -name "ENSG00000??????"`; do mkdir $i/my_dir; done
This is a dry run - if satisfied, delete the echo before mkdir:
echo ./v_79/**/ENSG* | xargs -I% echo mkdir %/my_dir #or
echo ./v_79/**/dir_*/ENSG* | xargs -I% echo mkdir %/my_dir
For this you need bash 4 and "shopt -s globstar" (e.g. in your profile).
If you have too many directories, you may get an "argument list too long" error (for the 1st echo). In this case the solution with find is better:
find v_79 -type d -print | grep '/ENSG' | xargs -I% echo mkdir %/my_dir
find all directories in v_79
keep only those with /ENSG in the path (you can add more "filters")
run (echo) mkdir for the result
If there can be a space somewhere in the path, modify the above to:
find v_79 -type d -print0 | grep -z '/ENSG' | xargs -0 -I% echo mkdir %/my_dir
Also, you can limit the depth of the find command, e.g.:
find v_79 -depth 2 -type d -print0 | grep -z '/ENSG' | xargs -0 -I% echo mkdir %/my_dir
Again, all of the above is a dry run - remove the echo for the real run. ;)
Just add the -p option, then your work will be done.
BTW: the -p option for the mkdir command means "no error if existing, make parent directories as needed".
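Applied to the layout in this question, that suggestion might look like the following sketch, run from the directory that contains v_79:
shopt -s nullglob   # an unmatched pattern expands to nothing instead of itself
for d in v_79/dir_*/ENSG00000??????/; do
    mkdir -p "${d}my_dir"   # -p: no error if my_dir already exists
done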
You want
mkdir v_79/dir_{0,1,2}{,0,1,2,3,4,5,6,7,8,9}{,0,1,2,3,4,5,6,7,8,9}/ENSG00000??????/my_dir
