Move all files from subdirectory into a new directory without overwriting - macos

I want to consolidate into 1 directory files that are in multiple subdirectories.
The following comes close except that the random string is added after the extension; I want it before the extension:
find . -type f -iname "[a-z,0-9]*" -exec bash -c 'mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
I've searched through dozens of other posts but nothing addressed the specifics of my situation:
I'm on OS X (so it's a BSD flavor of Bash; for ex. there's no -t option for mv)
Many of the files have identical names so I need to rewrite them during the mv (and I can't just use the -n option for mv because there too many files would thus not get moved)
The files are not all the same kind, so I need to use a find -type f
I want to exclude .DS_store files, so it seems like a good option is find -type f -iname "[a-z,0-9]*"
I want the rewritten files's names to be in the form of: oldname-random_string.xyz (but I'm also OK with having the files being renamed as a sequential list: 00001.xyz, 00002.xyz, etc.)
The files are buried 4 levels down from my master directory:
Master/Top dir
Dir 2
Dir 3
Dir 4
Dir 5
file
For the sake of simplicity I prefer a bash command to a .sh script (but I'm happy with either)

GNU Solution
This uses basically the same command that you were using but I supply a template to mktemp so that the XXX pattern appears just before the suffix. With GNU sed:
find . -type f -iname "[a-z,0-9]*" -exec bash -c 'mv -v "$1" "./$(mktemp -u "$(basename "$1" | sed -E -e '\''s/\.([^.]+)$/.XXX.\1/'\'' -e '\''/XXX/ !s/$/.XXX/'\'')" )"' _ '{}' \;
The key addition above is the use of sed to insert XXX before the suffix in the file name:
sed -E -e 's/\.([^.]+)$/.XXX.\1/' -e '/XXX/ !s/$/.XXX/'
This has two commands. The first puts .XXX before the extension. The second command is run only if the file name has no extension in which case it adds .XXX to the end of the file name.
In the first command, the source regex consists of two parts. The first is \. which matches a period. The second is ([^.]+)$ which captures the extension into group 1. The substitution replaces this with .XXX.\1 where \1 is sed notation for group 1 which, in our case, is the file's extension.
OSX Solution
Under OSX, mktemp is not useful because it only supports templates with the XXX part trailing. As a workaround, we can use a bash script that generates non-overlapping file names:
#!/bin/bash
find . -type f -iname "[a-z,0-9]*" -print0 |
while IFS= read -r -d '' fname
do
new=$(basename "$fname")
[ "$fname" = "./$new" ] && continue
[ "$new" = .DS_store ] && continue
name=${new%.*}
ext=${new#"$name"}
n=0
new=$(printf '%s.%03i%s' "$name" "$n" "$ext")
while [ -f "$new" ]
do
n=$(($n + 1))
new=$(printf '%s.%03i%s' "$name" "$n" "$ext")
done
mv -v "$fname" "$new"
done
The above uses the find command to get the file names. The option -print0 is used to assure that it works with difficult file names. The while loop reads these file names one by one, into the variable fname. fname includes the full path to the source file. The file name without the path is then stored in new. Then two checks are performed. If the source file is already in the current directory, the script continues on to the next loop. Similarly, if the file name id .DS_Store, it is also skipped. (The find command, as given, already skips these files. This line is there just for future flexibility.) Next, the file name is split into two parts: the name and ext, the extension. ext includes the leading period. Next, a loop checks for files of the form name.NNN.ext and stops at the first one that doesn't yet exist. The source file is moved to a file of that name.
Related Notes Regarding the GNU Solution and its Compatibility
Quoting in the above GNU command is complex. The argument to bash -c needs to be in single-quotes to prevent the calling bash from performing premature variable substitution. In addition, the sed commands need to be in single-quotes when executed by the bash subshell to prevent history expansion from interfering with the use of negation, !, within the sed command.
The OSX (BSD) sed does not support combining commands together with semicolons. Consequently, each command is supplied to sed via a separate -e option.
The OSX (BSD) sed seems to treat + differently from the GNU sed. This incompatibility seems to go away when using the -E (extended regex) option. (The corresponding GNU option is -r but, as an undocumented compatibility feature, GNU sed supports -E also.

Related

Bash: recursively rename part of a file [duplicate]

I want to go through a bunch of directories and rename all files that end in _test.rb to end in _spec.rb instead. It's something I've never quite figured out how to do with bash so this time I thought I'd put some effort in to get it nailed. I've so far come up short though, my best effort is:
find spec -name "*_test.rb" -exec echo mv {} `echo {} | sed s/test/spec/` \;
NB: there's an extra echo after exec so that the command is printed instead of run while I'm testing it.
When I run it the output for each matched filename is:
mv original original
i.e. the substitution by sed has been lost. What's the trick?
To solve it in a way most close to the original problem would be probably using xargs "args per command line" option:
find . -name "*_test.rb" | sed -e "p;s/test/spec/" | xargs -n2 mv
It finds the files in the current working directory recursively, echoes the original file name (p) and then a modified name (s/test/spec/) and feeds it all to mv in pairs (xargs -n2). Beware that in this case the path itself shouldn't contain a string test.
This happens because sed receives the string {} as input, as can be verified with:
find . -exec echo `echo "{}" | sed 's/./foo/g'` \;
which prints foofoo for each file in the directory, recursively. The reason for this behavior is that the pipeline is executed once, by the shell, when it expands the entire command.
There is no way of quoting the sed pipeline in such a way that find will execute it for every file, since find doesn't execute commands via the shell and has no notion of pipelines or backquotes. The GNU findutils manual explains how to perform a similar task by putting the pipeline in a separate shell script:
#!/bin/sh
echo "$1" | sed 's/_test.rb$/_spec.rb/'
(There may be some perverse way of using sh -c and a ton of quotes to do all this in one command, but I'm not going to try.)
you might want to consider other way like
for file in $(find . -name "*_test.rb")
do
echo mv $file `echo $file | sed s/_test.rb$/_spec.rb/`
done
I find this one shorter
find . -name '*_test.rb' -exec bash -c 'echo mv $0 ${0/test.rb/spec.rb}' {} \;
You can do it without sed, if you want:
for i in `find -name '*_test.rb'` ; do mv $i ${i%%_test.rb}_spec.rb ; done
${var%%suffix} strips suffix from the value of var.
or, to do it using sed:
for i in `find -name '*_test.rb'` ; do mv $i `echo $i | sed 's/test/spec/'` ; done
You mention that you are using bash as your shell, in which case you don't actually need find and sed to achieve the batch renaming you're after...
Assuming you are using bash as your shell:
$ echo $SHELL
/bin/bash
$ _
... and assuming you have enabled the so-called globstar shell option:
$ shopt -p globstar
shopt -s globstar
$ _
... and finally assuming you have installed the rename utility (found in the util-linux-ng package)
$ which rename
/usr/bin/rename
$ _
... then you can achieve the batch renaming in a bash one-liner as follows:
$ rename _test _spec **/*_test.rb
(the globstar shell option will ensure that bash finds all matching *_test.rb files, no matter how deeply they are nested in the directory hierarchy... use help shopt to find out how to set the option)
The easiest way:
find . -name "*_test.rb" | xargs rename s/_test/_spec/
The fastest way (assuming you have 4 processors):
find . -name "*_test.rb" | xargs -P 4 rename s/_test/_spec/
If you have a large number of files to process, it is possible that the list of filenames piped to xargs would cause the resulting command line to exceed the maximum length allowed.
You can check your system's limit using getconf ARG_MAX
On most linux systems you can use free -b or cat /proc/meminfo to find how much RAM you have to work with; Otherwise, use top or your systems activity monitor app.
A safer way (assuming you have 1000000 bytes of ram to work with):
find . -name "*_test.rb" | xargs -s 1000000 rename s/_test/_spec/
Here is what worked for me when the file names had spaces in them. The example below recursively renames all .dar files to .zip files:
find . -name "*.dar" -exec bash -c 'mv "$0" "`echo \"$0\" | sed s/.dar/.zip/`"' {} \;
For this you don't need sed. You can perfectly get alone with a while loop fed with the result of find through a process substitution.
So if you have a find expression that selects the needed files, then use the syntax:
while IFS= read -r file; do
echo "mv $file ${file%_test.rb}_spec.rb" # remove "echo" when OK!
done < <(find -name "*_test.rb")
This will find files and rename all of them striping the string _test.rb from the end and appending _spec.rb.
For this step we use Shell Parameter Expansion where ${var%string} removes the shortest matching pattern "string" from $var.
$ file="HELLOa_test.rbBYE_test.rb"
$ echo "${file%_test.rb}" # remove _test.rb from the end
HELLOa_test.rbBYE
$ echo "${file%_test.rb}_spec.rb" # remove _test.rb and append _spec.rb
HELLOa_test.rbBYE_spec.rb
See an example:
$ tree
.
├── ab_testArb
├── a_test.rb
├── a_test.rb_test.rb
├── b_test.rb
├── c_test.hello
├── c_test.rb
└── mydir
└── d_test.rb
$ while IFS= read -r file; do echo "mv $file ${file/_test.rb/_spec.rb}"; done < <(find -name "*_test.rb")
mv ./b_test.rb ./b_spec.rb
mv ./mydir/d_test.rb ./mydir/d_spec.rb
mv ./a_test.rb ./a_spec.rb
mv ./c_test.rb ./c_spec.rb
if you have Ruby (1.9+)
ruby -e 'Dir["**/*._test.rb"].each{|x|test(?f,x) and File.rename(x,x.gsub(/_test/,"_spec") ) }'
In ramtam's answer which I like, the find portion works OK but the remainder does not if the path has spaces. I am not too familiar with sed, but I was able to modify that answer to:
find . -name "*_test.rb" | perl -pe 's/^((.*_)test.rb)$/"\1" "\2spec.rb"/' | xargs -n2 mv
I really needed a change like this because in my use case the final command looks more like
find . -name "olddir" | perl -pe 's/^((.*)olddir)$/"\1" "\2new directory"/' | xargs -n2 mv
I haven't the heart to do it all over again, but I wrote this in answer to Commandline Find Sed Exec. There the asker wanted to know how to move an entire tree, possibly excluding a directory or two, and rename all files and directories containing the string "OLD" to instead contain "NEW".
Besides describing the how with painstaking verbosity below, this method may also be unique in that it incorporates built-in debugging. It basically doesn't do anything at all as written except compile and save to a variable all commands it believes it should do in order to perform the work requested.
It also explicitly avoids loops as much as possible. Besides the sed recursive search for more than one match of the pattern there is no other recursion as far as I know.
And last, this is entirely null delimited - it doesn't trip on any character in any filename except the null. I don't think you should have that.
By the way, this is REALLY fast. Look:
% _mvnfind() { mv -n "${1}" "${2}" && cd "${2}"
> read -r SED <<SED
> :;s|${3}\(.*/[^/]*${5}\)|${4}\1|;t;:;s|\(${5}.*\)${3}|\1${4}|;t;s|^[0-9]*[\t]\(mv.*\)${5}|\1|p
> SED
> find . -name "*${3}*" -printf "%d\tmv %P ${5} %P\000" |
> sort -zg | sed -nz ${SED} | read -r ${6}
> echo <<EOF
> Prepared commands saved in variable: ${6}
> To view do: printf ${6} | tr "\000" "\n"
> To run do: sh <<EORUN
> $(printf ${6} | tr "\000" "\n")
> EORUN
> EOF
> }
% rm -rf "${UNNECESSARY:=/any/dirs/you/dont/want/moved}"
% time ( _mvnfind ${SRC=./test_tree} ${TGT=./mv_tree} \
> ${OLD=google} ${NEW=replacement_word} ${sed_sep=SsEeDd} \
> ${sh_io:=sh_io} ; printf %b\\000 "${sh_io}" | tr "\000" "\n" \
> | wc - ; echo ${sh_io} | tr "\000" "\n" | tail -n 2 )
<actual process time used:>
0.06s user 0.03s system 106% cpu 0.090 total
<output from wc:>
Lines Words Bytes
115 362 20691 -
<output from tail:>
mv .config/replacement_word-chrome-beta/Default/.../googlestars \
.config/replacement_word-chrome-beta/Default/.../replacement_wordstars
NOTE: The above function will likely require GNU versions of sed and find to properly handle the find printf and sed -z -e and :;recursive regex test;t calls. If these are not available to you the functionality can likely be duplicated with a few minor adjustments.
This should do everything you wanted from start to finish with very little fuss. I did fork with sed, but I was also practicing some sed recursive branching techniques so that's why I'm here. It's kind of like getting a discount haircut at a barber school, I guess. Here's the workflow:
rm -rf ${UNNECESSARY}
I intentionally left out any functional call that might delete or destroy data of any kind. You mention that ./app might be unwanted. Delete it or move it elsewhere beforehand, or, alternatively, you could build in a \( -path PATTERN -exec rm -rf \{\} \) routine to find to do it programmatically, but that one's all yours.
_mvnfind "${#}"
Declare its arguments and call the worker function. ${sh_io} is especially important in that it saves the return from the function. ${sed_sep} comes in a close second; this is an arbitrary string used to reference sed's recursion in the function. If ${sed_sep} is set to a value that could potentially be found in any of your path- or file-names acted upon... well, just don't let it be.
mv -n $1 $2
The whole tree is moved from the beginning. It will save a lot of headache; believe me. The rest of what you want to do - the renaming - is simply a matter of filesystem metadata. If you were, for instance, moving this from one drive to another, or across filesystem boundaries of any kind, you're better off doing so at once with one command. It's also safer. Note the -noclobber option set for mv; as written, this function will not put ${SRC_DIR} where a ${TGT_DIR} already exists.
read -R SED <<HEREDOC
I located all of sed's commands here to save on escaping hassles and read them into a variable to feed to sed below. Explanation below.
find . -name ${OLD} -printf
We begin the find process. With find we search only for anything that needs renaming because we already did all of the place-to-place mv operations with the function's first command. Rather than take any direct action with find, like an exec call, for instance, we instead use it to build out the command-line dynamically with -printf.
%dir-depth :tab: 'mv '%path-to-${SRC}' '${sed_sep}'%path-again :null delimiter:'
After find locates the files we need it directly builds and prints out (most) of the command we'll need to process your renaming. The %dir-depth tacked onto the beginning of each line will help to ensure we're not trying to rename a file or directory in the tree with a parent object that has yet to be renamed. find uses all sorts of optimization techniques to walk your filesystem tree and it is not a sure thing that it will return the data we need in a safe-for-operations order. This is why we next...
sort -general-numerical -zero-delimited
We sort all of find's output based on %directory-depth so that the paths nearest in relationship to ${SRC} are worked first. This avoids possible errors involving mving files into non-existent locations, and it minimizes need to for recursive looping. (in fact, you might be hard-pressed to find a loop at all)
sed -ex :rcrs;srch|(save${sep}*til)${OLD}|\saved${SUBSTNEW}|;til ${OLD=0}
I think this is the only loop in the whole script, and it only loops over the second %Path printed for each string in case it contains more than one ${OLD} value that might need replacing. All other solutions I imagined involved a second sed process, and while a short loop may not be desirable, certainly it beats spawning and forking an entire process.
So basically what sed does here is search for ${sed_sep}, then, having found it, saves it and all characters it encounters until it finds ${OLD}, which it then replaces with ${NEW}. It then heads back to ${sed_sep} and looks again for ${OLD}, in case it occurs more than once in the string. If it is not found, it prints the modified string to stdout (which it then catches again next) and ends the loop.
This avoids having to parse the entire string, and ensures that the first half of the mv command string, which needs to include ${OLD} of course, does include it, and the second half is altered as many times as is necessary to wipe the ${OLD} name from mv's destination path.
sed -ex...-ex search|%dir_depth(save*)${sed_sep}|(only_saved)|out
The two -exec calls here happen without a second fork. In the first, as we've seen, we modify the mv command as supplied by find's -printf function command as necessary to properly alter all references of ${OLD} to ${NEW}, but in order to do so we had to use some arbitrary reference points which should not be included in the final output. So once sed finishes all it needs to do, we instruct it to wipe out its reference points from the hold-buffer before passing it along.
AND NOW WE'RE BACK AROUND
read will receive a command that looks like this:
% mv /path2/$SRC/$OLD_DIR/$OLD_FILE /same/path_w/$NEW_DIR/$NEW_FILE \000
It will read it into ${msg} as ${sh_io} which can be examined at will outside of the function.
Cool.
-Mike
I was able handle filenames with spaces by following the examples suggested by onitake.
This doesn't break if the path contains spaces or the string test:
find . -name "*_test.rb" -print0 | while read -d $'\0' file
do
echo mv "$file" "$(echo $file | sed s/test/spec/)"
done
This is an example that should work in all cases.
Works recursiveley, Need just shell, and support files names with spaces.
find spec -name "*_test.rb" -print0 | while read -d $'\0' file; do mv "$file" "`echo $file | sed s/test/spec/`"; done
$ find spec -name "*_test.rb"
spec/dir2/a_test.rb
spec/dir1/a_test.rb
$ find spec -name "*_test.rb" | xargs -n 1 /usr/bin/perl -e '($new=$ARGV[0]) =~ s/test/spec/; system(qq(mv),qq(-v), $ARGV[0], $new);'
`spec/dir2/a_test.rb' -> `spec/dir2/a_spec.rb'
`spec/dir1/a_test.rb' -> `spec/dir1/a_spec.rb'
$ find spec -name "*_spec.rb"
spec/dir2/b_spec.rb
spec/dir2/a_spec.rb
spec/dir1/a_spec.rb
spec/dir1/c_spec.rb
Your question seems to be about sed, but to accomplish your goal of recursive rename, I'd suggest the following, shamelessly ripped from another answer I gave here:recursive rename in bash
#!/bin/bash
IFS=$'\n'
function RecurseDirs
{
for f in "$#"
do
newf=echo "${f}" | sed -e 's/^(.*_)test.rb$/\1spec.rb/g'
echo "${f}" "${newf}"
mv "${f}" "${newf}"
f="${newf}"
if [[ -d "${f}" ]]; then
cd "${f}"
RecurseDirs $(ls -1 ".")
fi
done
cd ..
}
RecurseDirs .
More secure way of doing rename with find utils and sed regular expression type:
mkdir ~/practice
cd ~/practice
touch classic.txt.txt
touch folk.txt.txt
Remove the ".txt.txt" extension as follows -
cd ~/practice
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} \;
If you use the + in place of ; in order to work on batch mode, the above command will rename only the first matching file, but not the entire list of file matches by 'find'.
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} +
Here's a nice oneliner that does the trick.
Sed can't handle this right, especially if multiple variables are passed by xargs with -n 2.
A bash substition would handle this easily like:
find ./spec -type f -name "*_test.rb" -print0 | xargs -0 -I {} sh -c 'export file={}; mv $file ${file/_test.rb/_spec.rb}'
Adding -type -f will limit the move operations to files only, -print 0 will handle empty spaces in paths.
I share this post as it is a bit related to question. Sorry for not providing more details. Hope it helps someone else.
http://www.peteryu.ca/tutorials/shellscripting/batch_rename
This is my working solution:
for FILE in {{FILE_PATTERN}}; do echo ${FILE} | mv ${FILE} $(sed 's/{{SOURCE_PATTERN}}/{{TARGET_PATTERN}}/g'); done

Trying to rename certain file types within recursive directories

I have a bunch of files within a directory structure as such:
Dir
SubDir
File
File
Subdir
SubDir
File
File
File
Sorry for the messy formatting, but as you can see there are files at all different directory levels. All of these file names have a string of 7 numbers appended to them as such: 1234567_filename.ext. I am trying to remove the number and underscore at the start of the filename.
Right now I am using bash and using this oneliner to rename the files using mv and cut:
for i in *; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
This is being run while I am CD'd into the directory. I would love to find a way to do this recursively, so that it only renamed files, not folders. I have also used a foreach loop in the shell, outside of bash for directories that have a bunch of folders with files in them and no other subdirectories as such:
foreach$ set p=`echo $f | cut -d/ -f1`
foreach$ set n=`echo $f | cut -d/ -f2 | cut -d_ -f2-10`
foreach$ mv $f $p/$n
foreach$ end
But that only works when there are no other subdirectories within the folders.
Is there a loop or oneliner I can use to rename all files within the directories? I even tried using find but couldn't figure out how to incorporate cut into the code.
Any help is much appreciated.
With Perl‘s rename (standalone command):
shopt -s globstar
rename -n 's|/[0-9]{7}_([^/]+$)|/$1|' **/*
If everything looks fine remove -n.
globstar: If set, the pattern ** used in a pathname expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a /, only directories and subdirectories
match.
bash does provide functions, and these can be recursive, but you don't need a recursive function for this job. You just need to enumerate all the files in the tree. The find command can do that, but turning on bash's globstar option and using a shell glob to do it is safer:
#!/bin/bash
shopt -s globstar
# enumerate all the files in the tree rooted at the current working directory
for f in **; do
# ignore directories
test -d "$f" && continue
# separate the base file name from the path
name=$(basename "$f")
dir=$(dirname "$f")
# perform the rename, using a pattern substitution on the name part
mv "$f" "${dir}/${name/#???????_/}"
done
Note that that does not verify that file names actually match the pattern you specified before performing the rename; I'm taking you at your word that they do. If such a check were wanted then it could certainly be added.
How about this small tweak to what you have already:
for i in `find . -type f`; do mv "$i" "$(echo $i | cut -d_ -f2-10)"; done
Basically just swapping the * with `find . -type f`
Should be possible to do this using find...
find -E . -type f \
-regex '.*/[0-9]{7}_.*\.txt' \
-exec sh -c 'f="${0#*/}"; mv -v "$0" "${0%/*}/${f#*_}"' {} \;
Your find options may be different -- I'm doing this in FreeBSD. The idea here is:
-E instructs find to use extended regular expressions.
-type f causes only normal files (not directories or symlinks) to be found.
-regex ... matches the files you're looking for. You can make this more specific if you need to.
exec ... \; runs a command, using {} (the file we've found) as an argument.
The command we're running uses parameter expansion first to grab the target directory and second to strip the filename. Note the temporary variable $f, which is used to address the possibility of extra underscores being part of the filename.
Note that this is NOT a bash command, though you can of course run it from the bash shell. If you want a bash solution that does not require use of an external tool like find, you may be able to do the following:
$ shopt -s extglob # use extended glob format
$ shopt -s globstar # recurse using "**"
$ for f in **/+([0-9])_*.txt; do f="./$f"; echo mv "$f" "${f%/*}/${f##*_}"; done
This uses the same logic as the find solution, but uses bash v4 extglob to provide better filename matching and globstar to recurse through subdirectories.
Hope these help.

Shell script: Check if a Directory is of YYYY_MM_DD_HH this format

I have a script that creates a file list of directories available in another path.
Now, I would like to do some tasks only if the Directory is of the format "YYYY_MM_DD_HH" in this file list.
My file list has following entries:
2014_04_21_01
asdf
2012_01_19_10
2010_01
Now I would like to move the directories with names as YYYY_MM_DD_HH to another path. I.e., only 2014_04_21_01 & 2012_01_19_10 MUST be MOVED.
Please advise.
Use bash regex pattern matching:
for dir in $list
do if [[ "$dir" =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}$ ]]
then mv "$dir" newdir/
fi
done
Assuming you have a GNU version of sed on your computer, you could use it to easily parse your directory names and execute a command.
Say we have following input file:
2014_04_21_01
asdf
2012_01_19_10
2010_01
2012_01_19_10_09
62012_01_19_10
You can search for your regex with sed and replace it with a mv command as follows:
sed 's/^[0-9]\{4\}\(_[0-9]\{2\}\)\{3\}$/mv "&" "other_dir"/' file_list
will output:
mv "2014_04_21_01" "other_dir" # We want to run this
asdf
mv "2012_01_19_10" "other_dir" # and this
2010_01
2012_01_19_10_09
62012_01_19_10
Now if you add the (GNU sed) e option at the end of sed substitution (and -n option before sed script to ensure only successul substitutions are executed), the generated command will be piped into your shell:
sed -n 's/^[0-9]\{4\}\(_[0-9]\{2\}\)\{3\}$/mv "&" "other_dir"/e' file_list
# ^^ ^
I would recommand to run it first without the e option so as to check that mv commands will be properly formatted.
Why to make separate file for file list. Just go in that directory execute following command. I have taken the destination directory as /home/newdir/
ls | grep [0-9][0-9][0-9][0-9]_[01][0-9]_[0123][0-9]_[012][0-9] | awk '{print $0" /home/newdir/"}' | xargs mv
Be Careful while working with dates. As you have mentioned that file name is in format YYYY_MM_DD_HH then we have restrictions on MM,DD and HH. If we talk about restrictions then we know how a calendar is constructed. So 9999_99_99_99 is invalid file name. It is not satisfying YYYY_MM_DD_HH.
We have to build script for restrictions or I can say whole calendar. Still working on it.
Example:
perl -nle 'system("mv $_ dir/year$1") if /^(\d{4})_\d\d_\d\d_\d\d/$' flist
would extract the year and rename dir 2014_04_21_01 to dir/year2014
This single find command with -regex option should take care of this:
cd /base/path/of/these/dirs
find . -type d -regextype posix-egrep -regex '.*/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}$' \
-exec mv '{}' /dest/dir/ \;

Using bash I need to perform a find of 0 byte files but report on their existence before deletion

The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, pdfs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production user's need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to also be there and thus provide reason as to why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
When this command is run, bash will execute the string printf "%s\n" "$#" >>deletedFiles.txt and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $#, it does not include $0. It, as is usual, expands to $1, $2, .... Thus, we add the placeholder none so that the placeholder is assigned is the $0, which we will ignore, and the complete list of file names are assigned to "$#".
-delete
This deletes each empty file.
Why not simply
find . -type f -size 0 -exec rm -v + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from #JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CD's to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to #John1024 for the suggestion of using the empty flag rather than size.

How can I convert tabs to spaces in every file of a directory?

How can I convert tabs to spaces in every file of a directory (possibly recursively)?
Also, is there a way of setting the number of spaces per tab?
Simple replacement with sed is okay but not the best possible solution. If there are "extra" spaces between the tabs they will still be there after substitution, so the margins will be ragged. Tabs expanded in the middle of lines will also not work correctly. In bash, we can say instead
find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;
to apply expand to every Java file in the current directory tree. Remove / replace the -name argument if you're targeting some other file types. As one of the comments mentions, be very careful when removing -name or using a weak, wildcard. You can easily clobber repository and other hidden files without intent. This is why the original answer included this:
You should always make a backup copy of the tree before trying something like this in case something goes wrong.
Try the command line tool expand.
expand -i -t 4 input | sponge output
where
-i is used to expand only leading tabs on each line;
-t 4 means that each tab will be converted to 4 whitespace chars (8 by default).
sponge is from the moreutils package, and avoids clearing the input file. On macOS, the package moreutils is available via Homebrew (brew install moreutils) or MacPorts (sudo port install moreutils).
Finally, you can use gexpand on macOS, after installing coreutils with Homebrew (brew install coreutils) or MacPorts (sudo port install coreutils).
Warning: This will break your repo.
This will corrupt binary files, including those under svn, .git! Read the comments before using!
find . -iname '*.java' -type f -exec sed -i.orig 's/\t/ /g' {} +
The original file is saved as [filename].orig.
Replace '*.java' with the file ending of the file type you are looking for. This way you can prevent accidental corruption of binary files.
Downsides:
Will replace tabs everywhere in a file.
Will take a long time if you happen to have a 5GB SQL dump in this directory.
Collecting the best comments from Gene's answer, the best solution by far, is by using sponge from moreutils.
sudo apt-get install moreutils
# The complete one-liner:
find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;
Explanation:
./ is recursively searching from current directory
-iname is a case insensitive match (for both *.java and *.JAVA likes)
type -f finds only regular files (no directories, binaries or symlinks)
-exec bash -c execute following commands in a subshell for each file name, {}
expand -t 4 expands all TABs to 4 spaces
sponge soak up standard input (from expand) and write to a file (the same one)*.
NOTE: * A simple file redirection (> "$0") won't work here because it would overwrite the file too soon.
Advantage: All original file permissions are retained and no intermediate tmp files are used.
Use backslash-escaped sed.
On linux:
Replace all tabs with 1 hyphen inplace, in all *.txt files:
sed -i $'s/\t/-/g' *.txt
Replace all tabs with 1 space inplace, in all *.txt files:
sed -i $'s/\t/ /g' *.txt
Replace all tabs with 4 spaces inplace, in all *.txt files:
sed -i $'s/\t/ /g' *.txt
On a mac:
Replace all tabs with 4 spaces inplace, in all *.txt files:
sed -i '' $'s/\t/ /g' *.txt
You can use the generally available pr command (man page here). For example, to convert tabs to four spaces, do this:
pr -t -e=4 file > file.expanded
-t suppresses headers
-e=num expands tabs to num spaces
To convert all files in a directory tree recursively, while skipping binary files:
#!/bin/bash
num=4
shopt -s globstar nullglob
for f in **/*; do
[[ -f "$f" ]] || continue # skip if not a regular file
! grep -qI "$f" && continue # skip binary files
pr -t -e=$num "$f" > "$f.expanded.$$" && mv "$f.expanded.$$" "$f"
done
The logic for skipping binary files is from this post.
NOTE:
Doing this could be dangerous in a git or svn repo
This is not the right solution if you have code files that have bare tabs embedded in string literals
My recommendation is to use:
find . -name '*.lua' -exec ex '+%s/\t/ /g' -cwq {} \;
Comments:
Use in place editing. Keep backups in a VCS. No need to produce *.orig files. It's good practice to diff the result against your last commit to make sure this worked as expected, in any case.
sed is a stream editor. Use ex for in place editing. This avoids creating extra temp files and spawning shells for each replacement as in the top answer.
WARNING: This messes with all tabs, not only those used for indentation. Also it does not do context aware replacement of tabs. This was sufficient for my use case. But might not be acceptable for you.
EDIT: An earlier version of this answer used find|xargs instead of find -exec. As pointed out by #gniourf-gniourf this leads to problems with spaces, quotes and control chars in file names cf. Wheeler.
You can use find with tabs-to-spaces package for this.
First, install tabs-to-spaces
npm install -g tabs-to-spaces
then, run this command from the root directory of your project;
find . -name '*' -exec t2s --spaces 2 {} \;
This will replace every tab character with 2 spaces in every file.
How can I convert tabs to spaces in every file of a directory (possibly
recursively)?
This is usually not what you want.
Do you want to do this for png images? PDF files? The .git directory? Your
Makefile (which requires tabs)? A 5GB SQL dump?
You could, in theory, pass a whole lot of exlude options to find or whatever
else you're using; but this is fragile, and will break as soon as you add other
binary files.
What you want, is at least:
Skip files over a certain size.
Detect if a file is binary by checking for the presence of a NULL byte.
Only replace tabs at the start of a file (expand does this, sed
doesn't).
As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.
A while ago I created a little script called
sanitize_files which does exactly
that. It also fixes some other common stuff like replacing \r\n with \n,
adding a trailing \n, etc.
You can find a simplified script without the extra features and command-line arguments below, but I
recommend you use the above script as it's more likely to receive bugfixes and
other updated than this post.
I would also like to point out, in response to some of the other answers here,
that using shell globbing is not a robust way of doing this, because sooner
or later you'll end up with more files than will fit in ARG_MAX (on modern
Linux systems it's 128k, which may seem a lot, but sooner or later it's not
enough).
#!/usr/bin/env python
#
# http://code.arp242.net/sanitize_files
#
import os, re, sys
def is_binary(data):
return data.find(b'\000') >= 0
def should_ignore(path):
keep = [
# VCS systems
'.git/', '.hg/' '.svn/' 'CVS/',
# These files have significant whitespace/tabs, and cannot be edited
# safely
# TODO: there are probably more of these files..
'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
]
for k in keep:
if '/%s' % k in path:
return True
return False
def run(files):
indent_find = b'\t'
indent_replace = b' ' * indent_width
for f in files:
if should_ignore(f):
print('Ignoring %s' % f)
continue
try:
size = os.stat(f).st_size
# Unresolvable symlink, just ignore those
except FileNotFoundError as exc:
print('%s is unresolvable, skipping (%s)' % (f, exc))
continue
if size == 0: continue
if size > 1024 ** 2:
print("Skipping `%s' because it's over 1MiB" % f)
continue
try:
data = open(f, 'rb').read()
except (OSError, PermissionError) as exc:
print("Error: Unable to read `%s': %s" % (f, exc))
continue
if is_binary(data):
print("Skipping `%s' because it looks binary" % f)
continue
data = data.split(b'\n')
fixed_indent = False
for i, line in enumerate(data):
# Fix indentation
repl_count = 0
while line.startswith(indent_find):
fixed_indent = True
repl_count += 1
line = line.replace(indent_find, b'', 1)
if repl_count > 0:
line = indent_replace * repl_count + line
data = list(filter(lambda x: x is not None, data))
try:
open(f, 'wb').write(b'\n'.join(data))
except (OSError, PermissionError) as exc:
print("Error: Unable to write to `%s': %s" % (f, exc))
if __name__ == '__main__':
allfiles = []
for root, dirs, files in os.walk(os.getcwd()):
for f in files:
p = '%s/%s' % (root, f)
if do_add:
allfiles.append(p)
run(allfiles)
I like the "find" example above for the recursive application. To adapt it to be non-recursive, only changing files in the current directory that match a wildcard, the shell glob expansion can be sufficient for small amounts of files:
ls *.java | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh -v
If you want it silent after you trust that it works, just drop the -v on the sh command at the end.
Of course you can pick any set of files in the first command. For example, list only a particular subdirectory (or directories) in a controlled manner like this:
ls mod/*/*.php | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
Or in turn run find(1) with some combination of depth parameters etc:
find mod/ -name '*.php' -mindepth 1 -maxdepth 2 | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
I used astyle to re-indent all my C/C++ code after finding mixed tabs and spaces. It also has options to force a particular brace style if you'd like.
One can use vim for that:
find -type f \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.php' \) -execdir vim -c retab -c wq {} \;
As Carpetsmoker stated, it will retab according to your vim settings. And modelines in the files, if any. Also, it will replace tabs not only at the beginning of the lines. Which is not what you generally want. E.g., you might have literals, containing tabs.
To convert all Java files recursively in a directory to use 4 spaces instead of a tab:
find . -type f -name *.java -exec bash -c 'expand -t 4 {} > /tmp/stuff;mv /tmp/stuff {}' \;
No body mentioned rpl? Using rpl you can replace any string.
To convert tabs to spaces,
rpl -R -e "\t" " " .
very simple.
Download and run the following script to recursively convert hard tabs to soft tabs in plain text files.
Execute the script from inside the folder which contains the plain text files.
#!/bin/bash
find . -type f -and -not -path './.git/*' -exec grep -Iq . {} \; -and -print | while read -r file; do {
echo "Converting... "$file"";
data=$(expand --initial -t 4 "$file");
rm "$file";
echo "$data" > "$file";
}; done;
Git repository friendly method
git-tab-to-space() (
d="$(mktemp -d)"
git grep --cached -Il '' | grep -E "${1:-.}" | \
xargs -I'{}' bash -c '\
f="${1}/f" \
&& expand -t 4 "$0" > "$f" && \
chmod --reference="$0" "$f" && \
mv "$f" "$0"' \
'{}' "$d" \
;
rmdir "$d"
)
Act on all files under the current directory:
git-tab-to-space
Act only on C or C++ files:
git-tab-to-space '\.(c|h)(|pp)$'
You likely want this notably because of those annoying Makefiles which require tabs.
The command git grep --cached -Il '':
lists only the tracked files, so nothing inside .git
excludes directories, binary files (would be corrupted), and symlinks (would be converted to regular files)
as explained at: How to list all text (non-binary) files in a git repository?
chmod --reference keeps the file permissions unchanged: https://unix.stackexchange.com/questions/20645/clone-ownership-and-permissions-from-another-file Unfortunately I can't find a succinct POSIX alternative.
If your codebase had the crazy idea to allow functional raw tabs in strings, use:
expand -i
and then have fun going over all non start of line tabs one by one, which you can list with: Is it possible to git grep for tabs?
Tested on Ubuntu 18.04.
The use of expand as suggested in other answers seems the most logical approach for this task alone.
That said, it can also be done with Bash and Awk in case you may want to do some other modifications along with it.
If using Bash 4.0 or greater, the shopt builtin globstar can be used to search recursively with **.
With GNU Awk version 4.1 or greater, sed like "inplace" file modifications can be made:
shopt -s globstar
gawk -i inplace '{gsub("\t"," ")}1' **/*.ext
In case you want to set the number of spaces per tab:
gawk -i inplace -v n=4 'BEGIN{for(i=1;i<=n;i++) c=c" "}{gsub("\t",c)}1' **/*.ext
Converting tabs to space in just in ".lua" files [tabs -> 2 spaces]
find . -iname "*.lua" -exec sed -i "s#\t# #g" '{}' \;
Use the vim-way:
$ ex +'bufdo retab' -cxa **/*.*
Make the backup! before executing the above command, as it can corrupt your binary files.
To use globstar (**) for recursion, activate by shopt -s globstar.
To specify specific file type, use for example: **/*.c.
To modify tabstop, add +'set ts=2'.
However the down-side is that it can replace tabs inside the strings.
So for slightly better solution (by using substitution), try:
$ ex -s +'bufdo %s/^\t\+/ /ge' -cxa **/*.*
Or by using ex editor + expand utility:
$ ex -s +'bufdo!%!expand -t2' -cxa **/*.*
For trailing spaces, see: How to remove trailing whitespaces for multiple files?
You may add the following function into your .bash_profile:
# Convert tabs to spaces.
# Usage: retab *.*
# See: https://stackoverflow.com/q/11094383/55075
retab() {
ex +'set ts=2' +'bufdo retab' -cxa $*
}

Resources