Best way to do a find/replace in several files? - shell

what's the best way to do this? I'm no command line warrior, but I was thinking there's possibly a way of using grep and cat.
I just want to replace a string that occurs in a folder and sub-folders. what's the best way to do this? I'm running ubuntu if that matters.

I'll throw in another example for folks using ag, The Silver Searcher to do find/replace operations on multiple files.
Complete example:
ag -l "search string" | xargs sed -i '' -e 's/from/to/g'
If we break this down, what we get is:
# returns a list of files containing matching string
ag -l "search string"
Next, we have:
# consume the list of piped files and prepare to run foregoing command
# for each file delimited by newline
xargs
Finally, the string replacement command:
# -i '' means edit files in place and the '' means do not create a backup
# -e 's/from/to/g' specifies the command to run, in this case,
# global, search and replace
sed -i '' -e 's/from/to/g'

find . -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'
The first part of that is a find command to find the files you want to change. You may need to modify that appropriately. The xargs command takes every file the find found and applies the sed command to it. The sed command takes every instance of from and replaces it with to. That's a standard regular expression, so modify it as you need.
If you are using svn beware. Your .svn-directories will be search and replaced as well. You have to exclude those, e.g., like this:
find . ! -regex ".*[/]\.svn[/]?.*" -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'
or
find . -name .svn -prune -o -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'

As Paul said, you want to first find the files you want to edit and then edit them. An alternative to using find is to use GNU grep (the default on Ubuntu), e.g.:
grep -r -l from . | xargs -0 -n 1 sed -i -e 's/from/to/g'
You can also use ack-grep (sudo apt-get install ack-grep or visit http://petdance.com/ack/) as well, if you know you only want a certain type of file, and want to ignore things in version control directories. e.g., if you only want text files,
ack -l --print0 --text from | xargs -0 -n 1 sed -i -e 's/from/to/g'
# `from` here is an arbitrary commonly occurring keyword
An alternative to using sed is to use perl which can process multiple files per command, e.g.,
grep -r -l from . | xargs perl -pi.bak -e 's/from/to/g'
Here, perl is told to edit in place, making a .bak file first.
You can combine any of the left-hand sides of the pipe with the right-hand sides, depending on your preference.

An alternative to sed is using rpl (e.g. available from http://rpl.sourceforge.net/ or your GNU/Linux distribution), like rpl --recursive --verbose --whole-words 'F' 'A' grades/

For convenience, I took Ulysse's answer (after correcting the undesirable error printing) and turned it into a .zshrc / .bashrc function:
function find-and-replace() {
ag -l "$1" | xargs sed -i -e s/"$1"/"$2"/g
}
Usage: find-and-replace Foo Bar

The typical (find|grep|ack|ag|rg)-xargs-sed combination has a few problems:
Difficult to remember and get correct. Eg, forgetting the xargs -r option will run the command even when no files are found, potentially causing problems.
Retrieving the file list, and the actual replacement uses different CLI tools and can have a different search behaviour.
These problems were big enough for such an invasive and dangerous operation as recursive search-and-replace, to start the development of a dedicated tool: mo.
Early tests seem to indicate that its performance is between ag and rg and it solves following problems I encounter with them:
A single invocation can filter on filename and content. Following command searches for the word bug in all source files that have a v1 indication:
mo -f 'src/.*v1.*' -p bug -w
Once the search results are OK, actual replacement for bug with fix can be added:
mo -f 'src/.*v1.*' -p bug -w -r fix

comment() {
}
doc() {
}
function agr {
doc 'usage: from=sth to=another agr [ag-args]'
comment -l --files-with-matches
ag -0 -l "$from" "${#}" | pre-files "$from" "$to"
}
pre-files() {
doc 'stdin should be null-separated list of files that need replacement; $1 the string to replace, $2 the replacement.'
comment '-i backs up original input files with the supplied extension (leave empty for no backup; needed for in-place replacement.)(do not put whitespace between -i and its arg.)'
comment '-r, --no-run-if-empty
If the standard input does not contain any nonblanks,
do not run the command. Normally, the command is run
once even if there is no input. This option is a GNU
extension.'
AGR_FROM="$1" AGR_TO="$2" xargs -r0 perl -pi.pbak -e 's/$ENV{AGR_FROM}/$ENV{AGR_TO}/g'
}
You can use it like this:
from=str1 to=sth agr path1 path2 ...
Supply no paths to make it use the current directory.
Note that ag, xargs, and perl need to be installed and on PATH.

Related

How to search and replace with egrep and sed on macOS?

I want to match a pattern in a file and replace it.
This command works with egrep, xargs and sed:
egrep -lRZ "hello" . | xargs -0 -l sed -i -e 's/hello/world/g'
The problem: It does not work on MacOS because the xargs of MacOS does not support the argumente -l.
xargs: illegal option -- l
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
[-L number] [-n number [-x]] [-P maxprocs] [-s size]
[utility [argument ...]]
How is this solvable on MacOS?
There are actually three incompatibilities you're going to run into here between the GNU (Linux) vs. bsd (macOS) utilities.
The one you're getting an error message from is that bsd's xargs doesn't accept the -l option. But -l is equivalent to -L except that -L requires an argument specifying the maximum number of lines to pass per invocation of the command, while -l defaults to one if it isn't specified. Thus, you can just replace -l with -L1. -L is understood the same way by both the GNU and bsd versions of xargs, so using this is portable between Linux and macOS.
But in this particular case, there's another even easier option: sed is perfectly capable of operating on multiple files per invocation, so there's no reason to limit it to one per invocation. This'll even be slightly faster, since it doesn't have to spend as much time launching new processes. So just leave -l off.
The GNU and bsd versions of egrep (and others in the grep family) both take the option -Z, but they use it to mean completely different things. With GNU, egrep -Z prints zero bytes (ASCII NUL characters) after each filename (matching what xargs -0 expects). But with bsd, egrep -Z is equivalent to zgrep -- it treats its input files as zip archives, and expands them before searching their contents.
Fortunately, both versions understand --null to invoke zero-byte delimiters, so you can use that portably on both platforms.
Both the GNU and bsd versions understand -i<suffix> to mean "edit in place, but make a backup copy, and back up the original with the specified filename suffix". And for both of them, if the suffix is zero-length, it doesn't keep a backup. Unfortunately, the way you specify a zero-length suffix is different and (as far as I've been able to find) irreconcilably incompatible. Specifically, GNU requires the suffix to be directly attached to the -i (e.g. -i.bkp), so just specifying -i by itself is enough to specify in-place-without-backup mode. But bsd allows the suffix to be passed as a separate argument (e.g. -i .bkp), so if you just specify -i by itself, it'll use whatever the next argument is as a suffix (e.g. sed -i -e 's/hello/world/g' will use "-e" as a suffix). To specify in-place-without-backup mode, you need to follow -i with an explicit empty argument (e.g. sed -i '' -e 's/hello/world/g'). But if you do that with GNU's sed, it'll try to execute the empty argument as its script, which will fail.
With all that, here's the macOS version of your command:
egrep -lR --null "hello" . | xargs -0 sed -i '' -e 's/hello/world/g'
...which will almost work on Linux -- the only difference is that you need to remove the '' argument to sed. If you want something that's fully portable between Linux and macOS, you need to specify a backup suffix (and attach it directly to the -i option, as in -i.bkp).
The grep options to recursively search for files are best avoided - they just clutter up your grep args and make your scripts non-portable. There's already a perfectly good tool designed to find files with a very obvious name.
Are you just trying to replace hello with world in all your files? If so that's just
find . -type f |
while IFS= read -r file; do
sed 's/hello/world/g' "$file" > "tmp$$" &&
mv "tmp$$" "$file"
done
That'll work in any shell on any UNIX box unless your file names contain newlines. If you didn't want to change timestamps etc. on files that don't contain hello one way is:
find . -type f -exec grep -q 'hello' {} \; -print |
while IFS= read -r file; do
sed 's/hello/world/g' "$file" > "tmp$$" &&
mv "tmp$$" "$file"
done

How to find many files from txt file in directory and subdirectories, then copy all to new folder

I can't find posts that help with this exact problem:
On Mac Terminal I want to read a txt file (example.txt) containing file names such as:
20130815 144129 865 000000 0172 0780.bmp
20130815 144221 511 000003 1068 0408.bmp
....100 more
And I want to search for them in a certain folder/subfolders (example_folder). After each find, the file should be copied to a new folder x (new_destination).
Your help would be much appreciated!
Chers,
Mo
You could use a piped command with a combination of ls, grep, xargs and cp.
So basically you start with getting the list of files
ls
then you filter them with egrep -e, grep -e or whatever flavor of grep Mac uses for their terminal. If you want to find all files ending with text you can use the regex .txt$ (which means ends with '.txt')
ls | egrep -e "yourRegexExpression"
After that you get an input stream, but cp doesn't work with input streams and only takes a bunch of arguments, that's why we use xargs to convert it to arguments. The final step is to add the flag -t to the argument to signify that the next argument is the target directory.
ls | egrep -e "yourRegexExpression" | xargs cp -t DIRECTORY
I hope this helps!
Edit
Sorry I didn't read the question well enough, I updated to be match your problem. Here you can see that the egrep command compiles a rather large regex string with all the file names in this way (filename1|filename2|...|fileN). The $() evaluates the command inside and uses the tr to translate newLines to "|" for the regex.
ls | egrep -e "("$(cat yourtextfile.txt | tr "\n" "|")")" | xargs cp -t DIRECTORY
You could do something like:
$ for i in `cat example.txt`
find /search/path -type f -name "$i" -exec cp "{}" /new/path \;
This is how it works, for every line within example.txt:
for i in `cat example.txt`
it will try to find a file matching the line $i in the defined path:
find /search/path -type f -name "$i"
And if found it will copy it to the desired location:
-exec cp "{}" /new/path \;

Bash: recursively rename part of a file [duplicate]

I want to go through a bunch of directories and rename all files that end in _test.rb to end in _spec.rb instead. It's something I've never quite figured out how to do with bash so this time I thought I'd put some effort in to get it nailed. I've so far come up short though, my best effort is:
find spec -name "*_test.rb" -exec echo mv {} `echo {} | sed s/test/spec/` \;
NB: there's an extra echo after exec so that the command is printed instead of run while I'm testing it.
When I run it the output for each matched filename is:
mv original original
i.e. the substitution by sed has been lost. What's the trick?
To solve it in a way most close to the original problem would be probably using xargs "args per command line" option:
find . -name "*_test.rb" | sed -e "p;s/test/spec/" | xargs -n2 mv
It finds the files in the current working directory recursively, echoes the original file name (p) and then a modified name (s/test/spec/) and feeds it all to mv in pairs (xargs -n2). Beware that in this case the path itself shouldn't contain a string test.
This happens because sed receives the string {} as input, as can be verified with:
find . -exec echo `echo "{}" | sed 's/./foo/g'` \;
which prints foofoo for each file in the directory, recursively. The reason for this behavior is that the pipeline is executed once, by the shell, when it expands the entire command.
There is no way of quoting the sed pipeline in such a way that find will execute it for every file, since find doesn't execute commands via the shell and has no notion of pipelines or backquotes. The GNU findutils manual explains how to perform a similar task by putting the pipeline in a separate shell script:
#!/bin/sh
echo "$1" | sed 's/_test.rb$/_spec.rb/'
(There may be some perverse way of using sh -c and a ton of quotes to do all this in one command, but I'm not going to try.)
you might want to consider other way like
for file in $(find . -name "*_test.rb")
do
echo mv $file `echo $file | sed s/_test.rb$/_spec.rb/`
done
I find this one shorter
find . -name '*_test.rb' -exec bash -c 'echo mv $0 ${0/test.rb/spec.rb}' {} \;
You can do it without sed, if you want:
for i in `find -name '*_test.rb'` ; do mv $i ${i%%_test.rb}_spec.rb ; done
${var%%suffix} strips suffix from the value of var.
or, to do it using sed:
for i in `find -name '*_test.rb'` ; do mv $i `echo $i | sed 's/test/spec/'` ; done
You mention that you are using bash as your shell, in which case you don't actually need find and sed to achieve the batch renaming you're after...
Assuming you are using bash as your shell:
$ echo $SHELL
/bin/bash
$ _
... and assuming you have enabled the so-called globstar shell option:
$ shopt -p globstar
shopt -s globstar
$ _
... and finally assuming you have installed the rename utility (found in the util-linux-ng package)
$ which rename
/usr/bin/rename
$ _
... then you can achieve the batch renaming in a bash one-liner as follows:
$ rename _test _spec **/*_test.rb
(the globstar shell option will ensure that bash finds all matching *_test.rb files, no matter how deeply they are nested in the directory hierarchy... use help shopt to find out how to set the option)
The easiest way:
find . -name "*_test.rb" | xargs rename s/_test/_spec/
The fastest way (assuming you have 4 processors):
find . -name "*_test.rb" | xargs -P 4 rename s/_test/_spec/
If you have a large number of files to process, it is possible that the list of filenames piped to xargs would cause the resulting command line to exceed the maximum length allowed.
You can check your system's limit using getconf ARG_MAX
On most linux systems you can use free -b or cat /proc/meminfo to find how much RAM you have to work with; Otherwise, use top or your systems activity monitor app.
A safer way (assuming you have 1000000 bytes of ram to work with):
find . -name "*_test.rb" | xargs -s 1000000 rename s/_test/_spec/
Here is what worked for me when the file names had spaces in them. The example below recursively renames all .dar files to .zip files:
find . -name "*.dar" -exec bash -c 'mv "$0" "`echo \"$0\" | sed s/.dar/.zip/`"' {} \;
For this you don't need sed. You can perfectly get alone with a while loop fed with the result of find through a process substitution.
So if you have a find expression that selects the needed files, then use the syntax:
while IFS= read -r file; do
echo "mv $file ${file%_test.rb}_spec.rb" # remove "echo" when OK!
done < <(find -name "*_test.rb")
This will find files and rename all of them striping the string _test.rb from the end and appending _spec.rb.
For this step we use Shell Parameter Expansion where ${var%string} removes the shortest matching pattern "string" from $var.
$ file="HELLOa_test.rbBYE_test.rb"
$ echo "${file%_test.rb}" # remove _test.rb from the end
HELLOa_test.rbBYE
$ echo "${file%_test.rb}_spec.rb" # remove _test.rb and append _spec.rb
HELLOa_test.rbBYE_spec.rb
See an example:
$ tree
.
├── ab_testArb
├── a_test.rb
├── a_test.rb_test.rb
├── b_test.rb
├── c_test.hello
├── c_test.rb
└── mydir
└── d_test.rb
$ while IFS= read -r file; do echo "mv $file ${file/_test.rb/_spec.rb}"; done < <(find -name "*_test.rb")
mv ./b_test.rb ./b_spec.rb
mv ./mydir/d_test.rb ./mydir/d_spec.rb
mv ./a_test.rb ./a_spec.rb
mv ./c_test.rb ./c_spec.rb
if you have Ruby (1.9+)
ruby -e 'Dir["**/*._test.rb"].each{|x|test(?f,x) and File.rename(x,x.gsub(/_test/,"_spec") ) }'
In ramtam's answer which I like, the find portion works OK but the remainder does not if the path has spaces. I am not too familiar with sed, but I was able to modify that answer to:
find . -name "*_test.rb" | perl -pe 's/^((.*_)test.rb)$/"\1" "\2spec.rb"/' | xargs -n2 mv
I really needed a change like this because in my use case the final command looks more like
find . -name "olddir" | perl -pe 's/^((.*)olddir)$/"\1" "\2new directory"/' | xargs -n2 mv
I haven't the heart to do it all over again, but I wrote this in answer to Commandline Find Sed Exec. There the asker wanted to know how to move an entire tree, possibly excluding a directory or two, and rename all files and directories containing the string "OLD" to instead contain "NEW".
Besides describing the how with painstaking verbosity below, this method may also be unique in that it incorporates built-in debugging. It basically doesn't do anything at all as written except compile and save to a variable all commands it believes it should do in order to perform the work requested.
It also explicitly avoids loops as much as possible. Besides the sed recursive search for more than one match of the pattern there is no other recursion as far as I know.
And last, this is entirely null delimited - it doesn't trip on any character in any filename except the null. I don't think you should have that.
By the way, this is REALLY fast. Look:
% _mvnfind() { mv -n "${1}" "${2}" && cd "${2}"
> read -r SED <<SED
> :;s|${3}\(.*/[^/]*${5}\)|${4}\1|;t;:;s|\(${5}.*\)${3}|\1${4}|;t;s|^[0-9]*[\t]\(mv.*\)${5}|\1|p
> SED
> find . -name "*${3}*" -printf "%d\tmv %P ${5} %P\000" |
> sort -zg | sed -nz ${SED} | read -r ${6}
> echo <<EOF
> Prepared commands saved in variable: ${6}
> To view do: printf ${6} | tr "\000" "\n"
> To run do: sh <<EORUN
> $(printf ${6} | tr "\000" "\n")
> EORUN
> EOF
> }
% rm -rf "${UNNECESSARY:=/any/dirs/you/dont/want/moved}"
% time ( _mvnfind ${SRC=./test_tree} ${TGT=./mv_tree} \
> ${OLD=google} ${NEW=replacement_word} ${sed_sep=SsEeDd} \
> ${sh_io:=sh_io} ; printf %b\\000 "${sh_io}" | tr "\000" "\n" \
> | wc - ; echo ${sh_io} | tr "\000" "\n" | tail -n 2 )
<actual process time used:>
0.06s user 0.03s system 106% cpu 0.090 total
<output from wc:>
Lines Words Bytes
115 362 20691 -
<output from tail:>
mv .config/replacement_word-chrome-beta/Default/.../googlestars \
.config/replacement_word-chrome-beta/Default/.../replacement_wordstars
NOTE: The above function will likely require GNU versions of sed and find to properly handle the find printf and sed -z -e and :;recursive regex test;t calls. If these are not available to you the functionality can likely be duplicated with a few minor adjustments.
This should do everything you wanted from start to finish with very little fuss. I did fork with sed, but I was also practicing some sed recursive branching techniques so that's why I'm here. It's kind of like getting a discount haircut at a barber school, I guess. Here's the workflow:
rm -rf ${UNNECESSARY}
I intentionally left out any functional call that might delete or destroy data of any kind. You mention that ./app might be unwanted. Delete it or move it elsewhere beforehand, or, alternatively, you could build in a \( -path PATTERN -exec rm -rf \{\} \) routine to find to do it programmatically, but that one's all yours.
_mvnfind "${#}"
Declare its arguments and call the worker function. ${sh_io} is especially important in that it saves the return from the function. ${sed_sep} comes in a close second; this is an arbitrary string used to reference sed's recursion in the function. If ${sed_sep} is set to a value that could potentially be found in any of your path- or file-names acted upon... well, just don't let it be.
mv -n $1 $2
The whole tree is moved from the beginning. It will save a lot of headache; believe me. The rest of what you want to do - the renaming - is simply a matter of filesystem metadata. If you were, for instance, moving this from one drive to another, or across filesystem boundaries of any kind, you're better off doing so at once with one command. It's also safer. Note the -noclobber option set for mv; as written, this function will not put ${SRC_DIR} where a ${TGT_DIR} already exists.
read -R SED <<HEREDOC
I located all of sed's commands here to save on escaping hassles and read them into a variable to feed to sed below. Explanation below.
find . -name ${OLD} -printf
We begin the find process. With find we search only for anything that needs renaming because we already did all of the place-to-place mv operations with the function's first command. Rather than take any direct action with find, like an exec call, for instance, we instead use it to build out the command-line dynamically with -printf.
%dir-depth :tab: 'mv '%path-to-${SRC}' '${sed_sep}'%path-again :null delimiter:'
After find locates the files we need it directly builds and prints out (most) of the command we'll need to process your renaming. The %dir-depth tacked onto the beginning of each line will help to ensure we're not trying to rename a file or directory in the tree with a parent object that has yet to be renamed. find uses all sorts of optimization techniques to walk your filesystem tree and it is not a sure thing that it will return the data we need in a safe-for-operations order. This is why we next...
sort -general-numerical -zero-delimited
We sort all of find's output based on %directory-depth so that the paths nearest in relationship to ${SRC} are worked first. This avoids possible errors involving mving files into non-existent locations, and it minimizes need to for recursive looping. (in fact, you might be hard-pressed to find a loop at all)
sed -ex :rcrs;srch|(save${sep}*til)${OLD}|\saved${SUBSTNEW}|;til ${OLD=0}
I think this is the only loop in the whole script, and it only loops over the second %Path printed for each string in case it contains more than one ${OLD} value that might need replacing. All other solutions I imagined involved a second sed process, and while a short loop may not be desirable, certainly it beats spawning and forking an entire process.
So basically what sed does here is search for ${sed_sep}, then, having found it, saves it and all characters it encounters until it finds ${OLD}, which it then replaces with ${NEW}. It then heads back to ${sed_sep} and looks again for ${OLD}, in case it occurs more than once in the string. If it is not found, it prints the modified string to stdout (which it then catches again next) and ends the loop.
This avoids having to parse the entire string, and ensures that the first half of the mv command string, which needs to include ${OLD} of course, does include it, and the second half is altered as many times as is necessary to wipe the ${OLD} name from mv's destination path.
sed -ex...-ex search|%dir_depth(save*)${sed_sep}|(only_saved)|out
The two -exec calls here happen without a second fork. In the first, as we've seen, we modify the mv command as supplied by find's -printf function command as necessary to properly alter all references of ${OLD} to ${NEW}, but in order to do so we had to use some arbitrary reference points which should not be included in the final output. So once sed finishes all it needs to do, we instruct it to wipe out its reference points from the hold-buffer before passing it along.
AND NOW WE'RE BACK AROUND
read will receive a command that looks like this:
% mv /path2/$SRC/$OLD_DIR/$OLD_FILE /same/path_w/$NEW_DIR/$NEW_FILE \000
It will read it into ${msg} as ${sh_io} which can be examined at will outside of the function.
Cool.
-Mike
I was able handle filenames with spaces by following the examples suggested by onitake.
This doesn't break if the path contains spaces or the string test:
find . -name "*_test.rb" -print0 | while read -d $'\0' file
do
echo mv "$file" "$(echo $file | sed s/test/spec/)"
done
This is an example that should work in all cases.
Works recursiveley, Need just shell, and support files names with spaces.
find spec -name "*_test.rb" -print0 | while read -d $'\0' file; do mv "$file" "`echo $file | sed s/test/spec/`"; done
$ find spec -name "*_test.rb"
spec/dir2/a_test.rb
spec/dir1/a_test.rb
$ find spec -name "*_test.rb" | xargs -n 1 /usr/bin/perl -e '($new=$ARGV[0]) =~ s/test/spec/; system(qq(mv),qq(-v), $ARGV[0], $new);'
`spec/dir2/a_test.rb' -> `spec/dir2/a_spec.rb'
`spec/dir1/a_test.rb' -> `spec/dir1/a_spec.rb'
$ find spec -name "*_spec.rb"
spec/dir2/b_spec.rb
spec/dir2/a_spec.rb
spec/dir1/a_spec.rb
spec/dir1/c_spec.rb
Your question seems to be about sed, but to accomplish your goal of recursive rename, I'd suggest the following, shamelessly ripped from another answer I gave here:recursive rename in bash
#!/bin/bash
IFS=$'\n'
function RecurseDirs
{
for f in "$#"
do
newf=echo "${f}" | sed -e 's/^(.*_)test.rb$/\1spec.rb/g'
echo "${f}" "${newf}"
mv "${f}" "${newf}"
f="${newf}"
if [[ -d "${f}" ]]; then
cd "${f}"
RecurseDirs $(ls -1 ".")
fi
done
cd ..
}
RecurseDirs .
More secure way of doing rename with find utils and sed regular expression type:
mkdir ~/practice
cd ~/practice
touch classic.txt.txt
touch folk.txt.txt
Remove the ".txt.txt" extension as follows -
cd ~/practice
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} \;
If you use the + in place of ; in order to work on batch mode, the above command will rename only the first matching file, but not the entire list of file matches by 'find'.
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} +
Here's a nice oneliner that does the trick.
Sed can't handle this right, especially if multiple variables are passed by xargs with -n 2.
A bash substition would handle this easily like:
find ./spec -type f -name "*_test.rb" -print0 | xargs -0 -I {} sh -c 'export file={}; mv $file ${file/_test.rb/_spec.rb}'
Adding -type -f will limit the move operations to files only, -print 0 will handle empty spaces in paths.
I share this post as it is a bit related to question. Sorry for not providing more details. Hope it helps someone else.
http://www.peteryu.ca/tutorials/shellscripting/batch_rename
This is my working solution:
for FILE in {{FILE_PATTERN}}; do echo ${FILE} | mv ${FILE} $(sed 's/{{SOURCE_PATTERN}}/{{TARGET_PATTERN}}/g'); done

Scaling up grep find and copy to large folder (xargs?)

I would like to search a directory for any file that matches any of a list of words. If a file matches, I would like to copy that file into a new directory. I created a small batch of test files and got the following code working:
cp `grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation'` '/Users/newlocation'
Unfortunately, when I run this code on a large folder with a few thousand files it says the argument list is too long for cp. I think I need to loop this or use a xargs but I can't figure out how to make the conversion.
The minimal change from what you have would be:
grep -lir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs cp -t '/Users/newlocation'
But, don't use that. Because you never know when you will encounter a filename with spaces or newlines in it, null-terminated strings should be used. On linux/GNU, add the -Z option to grep and -0 to xargs:
grep -Zlir 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 cp -t '/Users/newlocation'
On Macs (and AIX, HP-UX, Solaris, *BSD), the grep options change slightly but, more importantly, the GNU cp -t option is not available. A workaround is:
grep -lir --null 'word\|word2\|word3\|word4\|word5' '/Users/originallocation' | \
xargs -0 -I fname cp fname '/Users/newlocation'
This is less efficient because a new instance of cp has to be run for each file to be copied.
Alternative solution for those without grep -r. Using find + egrep + xargs , hope there is no file with same file name in different folders. Secondly, I replaced the ugly style of word\|word2\|word3\|word4\|word5
find . -type f -exec egrep -l 'word|word2|word3|word4|word5' {} \; |xargs -i cp {} /LARGE_FOLDER

Recursively rename files using find and sed

I want to go through a bunch of directories and rename all files that end in _test.rb to end in _spec.rb instead. It's something I've never quite figured out how to do with bash so this time I thought I'd put some effort in to get it nailed. I've so far come up short though, my best effort is:
find spec -name "*_test.rb" -exec echo mv {} `echo {} | sed s/test/spec/` \;
NB: there's an extra echo after exec so that the command is printed instead of run while I'm testing it.
When I run it the output for each matched filename is:
mv original original
i.e. the substitution by sed has been lost. What's the trick?
To solve it in a way most close to the original problem would be probably using xargs "args per command line" option:
find . -name "*_test.rb" | sed -e "p;s/test/spec/" | xargs -n2 mv
It finds the files in the current working directory recursively, echoes the original file name (p) and then a modified name (s/test/spec/) and feeds it all to mv in pairs (xargs -n2). Beware that in this case the path itself shouldn't contain a string test.
This happens because sed receives the string {} as input, as can be verified with:
find . -exec echo `echo "{}" | sed 's/./foo/g'` \;
which prints foofoo for each file in the directory, recursively. The reason for this behavior is that the pipeline is executed once, by the shell, when it expands the entire command.
There is no way of quoting the sed pipeline in such a way that find will execute it for every file, since find doesn't execute commands via the shell and has no notion of pipelines or backquotes. The GNU findutils manual explains how to perform a similar task by putting the pipeline in a separate shell script:
#!/bin/sh
echo "$1" | sed 's/_test.rb$/_spec.rb/'
(There may be some perverse way of using sh -c and a ton of quotes to do all this in one command, but I'm not going to try.)
you might want to consider other way like
for file in $(find . -name "*_test.rb")
do
echo mv $file `echo $file | sed s/_test.rb$/_spec.rb/`
done
I find this one shorter
find . -name '*_test.rb' -exec bash -c 'echo mv $0 ${0/test.rb/spec.rb}' {} \;
You can do it without sed, if you want:
for i in `find -name '*_test.rb'` ; do mv $i ${i%%_test.rb}_spec.rb ; done
${var%%suffix} strips suffix from the value of var.
or, to do it using sed:
for i in `find -name '*_test.rb'` ; do mv $i `echo $i | sed 's/test/spec/'` ; done
You mention that you are using bash as your shell, in which case you don't actually need find and sed to achieve the batch renaming you're after...
Assuming you are using bash as your shell:
$ echo $SHELL
/bin/bash
$ _
... and assuming you have enabled the so-called globstar shell option:
$ shopt -p globstar
shopt -s globstar
$ _
... and finally assuming you have installed the rename utility (found in the util-linux-ng package)
$ which rename
/usr/bin/rename
$ _
... then you can achieve the batch renaming in a bash one-liner as follows:
$ rename _test _spec **/*_test.rb
(the globstar shell option will ensure that bash finds all matching *_test.rb files, no matter how deeply they are nested in the directory hierarchy... use help shopt to find out how to set the option)
The easiest way:
find . -name "*_test.rb" | xargs rename s/_test/_spec/
The fastest way (assuming you have 4 processors):
find . -name "*_test.rb" | xargs -P 4 rename s/_test/_spec/
If you have a large number of files to process, it is possible that the list of filenames piped to xargs would cause the resulting command line to exceed the maximum length allowed.
You can check your system's limit using getconf ARG_MAX
On most linux systems you can use free -b or cat /proc/meminfo to find how much RAM you have to work with; Otherwise, use top or your systems activity monitor app.
A safer way (assuming you have 1000000 bytes of ram to work with):
find . -name "*_test.rb" | xargs -s 1000000 rename s/_test/_spec/
Here is what worked for me when the file names had spaces in them. The example below recursively renames all .dar files to .zip files:
find . -name "*.dar" -exec bash -c 'mv "$0" "`echo \"$0\" | sed s/.dar/.zip/`"' {} \;
For this you don't need sed. You can perfectly get alone with a while loop fed with the result of find through a process substitution.
So if you have a find expression that selects the needed files, then use the syntax:
while IFS= read -r file; do
echo "mv $file ${file%_test.rb}_spec.rb" # remove "echo" when OK!
done < <(find -name "*_test.rb")
This will find files and rename all of them striping the string _test.rb from the end and appending _spec.rb.
For this step we use Shell Parameter Expansion where ${var%string} removes the shortest matching pattern "string" from $var.
$ file="HELLOa_test.rbBYE_test.rb"
$ echo "${file%_test.rb}" # remove _test.rb from the end
HELLOa_test.rbBYE
$ echo "${file%_test.rb}_spec.rb" # remove _test.rb and append _spec.rb
HELLOa_test.rbBYE_spec.rb
See an example:
$ tree
.
├── ab_testArb
├── a_test.rb
├── a_test.rb_test.rb
├── b_test.rb
├── c_test.hello
├── c_test.rb
└── mydir
└── d_test.rb
$ while IFS= read -r file; do echo "mv $file ${file/_test.rb/_spec.rb}"; done < <(find -name "*_test.rb")
mv ./b_test.rb ./b_spec.rb
mv ./mydir/d_test.rb ./mydir/d_spec.rb
mv ./a_test.rb ./a_spec.rb
mv ./c_test.rb ./c_spec.rb
if you have Ruby (1.9+)
ruby -e 'Dir["**/*._test.rb"].each{|x|test(?f,x) and File.rename(x,x.gsub(/_test/,"_spec") ) }'
In ramtam's answer which I like, the find portion works OK but the remainder does not if the path has spaces. I am not too familiar with sed, but I was able to modify that answer to:
find . -name "*_test.rb" | perl -pe 's/^((.*_)test.rb)$/"\1" "\2spec.rb"/' | xargs -n2 mv
I really needed a change like this because in my use case the final command looks more like
find . -name "olddir" | perl -pe 's/^((.*)olddir)$/"\1" "\2new directory"/' | xargs -n2 mv
I haven't the heart to do it all over again, but I wrote this in answer to Commandline Find Sed Exec. There the asker wanted to know how to move an entire tree, possibly excluding a directory or two, and rename all files and directories containing the string "OLD" to instead contain "NEW".
Besides describing the how with painstaking verbosity below, this method may also be unique in that it incorporates built-in debugging. It basically doesn't do anything at all as written except compile and save to a variable all commands it believes it should do in order to perform the work requested.
It also explicitly avoids loops as much as possible. Besides the sed recursive search for more than one match of the pattern there is no other recursion as far as I know.
And last, this is entirely null delimited - it doesn't trip on any character in any filename except the null. I don't think you should have that.
By the way, this is REALLY fast. Look:
% _mvnfind() { mv -n "${1}" "${2}" && cd "${2}"
> read -r SED <<SED
> :;s|${3}\(.*/[^/]*${5}\)|${4}\1|;t;:;s|\(${5}.*\)${3}|\1${4}|;t;s|^[0-9]*[\t]\(mv.*\)${5}|\1|p
> SED
> find . -name "*${3}*" -printf "%d\tmv %P ${5} %P\000" |
> sort -zg | sed -nz ${SED} | read -r ${6}
> echo <<EOF
> Prepared commands saved in variable: ${6}
> To view do: printf ${6} | tr "\000" "\n"
> To run do: sh <<EORUN
> $(printf ${6} | tr "\000" "\n")
> EORUN
> EOF
> }
% rm -rf "${UNNECESSARY:=/any/dirs/you/dont/want/moved}"
% time ( _mvnfind ${SRC=./test_tree} ${TGT=./mv_tree} \
> ${OLD=google} ${NEW=replacement_word} ${sed_sep=SsEeDd} \
> ${sh_io:=sh_io} ; printf %b\\000 "${sh_io}" | tr "\000" "\n" \
> | wc - ; echo ${sh_io} | tr "\000" "\n" | tail -n 2 )
<actual process time used:>
0.06s user 0.03s system 106% cpu 0.090 total
<output from wc:>
Lines Words Bytes
115 362 20691 -
<output from tail:>
mv .config/replacement_word-chrome-beta/Default/.../googlestars \
.config/replacement_word-chrome-beta/Default/.../replacement_wordstars
NOTE: The above function will likely require GNU versions of sed and find to properly handle the find printf and sed -z -e and :;recursive regex test;t calls. If these are not available to you the functionality can likely be duplicated with a few minor adjustments.
This should do everything you wanted from start to finish with very little fuss. I did fork with sed, but I was also practicing some sed recursive branching techniques so that's why I'm here. It's kind of like getting a discount haircut at a barber school, I guess. Here's the workflow:
rm -rf ${UNNECESSARY}
I intentionally left out any functional call that might delete or destroy data of any kind. You mention that ./app might be unwanted. Delete it or move it elsewhere beforehand, or, alternatively, you could build in a \( -path PATTERN -exec rm -rf \{\} \) routine to find to do it programmatically, but that one's all yours.
_mvnfind "${#}"
Declare its arguments and call the worker function. ${sh_io} is especially important in that it saves the return from the function. ${sed_sep} comes in a close second; this is an arbitrary string used to reference sed's recursion in the function. If ${sed_sep} is set to a value that could potentially be found in any of your path- or file-names acted upon... well, just don't let it be.
mv -n $1 $2
The whole tree is moved from the beginning. It will save a lot of headache; believe me. The rest of what you want to do - the renaming - is simply a matter of filesystem metadata. If you were, for instance, moving this from one drive to another, or across filesystem boundaries of any kind, you're better off doing so at once with one command. It's also safer. Note the -noclobber option set for mv; as written, this function will not put ${SRC_DIR} where a ${TGT_DIR} already exists.
read -R SED <<HEREDOC
I located all of sed's commands here to save on escaping hassles and read them into a variable to feed to sed below. Explanation below.
find . -name ${OLD} -printf
We begin the find process. With find we search only for anything that needs renaming because we already did all of the place-to-place mv operations with the function's first command. Rather than take any direct action with find, like an exec call, for instance, we instead use it to build out the command-line dynamically with -printf.
%dir-depth :tab: 'mv '%path-to-${SRC}' '${sed_sep}'%path-again :null delimiter:'
After find locates the files we need it directly builds and prints out (most) of the command we'll need to process your renaming. The %dir-depth tacked onto the beginning of each line will help to ensure we're not trying to rename a file or directory in the tree with a parent object that has yet to be renamed. find uses all sorts of optimization techniques to walk your filesystem tree and it is not a sure thing that it will return the data we need in a safe-for-operations order. This is why we next...
sort -general-numerical -zero-delimited
We sort all of find's output based on %directory-depth so that the paths nearest in relationship to ${SRC} are worked first. This avoids possible errors involving mving files into non-existent locations, and it minimizes need to for recursive looping. (in fact, you might be hard-pressed to find a loop at all)
sed -ex :rcrs;srch|(save${sep}*til)${OLD}|\saved${SUBSTNEW}|;til ${OLD=0}
I think this is the only loop in the whole script, and it only loops over the second %Path printed for each string in case it contains more than one ${OLD} value that might need replacing. All other solutions I imagined involved a second sed process, and while a short loop may not be desirable, certainly it beats spawning and forking an entire process.
So basically what sed does here is search for ${sed_sep}, then, having found it, saves it and all characters it encounters until it finds ${OLD}, which it then replaces with ${NEW}. It then heads back to ${sed_sep} and looks again for ${OLD}, in case it occurs more than once in the string. If it is not found, it prints the modified string to stdout (which it then catches again next) and ends the loop.
This avoids having to parse the entire string, and ensures that the first half of the mv command string, which needs to include ${OLD} of course, does include it, and the second half is altered as many times as is necessary to wipe the ${OLD} name from mv's destination path.
sed -ex...-ex search|%dir_depth(save*)${sed_sep}|(only_saved)|out
The two -exec calls here happen without a second fork. In the first, as we've seen, we modify the mv command as supplied by find's -printf function command as necessary to properly alter all references of ${OLD} to ${NEW}, but in order to do so we had to use some arbitrary reference points which should not be included in the final output. So once sed finishes all it needs to do, we instruct it to wipe out its reference points from the hold-buffer before passing it along.
AND NOW WE'RE BACK AROUND
read will receive a command that looks like this:
% mv /path2/$SRC/$OLD_DIR/$OLD_FILE /same/path_w/$NEW_DIR/$NEW_FILE \000
It will read it into ${msg} as ${sh_io} which can be examined at will outside of the function.
Cool.
-Mike
I was able handle filenames with spaces by following the examples suggested by onitake.
This doesn't break if the path contains spaces or the string test:
find . -name "*_test.rb" -print0 | while read -d $'\0' file
do
echo mv "$file" "$(echo $file | sed s/test/spec/)"
done
This is an example that should work in all cases.
Works recursiveley, Need just shell, and support files names with spaces.
find spec -name "*_test.rb" -print0 | while read -d $'\0' file; do mv "$file" "`echo $file | sed s/test/spec/`"; done
$ find spec -name "*_test.rb"
spec/dir2/a_test.rb
spec/dir1/a_test.rb
$ find spec -name "*_test.rb" | xargs -n 1 /usr/bin/perl -e '($new=$ARGV[0]) =~ s/test/spec/; system(qq(mv),qq(-v), $ARGV[0], $new);'
`spec/dir2/a_test.rb' -> `spec/dir2/a_spec.rb'
`spec/dir1/a_test.rb' -> `spec/dir1/a_spec.rb'
$ find spec -name "*_spec.rb"
spec/dir2/b_spec.rb
spec/dir2/a_spec.rb
spec/dir1/a_spec.rb
spec/dir1/c_spec.rb
Your question seems to be about sed, but to accomplish your goal of recursive rename, I'd suggest the following, shamelessly ripped from another answer I gave here:recursive rename in bash
#!/bin/bash
IFS=$'\n'
function RecurseDirs
{
for f in "$#"
do
newf=echo "${f}" | sed -e 's/^(.*_)test.rb$/\1spec.rb/g'
echo "${f}" "${newf}"
mv "${f}" "${newf}"
f="${newf}"
if [[ -d "${f}" ]]; then
cd "${f}"
RecurseDirs $(ls -1 ".")
fi
done
cd ..
}
RecurseDirs .
More secure way of doing rename with find utils and sed regular expression type:
mkdir ~/practice
cd ~/practice
touch classic.txt.txt
touch folk.txt.txt
Remove the ".txt.txt" extension as follows -
cd ~/practice
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} \;
If you use the + in place of ; in order to work on batch mode, the above command will rename only the first matching file, but not the entire list of file matches by 'find'.
find . -name "*txt" -execdir sh -c 'mv "$0" `echo "$0" | sed -r 's/\.[[:alnum:]]+\.[[:alnum:]]+$//'`' {} +
Here's a nice oneliner that does the trick.
Sed can't handle this right, especially if multiple variables are passed by xargs with -n 2.
A bash substition would handle this easily like:
find ./spec -type f -name "*_test.rb" -print0 | xargs -0 -I {} sh -c 'export file={}; mv $file ${file/_test.rb/_spec.rb}'
Adding -type -f will limit the move operations to files only, -print 0 will handle empty spaces in paths.
I share this post as it is a bit related to question. Sorry for not providing more details. Hope it helps someone else.
http://www.peteryu.ca/tutorials/shellscripting/batch_rename
This is my working solution:
for FILE in {{FILE_PATTERN}}; do echo ${FILE} | mv ${FILE} $(sed 's/{{SOURCE_PATTERN}}/{{TARGET_PATTERN}}/g'); done

Resources