Using regular expressions to get parts of file name

Using regular expressions to get parts of file name - macos

On Mac (OS X) I have a directory with many images named like this:
IMG_250x333_1.jpg
IMG_250x333_2.jpg
IMG_250x333_3.jpg
...
I need to rename all of them to:
IMG_1.jpg
IMG_2.jpg
IMG_3.jpg
...
I think using a UNIX command line with "mv" and a kind of regex would do the job, but I don't know how! Can someone please help?
Thanks!

What happens if there's a IMG_111x333_1.jpg and also a IMG_444x222_1.jpg? You risk mangling/overwriting something...
But if that is what you want, you can do it like this:
#!/bin/bash
for f in *.jpg; do
new=${f/_*_/_}
echo mv "$f" $new
done
If you like what it is doing, remove the word echo.

Here's an approach I like:
ls | sed 's/\(.*\)250x333_\(.*\)/mv "&" "\1\2"/' | sh
List the files with ls.
Then, transform the filenames with sed, and generate a mv command. Note that the & in the sed command outputs the full input string.
Finally, evaluate the mv command with sh
The nice thing about this approach is you can remove | sh and test that your regex is correct.

Related

substitute file names using rename

I want to rename files names by substituting all the characters starting from "_ " followed by eight capital letter and keep only the extension.
4585_10_148_H2A119Ub_GTCTGTCA_S51_mcdf_mdup_ngsFlt.fm
4585_10_148_H3K27me3_TCTTCACA_S51_mcdf_mdup_ngsFlt.fm
4585_27_128_Bap1_Bethyl_ACAGATTC_S61_mcdf_mdup_ngsFlt.fw
4585_32_148_1_INPUT_previous_AGAGTCAA_S72_mcdf_mdup_ngsFlt.bw
expected output
4585_10_148_H2A119Ub.fm
4585_10_148_H3K27me3.fm
4585_27_128_Bap1_Bethyl.fm
4585_32_148_1_INPUT_previous.fm

Try this:
for f in *; do
target=$(echo "${f}" | sed -E 's/_[[:upper:]]{8}.*\././')
mv "${f}" "${target}"
done
The key thing is the -E argument to sed, since it enables expanded regular expressions.

You can also use rename (a.k.a. prename or Perl rename) like this:
rename --dry-run 's|_[[:upper:]]{8}.*\.|.|' *
Sample Output
'4585_10_148_H2A119Ub_GTCTGTCA_S51_mcdf_mdup_ngsFlt.fm' would be renamed to '4585_10_148_H2A119Ub.fm'
'4585_32_148_1_INPUT_previous_AGAGTCAA_S72_mcdf_mdup_ngsFlt.bw' would be renamed to '4585_32_148_1_INPUT_previous.bw'
Remove the --dry-run and run again for real, if the output looks good.
This has several added benefits:
that it will warn and avoid any conflicts if two files rename to the same thing,
that it can rename across directories, creating any necessary intermediate directories on the way,
that you can do a dry run first to test it,
that you can use arbitrarily complex Perl code to specify the new name.
On a Mac, install it with homebrew using:
brew install rename

You may try this.
for i in *.fm; do mv $i $(echo $i | sed 's/_GTCTGTCA_S51_mcdf_mdup_ngsFlt//g'); done;
for i in *.fm; do mv $i $(echo $i | sed 's/_TCTTCACA_S51_mcdf_mdup_ngsFlt//g'); done;

Removing last n characters from Unix Filename before the extension

I have a bunch of files in Unix Directory :
test_XXXXX.txt
best_YYY.txt
nest_ZZZZZZZZZ.txt
I need to rename these files as
test.txt
best.txt
nest.txt
I am using Ksh on AIX .Please let me know how i can accomplish the above using a Single command .
Thanks,

In this case, it seems you have an _ to start every section you want to remove. If that's the case, then this ought to work:
for f in *.txt
do
g="${f%%_*}.txt"
echo mv "${f}" "${g}"
done
Remove the echo if the output seems correct, or replace the last line with done | ksh.
If the files aren't all .txt files, this is a little more general:
for f in *
do
ext="${f##*.}"
g="${f%%_*}.${ext}"
echo mv "${f}" "${g}"
done

If this is a one time (or not very often) occasion, I would create a script with
$ ls > rename.sh
$ vi rename.sh
:%s/\(.*\)/mv \1 \1/
(edit manually to remove all the XXXXX from the second file names)
:x
$ source rename.sh
If this need occurs frequently, I would need more insight into what XXXXX, YYY, and ZZZZZZZZZZZ are.
Addendum
Modify this to your liking:
ls | sed "{s/\(.*\)\(............\)\.txt$/mv \1\2.txt \1.txt/}" | sh
It transforms filenames by omitting 12 characters before .txt and passing the resulting mv command to a shell.
Beware: If there are non-matching filenames, it executes the filename—and not a mv command. I omitted a way to select only matching filenames.

Changing file extensions for all files in a directory on OS X

I have a directory full of files with one extension (.txt in this case) that I want to automatically convert to another extension (.md).
Is there an easy terminal one-liner I can use to convert all of the files in this directory to a different file extension?
Or do I need to write a script with a regular expression?

You could use something like this:
for old in *.txt; do mv $old `basename $old .txt`.md; done
Make a copy first!

Alternatively, you could install the ren (rename) utility
brew install ren
ren '*.txt' '#1.md'
If you want to rename files with prefix or suffix in file names
ren 'prefix_*.txt' 'prefix_#1.md'

Terminal is not necessary for this... Just highlight all of the files you want to rename. Right click and select "Rename ## items" and just type ".txt" into to the "Find:" box and ".md" into the "Replace with:" box.

The preferred Unix way to do this (yes, OS X is based on Unix) is:
ls | sed 's/^$.*$\.txt$/mv "\1.txt" "\1.md"/' | sh
Why looping with for if ls by design loops through the whole list of filenames? You've got pipes, use them. You can create/modify not only output using commands, but also commands (right, that is commands created by a command, which is what Brian Kernighan, one of the inventors of Unix, liked most on Unix), so let's take a look what the ls and the sed produces by removing the pipe to sh:
$ ls | sed 's/^$.*$\.txt$/mv "\1.txt" "\1.md"/'
mv "firstfile.txt" "firstfile.md"
mv "second file.txt" "second file.md"
$
As you can see, it is not only an one-liner, but a complete script, which furthermore works by creating another script as output. So let's just feed the script produced by the one-liner script to sh, which is the script interpreter of OS X. Of course it works even for filenames with spaces in it.
BTW: Every time you type something in Terminal you create a script, even if it is only a single command with one word like ls or date etc. Everything running in a Unix shell is always a script/program, which is just some ASCII-based stream (in this case an instruction stream opposed to a data stream).
To see the actual commands being executed by sh, just add an -x option after sh, which turns on debugging output in the shell, so you will see every mv command being executed with the actual arguments passed by the sed editor script (yeah, another script inside the script :-) ).
However, if you like complexity, you can even use awk and if you like to install other programs to just do basic work, there is ren. I know even people who would prefer to write a 50-lines or so perl script for this simple every-day task.
Maybe it's easier in finder to rename files, but if connected remotely to a Mac (e.g. via ssh), using finder is not possible at all. That's why cmd line still is very useful.

Based on the selected and most accurate answer above, here's a bash function for reusability:
function change_all_extensions() {
for old in *."$1"; do mv $old `basename $old ."$1"`."$2"; done
}
Usage:
$ change_all_extensions txt md
(I couldn't figure out how to get clean code formatting in a comment on that answer.)

No need to write a script for it just hit this command
find ./ -name "*.txt" | xargs -I '{}' basename '{}' | sed 's/\.txt//' | xargs -I '{}' mv '{}.txt' '{}.md'

You do not need a terminal for this one; here is a sample demonstration in MacOS Big Sur.
Select all the files, right-click and select "rename..."
Add the existing file extension in "Find" and the extension you want to replace with "Replace with".
And done!

I had a similar problem where files were named .gifx.gif at the end and this worked in OS X to remove the last .gif:
for old in *.gifx.gif; do
mv $(echo "$old") $(echo "$old" | sed 's/x.gif//');
done

cd $YOUR_DIR
ls *.txt > abc
mkdir target // say i want to move it to another directory target in this case
while read line
do
file=$(echo $line |awk -F. '{ print $1 }')
cp $line target/$file.md // depends if u want to move(mv) or copy(cp)
done < abc

list=ls
for file in $list
do
newf=echo $file|cut -f1 -d'.'
echo "The newf is $newf"
mv $file $newf.jpg
done

Using sed to mass rename files

Objective
Change these filenames:
F00001-0708-RG-biasliuyda
F00001-0708-CS-akgdlaul
F00001-0708-VF-hioulgigl
to these filenames:
F0001-0708-RG-biasliuyda
F0001-0708-CS-akgdlaul
F0001-0708-VF-hioulgigl
Shell Code
To test:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/'
To perform:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/' | sh
My Question
I don't understand the sed code. I understand what the substitution
command
$ sed 's/something/mv'
means. And I understand regular expressions somewhat. But I don't
understand what's happening here:
\(.\).\(.*\)
or here:
& \1\2/
The former, to me, just looks like it means: "a single character,
followed by a single character, followed by any length sequence of a
single character"--but surely there's more to it than that. As far as
the latter part:
& \1\2/
I have no idea.

First, I should say that the easiest way to do this is to use the
prename or rename commands.
On Ubuntu, OSX (Homebrew package rename, MacPorts package p5-file-rename), or other systems with perl rename (prename):
rename s/0000/000/ F0000*
or on systems with rename from util-linux-ng, such as RHEL:
rename 0000 000 F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success‐
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.

you've had your sed explanation, now you can use just the shell, no need external commands
for file in F0000*
do
echo mv "$file" "${file/#F0000/F000}"
# ${file/#F0000/F000} means replace the pattern that starts at beginning of string
done

I wrote a small post with examples on batch renaming using sed couple of years ago:
http://www.guyrutenberg.com/2009/01/12/batch-renaming-using-sed/
For example:
for i in *; do
mv "$i" "`echo $i | sed "s/regex/replace_text/"`";
done
If the regex contains groups (e.g. \(subregex\) then you can use them in the replacement text as \1\,\2 etc.

The easiest way would be:
for i in F00001*; do mv "$i" "${i/F00001/F0001}"; done
or, portably,
for i in F00001*; do mv "$i" "F0001${i#F00001}"; done
This replaces the F00001 prefix in the filenames with F0001.
credits to mahesh here: http://www.debian-administration.org/articles/150

The sed command
s/\(.\).\(.*\)/mv & \1\2/
means to replace:
\(.\).\(.*\)
with:
mv & \1\2
just like a regular sed command. However, the parentheses, & and \n markers change it a little.
The search string matches (and remembers as pattern 1) the single character at the start, followed by a single character, follwed by the rest of the string (remembered as pattern 2).
In the replacement string, you can refer to these matched patterns to use them as part of the replacement. You can also refer to the whole matched portion as &.
So what that sed command is doing is creating a mv command based on the original file (for the source) and character 1 and 3 onwards, effectively removing character 2 (for the destination). It will give you a series of lines along the following format:
mv F00001-0708-RG-biasliuyda F0001-0708-RG-biasliuyda
mv abcdef acdef
and so on.

Using perl rename (a must have in the toolbox):
rename -n 's/0000/000/' F0000*
Remove -n switch when the output looks good to rename for real.
There are other tools with the same name which may or may not be able to do this, so be careful.
The rename command that is part of the util-linux package, won't.
If you run the following command (GNU)
$ rename
and you see perlexpr, then this seems to be the right tool.
If not, to make it the default (usually already the case) on Debian and derivative like Ubuntu :
$ sudo apt install rename
$ sudo update-alternatives --set rename /usr/bin/file-rename
For archlinux:
pacman -S perl-rename
For RedHat-family distros:
yum install prename
The 'prename' package is in the EPEL repository.
For Gentoo:
emerge dev-perl/rename
For *BSD:
pkg install gprename
or p5-File-Rename
For Mac users:
brew install rename
If you don't have this command with another distro, search your package manager to install it or do it manually:
cpan -i File::Rename
Old standalone version can be found here
man rename
This tool was originally written by Larry Wall, the Perl's dad.

The backslash-paren stuff means, "while matching the pattern, hold on to the stuff that matches in here." Later, on the replacement text side, you can get those remembered fragments back with "\1" (first parenthesized block), "\2" (second block), and so on.

If all you're really doing is removing the second character, regardless of what it is, you can do this:
s/.//2
but your command is building a mv command and piping it to the shell for execution.
This is no more readable than your version:
find -type f | sed -n 'h;s/.//4;x;s/^/mv /;G;s/\n/ /g;p' | sh
The fourth character is removed because find is prepending each filename with "./".

Here's what I would do:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done
Then if that looks ok, add | sh to the end. So:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done | sh

for i in *; do mv $i $(echo $i|sed 's/AAA/BBB/'); done

The parentheses capture particular strings for use by the backslashed numbers.

ls F00001-0708-*|sed 's|^F0000\(.*\)|mv & F000\1|' | bash

Some examples that work for me:
$ tree -L 1 -F .
.
├── A.Show.2020.1400MB.txt
└── Some Show S01E01 the Loreming.txt
0 directories, 2 files
## remove "1400MB" (I: ignore case) ...
$ for f in *; do mv 2>/dev/null -v "$f" "`echo $f | sed -r 's/.[0-9]{1,}mb//I'`"; done;
renamed 'A.Show.2020.1400MB.txt' -> 'A.Show.2020.txt'
## change "S01E01 the" to "S01E01 The"
## \U& : change (here: regex-selected) text to uppercase;
## note also: no need here for `\1` in that regex expression
$ for f in *; do mv 2>/dev/null "$f" "`echo $f | sed -r "s/([0-9] [a-z])/\U&/"`"; done
$ tree -L 1 -F .
.
├── A.Show.2020.txt
└── Some Show S01E01 The Loreming.txt
0 directories, 2 files
$
2>/dev/null suppresses extraneous output (warnings ...)
reference [this thread]: https://stackoverflow.com/a/2372808/1904943
change case: https://www.networkworld.com/article/3529409/converting-between-uppercase-and-lowercase-on-the-linux-command-line.html

How do you send the output of ls to mv?

I know you can do it with a find, but is there a way to send the output of ls to mv in the unix command line?

ls is a tool used to DISPLAY some statistics about filenames in a directory.
It is not a tool you should use to enumerate them and pass them to another tool for using it there. Parsing ls is almost always the wrong thing to do, and it is bugged in many ways.
For a detailed document on the badness of parsing ls, which you should really go read, check out: http://mywiki.wooledge.org/ParsingLs
Instead, you should use either globs or find, depending on what exactly you're trying to achieve:
mv * /foo
find . -exec mv {} /foo \;
The main source of badness of parsing ls is that ls dumps all filenames into a single string of output, and there is no way to tell the filenames apart from there. For all you know, the entire ls output could be one single filename!
The secondary source of badness of parsing ls comes from the broken way in which half the world uses bash. They think for magically does what they would like it to do when they do something like:
for file in `ls` # Never do this!
for file in $(ls) # Exactly the same thing.
for is a bash builtin that iterates over arguments. And $(ls) takes the output of ls and cuts it apart into arguments wherever there are spaces, newlines or tabs. Which basically means, you're iterating over words, not over filenames. Even worse, you're asking bask to take each of those mutilated filename words and then treat them as globs that may match filenames in the current directory. So if you have a filename which contains a word which happens to be a glob that matches other filenames in the current directory, that word will disappear and all those matching filenames will appear in its stead!
mv `ls` /foo # Exact same badness as the ''for'' thing.

One way is with backticks:
mv `ls *.boo` subdir
Edit: however, this is fragile and not recommended -- see #lhunath's asnwer for detailed explanations and recommendations.

None of the answers so far are safe for filenames with spaces in them. Try this:
for i in *; do mv "$i" some_dir/; done
You can of course use any glob pattern you like in place of *.

Not exactly sure what you're trying to achieve here, but here's one possibility:
The "xargs" part is the important piece everything else is just setup. The effect of this is to take everything that "ls" outputs and add a ".txt" extension to it.
$ mkdir xxx #
$ cd xxx
$ touch a b c x y z
$ ls
a b c x y z
$ ls | xargs -Ifile mv file file.txt
$ ls
a.txt b.txt c.txt x.txt y.txt z.txt
$
Something like this could also be achieved by:
$ touch a b c x y z
$ for i in `ls`;do mv $i ${i}.txt; done
$ ls
a.txt b.txt c.txt x.txt y.txt z.txt
$
I sort of like the second way better. I can NEVER remember how xargs works without reading the man page or going to my "cute tricks" file.
Hope this helps.

Check out find -exec {} as it might be a better option than ls but it depends on what you're trying to achieve.

/bin/ls | tr '\n' '\0' | xargs -0 -i% mv % /path/to/destdir/
"Useless use of ls", but should work. By specifying the full path to ls(1) you avoid clashes with aliasing of ls(1) mentioned in some of the previous posts. The tr(1) command together with "xargs -0" makes the command work with filenames containing (ugh) whitespace. It won't work with filenames containing newlines, but having filenames like that in the file system is to ask for trouble, so it probably won't be a big problem. But filenames with newlines could exist, so a better solution would be to use "find -print0":
find /path/to/srcdir -type f -print0 | xargs -0 -i% mv % dest/

You shouldn't use the output of ls as the input of another command. Files with spaces in their names are difficult as is the inclusion of ANSI escape sequences if you have:
alias ls-'ls --color=always'
for example.
Always use find or xargs (with -0) or globbing.
Also, you didn't say whether you want to move files or rename them. Each would be handled differently.
edit: added -0 to xargs (thanks for the reminder)

Backticks work well, as others have suggested. See xargs, too. And for really complicated stuff, pipe it into sed, make the list of commands you want, then run it again with the output of sed piped into sh.
Here's an example with find, but it works fine with ls, too:
http://github.com/DonBranson/scripts/blob/f09d24629ab6eb3ce509d4d3078818430306b063/jarfinder.sh

#!/bin/bash
for i in $( ls * );
do
mv $1 /backup/$1
done
else, it's the find solution by sybreon, and as suggested NOT the green mv ls solution.

Just use find or your shells globing!
find . -depth=1 -exec mv {} /tmp/blah/ \;
..or..
mv * /tmp/blah/
You don't have to worry about colour in the ls output, or other piping strangeness - Linux allows basically any characters in the filename except a null byte.. For example:
$ touch "blah\new|
> "
$ ls | xargs file
blahnew|: cannot open `blahnew|' (No such file or directory)
..but find works perfectly:
$ find . -exec file {} \;
./blah\new|
: empty

So this answer doesn't send the output of ls to mv but as #lhunath explained using ls is almost always the wrong tool for the job. Use shell globs or a find command.
For more complicated cases (often in a script), using bash arrays to build up the argument list from shell globs or find commands can be very useful. One can create an array and push to it with the appropriate conditional logic. This also handles spaces in filenames properly.
For example:
myargs=()
# don't push if the glob does not match anything
shopt -s nullglob
myargs+=(myfiles*)
To push files matching a find to the array: https://stackoverflow.com/a/23357277/430128.
The last argument should be the target location:
myargs+=("Some target directory")
Use myargs in the invocation of a command like mv:
mv "${myargs[#]}"
Note the quoting of the array myargs to pass array elements with spaces correctly.

You surround the ls with back quotes and put it after the mv, so like this...
mv `ls` somewhere/
But keep in mind that if any of your file names have spaces in them it won't work very well.
Also it would be simpler to just do something like this: mv filepattern* somewhere/

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio