How to programmatically rename XYZUpperCamelCase files to xyzLowerCamelCase? - bash

I have a great number of files that need svn mv'd, and I'd rather not do it all by hand.
Each file name follows the format XYZSomething.txt or SomethingElse.txt, and should be renamed to xyzSomething.txt and somethingElse.txt, respectively.
If I didn't need to deal with the abbreviation prefix, I could just use something like
for f in *; do svn mv "$f" "${f,}"; done
which lowercases just the first character in the filename. As it stands, this gets me 90% of the way, so it wouldn't be too painful to finish it by hand; but I would still like to know how to cover the prefix for future knowledge.
So tell me, great Bash wizards of StackOverflow, what's the best way to rename these files?

So you can do this:
$ echo ThisIsAFileName | sed 's/^\([A-Z]*\)\(.*\)/\L\1\E\2/'
thisIsAFileName
or:
for f in *; do svn mv "$f" "$(echo "$f" | sed 's/^\([A-Z]*\)\(.*\)/\L\1\E\2/')"; done
The \L in sed (a GNU extension) makes the following letters lowercase up until \E.
If it's just the first letter, then you can do:
$ echo ThisIsAFileName | sed 's/.*/\l&/'
thisIsAFileName
\l turns just the next letter lowercase.
However, with the first command XYZFileName would get renamed to xyzfileName, because the pattern cannot tell where the XYZ prefix ends and the F of FileName begins.
Hope this helps

This handles both of the examples:
$ sed -r 's#^([A-Za-z]+)([A-Z].*)#\L\1\E\2#' <<< XYZSomething
xyzSomething
$ sed -r 's#^([A-Za-z]+)([A-Z].*)#\L\1\E\2#' <<< SomethingElse
somethingElse
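For the abbreviation prefix, a pure-Bash alternative is also possible. This is only a sketch: it assumes Bash 4 or later (for the ${var,} and ${var,,} expansions), the helper name to_lower_camel is made up for illustration, and it treats a leading run of capitals followed by a capitalized word as the prefix. Names that consist only of capitals are not handled specially.

```bash
#!/usr/bin/env bash
# to_lower_camel is a hypothetical helper, not part of any tool.
to_lower_camel() {
  local name=$1
  # A run of capitals followed by "capital + lowercase" is an abbreviation
  # prefix plus the first real word: XYZSomething -> XYZ + Something.
  if [[ $name =~ ^([[:upper:]]+)([[:upper:]][[:lower:]].*)$ ]]; then
    printf '%s%s\n' "${BASH_REMATCH[1],,}" "${BASH_REMATCH[2]}"
  else
    # Otherwise lowercase only the first character.
    printf '%s\n' "${name,}"
  fi
}

# Dry run: print the commands instead of executing them.
for f in XYZSomething.txt SomethingElse.txt; do
  echo svn mv "$f" "$(to_lower_camel "$f")"
done
```

Because the second group must start with a capital followed by a lowercase letter, a name such as XYZSomethingElse keeps its inner capitals and becomes xyzSomethingElse.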

Related

substitute file names using rename

I want to rename files by removing everything from the "_" that is followed by eight capital letters up to the extension, keeping only the extension.
4585_10_148_H2A119Ub_GTCTGTCA_S51_mcdf_mdup_ngsFlt.fm
4585_10_148_H3K27me3_TCTTCACA_S51_mcdf_mdup_ngsFlt.fm
4585_27_128_Bap1_Bethyl_ACAGATTC_S61_mcdf_mdup_ngsFlt.fw
4585_32_148_1_INPUT_previous_AGAGTCAA_S72_mcdf_mdup_ngsFlt.bw
expected output
4585_10_148_H2A119Ub.fm
4585_10_148_H3K27me3.fm
4585_27_128_Bap1_Bethyl.fm
4585_32_148_1_INPUT_previous.fm
Try this:
for f in *; do
target=$(echo "${f}" | sed -E 's/_[[:upper:]]{8}.*\././')
mv "${f}" "${target}"
done
The key thing is the -E argument to sed, which enables extended regular expressions (needed here for the {8} repetition count).
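If you want to see what the loop would do before it touches anything, something along these lines may help. It is a sketch, not a tested tool: target_name and plan_renames are made-up helper names, and the existence check only looks at the current directory.

```bash
#!/usr/bin/env bash
# target_name is a hypothetical helper wrapping the sed expression above.
target_name() {
  sed -E 's/_[[:upper:]]{8}.*\././' <<< "$1"
}

# Print the planned renames; skip names that would not change or that
# would overwrite an existing file.
plan_renames() {
  local f target
  for f in "$@"; do
    target=$(target_name "$f")
    if [[ $target == "$f" ]]; then
      continue
    elif [[ -e $target ]]; then
      echo "skip: $target already exists (from $f)" >&2
    else
      echo mv -- "$f" "$target"
    fi
  done
}

plan_renames 4585_10_148_H2A119Ub_GTCTGTCA_S51_mcdf_mdup_ngsFlt.fm
```

Calling plan_renames * in the real directory lists the mv commands without running them.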
You can also use rename (a.k.a. prename or Perl rename) like this:
rename --dry-run 's|_[[:upper:]]{8}.*\.|.|' *
Sample Output
'4585_10_148_H2A119Ub_GTCTGTCA_S51_mcdf_mdup_ngsFlt.fm' would be renamed to '4585_10_148_H2A119Ub.fm'
'4585_32_148_1_INPUT_previous_AGAGTCAA_S72_mcdf_mdup_ngsFlt.bw' would be renamed to '4585_32_148_1_INPUT_previous.bw'
Remove the --dry-run and run again for real, if the output looks good.
This has several added benefits:
that it will warn and avoid any conflicts if two files rename to the same thing,
that it can rename across directories, creating any necessary intermediate directories on the way,
that you can do a dry run first to test it,
that you can use arbitrarily complex Perl code to specify the new name.
On a Mac, install it with homebrew using:
brew install rename
You may try this:
for i in *.fm; do mv "$i" "$(echo "$i" | sed 's/_GTCTGTCA_S51_mcdf_mdup_ngsFlt//g')"; done;
for i in *.fm; do mv "$i" "$(echo "$i" | sed 's/_TCTTCACA_S51_mcdf_mdup_ngsFlt//g')"; done;

rename long file names while keeping a part of the name

I have a lot of files which have a certain pattern:
some123_name4.with5.number01-02_and6-other7.stuff.txt
some123_name4.with5.number05-06_and6-other7.stuff.txt
some123_name4.with5.number11-12_and6-other7.stuff.txt
and I would like to rename them keeping the part in the middle number??-??. For example like:
different45_start.keep76.number01-02_but.change34_rest.txt
different45_start.keep76.number05-06_but.change34_rest.txt
different45_start.keep76.number11-12_but.change34_rest.txt
I have played around with expr, %% and ? but I didn't even manage to extract the number??-?? part of the filename.
This ought to do it (replace with the actual patterns):
#!/bin/bash
for f in some123* ; do
mv "$f" "$(echo "$f" | sed -e 's/some123_name4\.with5/different45_start.keep76/' -e 's/and6-other7\.stuff/but.change34_rest/')"
done
May I suggest using regular expressions to extract the numbers from the old name into the new name? Then it is just a matter of
creating a new subdirectory (just in case you make a mistake)
using "ls" to list the file names (with options for one name per line, and without descending into subdirectories)
iterating over the file names
In each iteration,
set the new name
run the copy command "cp" using the old and the new names (but as a trick, copy into your new subdirectory)
All in all, something like this:
mkdir NEW
ls -1d some* \
| while read FILE; do
NEWFILE=`echo "$FILE" \
| sed 's|^some12\\([0-9]\\)_name\\([0-9]\\)[.]with\\([0-9]\\)[.]number\\([0-9][0-9]-[0-9][0-9]\\)_and\\([0-9]\\)-other\\([0-9]\\)[.]stuff[.]txt$|different\\2\\3_start.keep\\6\\5.number\\4_but.change\\1\\2_rest.txt|'`
cp "$FILE" NEW/"$NEWFILE"
done
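As you can see, the regexp is written with doubled backslashes: inside backticks (`) each \\ collapses to a single backslash before sed sees it. With $(...) you would write the regexp with single backslashes.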
Does this help you, as a start?
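A quick way to check how backslashes behave in the two forms of command substitution. The pattern is shortened for illustration and only extracts the two number pairs; the variable names are arbitrary.

```bash
#!/usr/bin/env bash
old=some123_name4.with5.number01-02_and6-other7.stuff.txt

# Backticks: each \\ reaches sed as a single \.
with_backticks=`echo "$old" | sed 's|.*number\\([0-9][0-9]-[0-9][0-9]\\).*|\\1|'`

# $(...): backslashes are written exactly as sed should see them.
with_dollar_paren=$(echo "$old" | sed 's|.*number\([0-9][0-9]-[0-9][0-9]\).*|\1|')

echo "$with_backticks"      # 01-02
echo "$with_dollar_paren"   # 01-02
```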
a possible solution using expr looks like the following:
for f in *number??-??*; do
fixedPart=$(expr "$f" : '.*\(number[0-9][0-9]-[0-9][0-9]\).*')
newName="different45_start.keep76.${fixedPart}_but.change34_rest.txt"
mv "$f" "$newName"
done
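The same extraction can probably be done without expr, using Bash's own regex matching. A sketch, assuming Bash 3.2 or later; new_name is a made-up helper, and the prefix and suffix are just the example strings from the question.

```bash
#!/usr/bin/env bash
# new_name is a hypothetical helper; the fixed parts are the example
# strings from the question.
new_name() {
  local old=$1
  if [[ $old =~ number[0-9]{2}-[0-9]{2} ]]; then
    # BASH_REMATCH[0] holds the text matched by the whole pattern.
    printf 'different45_start.keep76.%s_but.change34_rest.txt\n' "${BASH_REMATCH[0]}"
  else
    return 1   # no numberNN-NN part found
  fi
}

# Dry run: print the mv commands instead of executing them.
for f in some123_name4.with5.number01-02_and6-other7.stuff.txt \
         some123_name4.with5.number11-12_and6-other7.stuff.txt; do
  new=$(new_name "$f") && echo mv -- "$f" "$new"
done
```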

How to extract a string at end of line after a specific word

I have different locations, but they all follow a pattern:
some_text/some_text/some_text/log/some_text.text
The locations don't all start with the same thing, and they don't have the same number of subdirectories, but I am only interested in what comes after log/. I would like to extract the .text
edited question:
I have a lot of locations:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
Just to show you that I don't know what they are: the user enters these locations, so I have no idea what the user will enter. The only thing I know is that it always contains log/ followed by the name of the file.
I would like to extract the type of the file, whatever string comes after the dot
The only thing I know is that it always contains log/ followed by the name
of the file.
I would like to extract the type of the file, whatever string comes
after the dot
Based on this requirement, this line works:
grep -o '[^.]*$' file
for your example, it outputs:
text
You can use bash built-in string operations. The example below will extract everything after the last dot from the input string.
$ var="some_text/some_text/some_text/log/some_text.text"
$ echo "${var##*.}"
text
Alternatively, use sed:
$ sed 's/.*\.//' <<< "$var"
text
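One thing to watch with ${var##*.} is that it returns the whole string when there is no dot at all. A slightly more defensive sketch, using only parameter expansion, could look like this; log_file_ext is a made-up name, and the check for log/ reflects the one thing the question says is guaranteed.

```bash
#!/usr/bin/env bash
# log_file_ext is a hypothetical helper name.
log_file_ext() {
  local path=$1 base
  # Require a log/ directory component somewhere in the path.
  [[ /$path == */log/* ]] || return 1
  base=${path##*/}                 # strip the directories: b.p
  [[ $base == *.* ]] || return 1   # no dot, so no extension
  printf '%s\n' "${base##*.}"      # keep what follows the last dot
}

log_file_ext /s/h/r/t/log/b.p
log_file_ext some_text/some_text/some_text/log/some_text.text
```

The two calls print p and text; a path without a log/ component, or a file name without a dot, makes the function return a non-zero status instead of printing anything.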
Not the cleanest way, but this will work to get the file name (without its extension) that follows log/:
sed -e "s/.*log\///" | sed -e "s/\..*//"
These are just the sed patterns; I'm not sure whether you have that string in a variable, or are reading it from a file, etc.
You could also grab that text and store in a sed register for later substitution etc. All depends on exactly what you are trying to do.
Using awk
awk -F'.' '{print $NF}' file
Using sed
sed 's/.*\.//' file
Running from the root of this structure:
/s/h/r/t/log/b.p
/t/j/u/f/e/log/k.h
/f/j/a/w/g/h/log/m.l
This seems to work, you can skip the echo command if you really just want the file types with no record of where they came from.
$ for DIR in *; do
> echo -n "$DIR "
> find $DIR -path "*/log/*" -exec basename {} \; | sed 's/.*\.//'
> done
f l
s p
t h

Can I convert between UpperCamelCase, lowerCamelCase and dash-case in filenames with Bash? [duplicate]

This question already has answers here:
linux bash, camel case string to separate by dash
I am in the process of merging efforts with another developer. I am using UpperCamelCasing, but we decided to follow Google's HTML style guide in using lower case and separating words with hyphens. This decision requires me to rename quite a few files on my filesystem. I first thought this would be easy, since I often use bash for renaming large collections of files. Unfortunately, renaming based on the casing style turned out to be a bit more complicated, and I did not manage to find an approach.
Can I convert files from one naming convention to another with Bash?
Try using the rename command with the -f option to rename the files with the desired substitutions.
rename -f 's/([a-z])([A-Z])/$1-$2/g; y/A-Z/a-z/' <list_of_files>
If you also want to extract <list_of_files> with some pattern, let's say extension .ext, you need to combine find with above command using xargs
find -type f -name "*.ext" -print0 | xargs -0 rename -f 's/([a-z])([A-Z])/$1-$2/g; y/A-Z/a-z/'
For example if you want to rename all files in pwd
$ ls
dash-case
lowerCamelCase
UpperCamelCase
$ rename -f 's/([a-z])([A-Z])/$1-$2/g; y/A-Z/a-z/' *
$ ls
dash-case
lower-camel-case
upper-camel-case
Try this:
for FILE in *; do NEWFILE=$( (sed -re 's/\B([A-Z])/-\1/g' | tr '[:upper:]' '[:lower:]') <<< "$FILE"); if [ "$NEWFILE" != "$FILE" ]; then echo mv \""$FILE"\" \""$NEWFILE"\"; fi; done
This should give you a list of "mv" statements on standard output. Double-check that they look right, then just add | bash to the end of the statement to run them all.
How does it work?
for FILE in *; do
NEWFILE=$( (sed -re 's/\B([A-Z])/-\1/g' | tr '[:upper:]' '[:lower:]') <<< "$FILE")
if [ "$NEWFILE" != "$FILE" ]; then
echo mv \""$FILE"\" \""$NEWFILE"\"
fi
done
The for FILE in * loop iterates over all files in the current directory (there are, of course, many other ways to loop through files). The sed statement matches only those uppercase letters that, according to \B, are not at a word boundary, which excludes a capital at the very beginning of the name. Because of this selective match, it makes the most sense to switch everything to lowercase in a separate call to tr. Finally, the condition ensures that you only see the filenames that change, and the trick of using echo ensures that you don't make changes to your filesystem without seeing them first.
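For comparison, the same conversion can be sketched in Bash alone, without sed or tr. This assumes Bash 4 or later, and camel_to_dash is a made-up name. Note that it behaves differently from the \B version on runs of capitals: it only inserts a dash where a capital follows a lowercase letter or a digit, so a run of capitals stays together.

```bash
#!/usr/bin/env bash
# camel_to_dash is a hypothetical helper name.
camel_to_dash() {
  local in=$1 out='' prev='' ch i
  for ((i = 0; i < ${#in}; i++)); do
    ch=${in:i:1}
    # Insert a dash only where a capital follows a lowercase letter or digit.
    if [[ $ch == [[:upper:]] && $prev == [[:lower:][:digit:]] ]]; then
      out+='-'
    fi
    out+=${ch,,}   # append the character in lowercase
    prev=$ch
  done
  printf '%s\n' "$out"
}

for name in UpperCamelCase lowerCamelCase dash-case; do
  echo "$name -> $(camel_to_dash "$name")"
done
```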
I ran into a similar question and, based on one answer there, came to the following solution. It is not a pure Bash solution, since it relies on perl, but since it does the trick I am sharing it.
for file in *; do mv "$file" "$(echo "$file" | perl -ne 'print lc(join("-", split(/(?=[A-Z])/)))')"; done

Using sed to mass rename files

Objective
Change these filenames:
F00001-0708-RG-biasliuyda
F00001-0708-CS-akgdlaul
F00001-0708-VF-hioulgigl
to these filenames:
F0001-0708-RG-biasliuyda
F0001-0708-CS-akgdlaul
F0001-0708-VF-hioulgigl
Shell Code
To test:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/'
To perform:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/' | sh
My Question
I don't understand the sed code. I understand what the substitution
command
$ sed 's/something/mv'
means. And I understand regular expressions somewhat. But I don't
understand what's happening here:
\(.\).\(.*\)
or here:
& \1\2/
The former, to me, just looks like it means: "a single character,
followed by a single character, followed by any length sequence of a
single character"--but surely there's more to it than that. As far as
the latter part:
& \1\2/
I have no idea.
First, I should say that the easiest way to do this is to use the
prename or rename commands.
On Ubuntu, OSX (Homebrew package rename, MacPorts package p5-file-rename), or other systems with perl rename (prename):
rename s/0000/000/ F0000*
or on systems with rename from util-linux-ng, such as RHEL:
rename 0000 000 F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success‐
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
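To see the pieces individually, you can print each back-reference on its own for one of the example names. This is only an illustration; the variable names are arbitrary.

```bash
#!/usr/bin/env bash
name=F00001-0708-RG-biasliuyda

group1=$(sed 's/\(.\).\(.*\)/\1/' <<< "$name")             # first character
group2=$(sed 's/\(.\).\(.*\)/\2/' <<< "$name")             # everything after the 2nd character
whole=$(sed 's/\(.\).\(.*\)/&/' <<< "$name")               # the entire match
generated=$(sed 's/\(.\).\(.*\)/mv & \1\2/' <<< "$name")   # the mv command line

printf '\\1 = %s\n' "$group1"
printf '\\2 = %s\n' "$group2"
printf '&  = %s\n' "$whole"
printf 'command: %s\n' "$generated"
```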
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.
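The point about running it twice can be checked on a plain string, without touching any files. The two function names below are made up for the comparison.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: the positional pattern from the question and the
# explicit pattern suggested above, applied to a string only.
positional() { sed 's/\(.\).\(.*\)/\1\2/' <<< "$1"; }
explicit()   { sed 's/F0000\(.*\)/F000\1/' <<< "$1"; }

name=F00001-0708-RG-biasliuyda
echo "positional, twice: $(positional "$(positional "$name")")"
echo "explicit, twice:   $(explicit "$(explicit "$name")")"
```

The positional version removes another character on the second pass, while the explicit version no longer matches and leaves the name alone.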
You've had your sed explanation; now you can use just the shell, with no need for external commands:
for file in F0000*
do
echo mv "$file" "${file/#F0000/F000}"
# ${file/#F0000/F000} means replace the pattern that starts at beginning of string
done
I wrote a small post with examples on batch renaming using sed couple of years ago:
http://www.guyrutenberg.com/2009/01/12/batch-renaming-using-sed/
For example:
for i in *; do
mv "$i" "`echo $i | sed "s/regex/replace_text/"`";
done
If the regex contains groups (e.g. \(subregex\)), then you can use them in the replacement text as \1, \2, etc.
The easiest way would be:
for i in F00001*; do mv "$i" "${i/F00001/F0001}"; done
or, portably,
for i in F00001*; do mv "$i" "F0001${i#F00001}"; done
This replaces the F00001 prefix in the filenames with F0001.
credits to mahesh here: http://www.debian-administration.org/articles/150
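If you would rather try the portable form on throwaway files first, a sketch like the following works in a temporary directory, so nothing real gets renamed. The existence check before mv is an addition of mine, not part of the one-liner above.

```bash
#!/usr/bin/env bash
# Try the portable form on throwaway files in a temporary directory.
workdir=$(mktemp -d)
touch "$workdir/F00001-0708-RG-biasliuyda" \
      "$workdir/F00001-0708-CS-akgdlaul" \
      "$workdir/F00001-0708-VF-hioulgigl"

for path in "$workdir"/F00001*; do
  name=${path##*/}             # file name without the directory
  new="F0001${name#F00001}"    # drop the old prefix, add the new one
  # Only rename when the target name is still free.
  [ -e "$workdir/$new" ] || mv -- "$path" "$workdir/$new"
done

ls "$workdir"
```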
The sed command
s/\(.\).\(.*\)/mv & \1\2/
means to replace:
\(.\).\(.*\)
with:
mv & \1\2
just like a regular sed command. However, the parentheses, & and the \N markers (\1, \2, ...) change it a little.
The search string matches (and remembers as pattern 1) the single character at the start, followed by a single character, followed by the rest of the string (remembered as pattern 2).
In the replacement string, you can refer to these matched patterns to use them as part of the replacement. You can also refer to the whole matched portion as &.
So what that sed command is doing is creating a mv command based on the original file (for the source) and character 1 and 3 onwards, effectively removing character 2 (for the destination). It will give you a series of lines along the following format:
mv F00001-0708-RG-biasliuyda F0001-0708-RG-biasliuyda
mv abcdef acdef
and so on.
Using perl rename (a must have in the toolbox):
rename -n 's/0000/000/' F0000*
Remove -n switch when the output looks good to rename for real.
There are other tools with the same name which may or may not be able to do this, so be careful.
The rename command that is part of the util-linux package won't.
If you run the following command (GNU)
$ rename
and you see perlexpr, then this seems to be the right tool.
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo apt install rename
$ sudo update-alternatives --set rename /usr/bin/file-rename
For archlinux:
pacman -S perl-rename
For RedHat-family distros:
yum install prename
The 'prename' package is in the EPEL repository.
For Gentoo:
emerge dev-perl/rename
For *BSD:
pkg install gprename
or p5-File-Rename
For Mac users:
brew install rename
If you don't have this command with another distro, search your package manager to install it or do it manually:
cpan -i File::Rename
Old standalone version can be found here
man rename
This tool was originally written by Larry Wall, the creator of Perl.
The backslash-paren stuff means, "while matching the pattern, hold on to the stuff that matches in here." Later, on the replacement text side, you can get those remembered fragments back with "\1" (first parenthesized block), "\2" (second block), and so on.
If all you're really doing is removing the second character, regardless of what it is, you can do this:
s/.//2
but your command is building a mv command and piping it to the shell for execution.
This is no more readable than your version:
find -type f | sed -n 'h;s/.//4;x;s/^/mv /;G;s/\n/ /g;p' | sh
The fourth character is removed because find is prepending each filename with "./".
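Spelled out with one command per line, the hold-space one-liner reads roughly as follows. This sketch prints the generated command for a single example path instead of piping it to sh; build_mv is a made-up wrapper name.

```bash
#!/usr/bin/env bash
# build_mv is a hypothetical wrapper around the sed script above.
build_mv() {
  sed -n '
    # copy the line (the original path) into the hold space
    h
    # delete the 4th character of the pattern space
    s/.//4
    # swap: pattern space = original path, hold space = new path
    x
    # put "mv " in front of the original path
    s/^/mv /
    # append a newline and the hold space (the new path)
    G
    # join the two lines with a space and print the result
    s/\n/ /g
    p
  '
}

printf '%s\n' ./F00001-0708-RG-biasliuyda | build_mv
```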
Here's what I would do:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done
Then if that looks ok, add | sh to the end. So:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done | sh
for i in *; do mv "$i" "$(echo "$i" | sed 's/AAA/BBB/')"; done
The parentheses capture particular strings for use by the backslashed numbers.
ls F00001-0708-*|sed 's|^F0000\(.*\)|mv & F000\1|' | bash
Some examples that work for me:
$ tree -L 1 -F .
.
├── A.Show.2020.1400MB.txt
└── Some Show S01E01 the Loreming.txt
0 directories, 2 files
## remove "1400MB" (I: ignore case) ...
$ for f in *; do mv 2>/dev/null -v "$f" "`echo $f | sed -r 's/.[0-9]{1,}mb//I'`"; done;
renamed 'A.Show.2020.1400MB.txt' -> 'A.Show.2020.txt'
## change "S01E01 the" to "S01E01 The"
## \U& : change (here: regex-selected) text to uppercase;
## note also: no need here for `\1` in that regex expression
$ for f in *; do mv 2>/dev/null "$f" "`echo $f | sed -r "s/([0-9] [a-z])/\U&/"`"; done
$ tree -L 1 -F .
.
├── A.Show.2020.txt
└── Some Show S01E01 The Loreming.txt
0 directories, 2 files
$
2>/dev/null suppresses extraneous output (warnings ...)
reference [this thread]: https://stackoverflow.com/a/2372808/1904943
change case: https://www.networkworld.com/article/3529409/converting-between-uppercase-and-lowercase-on-the-linux-command-line.html

Resources