Rename batch of specific files using bash - macos

Say I have a folder which contains files like this: a constant prefix and then an underscore and some description which is different for every file:
constantnamehere_description1.doc
constantnamehere_description2.doc
.
.
etc
Here description1, description2, etc. just symbolize the different descriptions, not the actual numbers 1, 2, etc.
How can I rename these files to just this?
constantnamehere1.doc
constantnamehere2.doc
.
.
etc
Here the numbers 1, 2, etc. are the actual sequential endings that I want my files to have after the renaming.
The sequential ending (1,2,3,...,end) is very important.
Till now I have tried:
for i in *.doc; do mv "$i" "{i/_*.doc/ .doc}"; done
example actual file names
1003407_cc_1.vtk
1003407_cc_2.vtk
1003407_cc_3.vtk
1003407_cv.left.right.vtk
1003407_thalamo_frontal.left.vtk
I want them to be like:
1003407_1.vtk
1003407_2.vtk
1003407_3.vtk
1003407_4.vtk
1003407_5.vtk
To make it extremely clear: I want everything to be removed after the first underscore and to be replaced with sequential numbers keeping the ".vtk" extension of the file

Using an answer to Capturing Groups From a Grep RegEx, we can write a regex for these file names and then rename using the captured groups:
$ regex='([^_]*)_[^0-9]*([0-9]*)\.([a-z]*)'   # \. matches a literal dot only
$ for f in *doc
do
    [[ $f =~ $regex ]] || continue   # skip names that do not match
    echo "mv $f --> ${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
done
The regex says: get everything up to _, then expect some characters until a digit is found. Catch that set of digits and then expect a dot followed by the extension.
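To see what each capture group holds, the match can be tried on a single name. This is only a minimal sketch: the sample name is made up, the helper name `squeeze_name` is hypothetical, and the dot before the extension is escaped here so that it matches a literal dot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the prefix, the trailing digits and the extension
# of a name, dropping the underscore and the description in between.
squeeze_name() {
    local regex='([^_]*)_[^0-9]*([0-9]*)\.([a-z]*)'
    if [[ $1 =~ $regex ]]; then
        # group 1: text before the first underscore
        # group 2: the digits that follow the description
        # group 3: the extension
        printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
    else
        return 1   # the name has no underscore/extension structure
    fi
}

squeeze_name constantnamehere_description12.doc   # prints constantnamehere12.doc
```

Checking the return value of the match avoids reusing a stale `BASH_REMATCH` from a previous iteration when a name does not fit the pattern.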

Use rename:
i=1
for file in *_*.vtk
do
rename "s/_[^.]*/${i}/" "$file"
i=$(( i + 1 ))
done
For every file matching the *_*.vtk pattern, this replaces everything from the first underscore up to (but not including) the first . with the current value of i. If your filenames contain more than one ., the pattern needs to be adapted.
EDIT: Solution modified according to modified question.

I solved it like this:
i=0;
for file in *.vtk; do mv "${file}" 100307_"${i}".vtk; i=$((i+1)); done
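A variation that derives the prefix from each name instead of hard-coding it might look like the sketch below. It assumes every name contains an underscore and an extension; the helper name `plan_sequential_renames` is hypothetical, and it only prints the planned mv commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "mv" line per argument, keeping the text
# before the first underscore and the extension after the last dot, and
# putting a running counter in between.
plan_sequential_renames() {
    local i=1 file prefix ext
    for file in "$@"; do
        prefix=${file%%_*}   # text before the first underscore
        ext=${file##*.}      # text after the last dot
        printf 'mv -- %s %s\n' "$file" "${prefix}_${i}.${ext}"
        i=$((i + 1))
    done
}

plan_sequential_renames 1003407_cc_1.vtk 1003407_cv.left.right.vtk
# prints:
# mv -- 1003407_cc_1.vtk 1003407_1.vtk
# mv -- 1003407_cv.left.right.vtk 1003407_2.vtk
```

Called as `plan_sequential_renames *.vtk`, the counter follows the shell's glob order, which is alphabetical in the current locale. The printed lines are a preview for names without whitespace; they are not quoted for re-execution.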

Related

In bash, how can I remove multiple versions of the same file?

This may be a very specific case, but I know very little about bash and I need to remove "duplicate" files. I've been downloading totally legal videogame roms these past few days, and I noticed that a lot of packs have many different versions of the same game, like this:
Awesome Golf (1991).lnx
Awesome Golf (1991) [b1].lnx
Baseball Heroes (1991).lnx
Baseball Heroes (1991) [b1].lnx
Basketbrawl (1992).lnx
Basketbrawl (1992) [a1].lnx
Basketbrawl (1992) [b1].lnx
Batman Returns (1992).lnx
Batman Returns (1992) [b1].lnx
How can I make a bash script that removes the duplicates? A duplicate would be any file that has the same name, and the name would be the string before the first parenthesis. The script should parse all the files and grab their names, see which names match to detect duplicates, and remove all files except the first one (first being the first that comes up in alphabetical order).
Would you please try the following:
#!/bin/bash
dir="dir" # the directory where the rom files are located
declare -A seen # associative array to detect the duplicates
while IFS= read -r -d "" f; do   # loop over filenames by assigning "f" to it
    name=${f%(*}                 # extract the "name" by removing left paren and following characters
    name=${name%.*}              # remove the extension considering the case the filename doesn't have parens
    name=${name%[*}              # remove the left square bracket and following characters considering the case as above
    name=${name%% }              # remove trailing whitespaces, if any
    if (( seen[$name]++ )); then # if the name duplicates...
        # remove "echo" if the output looks good
        echo rm -- "$f"          # then remove the file
    fi
done < <(find "$dir" -type f -name "*.lnx" -print0 | sort -z -t "." -k1,1)
# sort the list of filenames in alphabetical order
Please modify the first dir= line to your directory path which holds the rom files.
The echo command just prints the filenames to be removed as a rehearsal. If the output looks good, then remove echo and execute the real one.
[Explanation]
An associative array seen associates the extracted "name" with a
counter of appearance. If the counter is non-zero, the file is a duplicated
one and can be removed (as long as the files are properly sorted).
The -print0 option to find, the -z option to sort and the -d ""
option to read use a null character as the filename delimiter, so that
filenames containing special characters such as spaces, tabs or
newlines are handled correctly.
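The "seen" idea can be tried without touching any files. The sketch below is a simplified variant, not the answer's script: it reads newline-delimited names from standard input (so it assumes no newlines in names), takes the text before the first parenthesis as the game name, and records first occurrences with a flag instead of a counter. The function name `list_duplicates` is hypothetical, and associative arrays require bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch: print every name whose game title was already seen earlier in the
# input. Assumes the title (text before the first "(") is never empty.
list_duplicates() {
    local -A seen=()
    local f name
    while IFS= read -r f; do
        name=${f%%"("*}                         # text before the first "("
        name=${name%"${name##*[![:space:]]}"}   # strip trailing whitespace
        if [[ -n ${seen[$name]:-} ]]; then
            printf '%s\n' "$f"                  # later version: removal candidate
        else
            seen[$name]=1                       # first occurrence: keep it
        fi
    done
}

printf '%s\n' \
    'Awesome Golf (1991).lnx' \
    'Awesome Golf (1991) [b1].lnx' \
    'Basketbrawl (1992).lnx' \
    'Basketbrawl (1992) [a1].lnx' \
    'Basketbrawl (1992) [b1].lnx' | list_duplicates
# prints the [b1] and [a1] variants, one per line
```

Which file counts as "first" depends entirely on the order of the input, which is why the answer sorts the list before the loop.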

How do I rename multiple files before the extension in linux?

I want to take a group of files with names like 123456_1_2.mpg and turn it into 123456.mpg how can I do this using terminal commands?
To loop over all the available files you can use a for loop over the file names of the form ??????_?_?.mpg.
To build the new name you can remove the longest match of a pattern from the end of the string using ${MYVAR%%pattern}, which here keeps everything before the first underscore, without using any external command.
This said, your code should look like:
#!/bin/bash
shopt -s nullglob # do nothing if no matches found
for file in ??????_?_?.mpg; do
[[ -f $file ]] || continue # skip if not a regular file
new_file="${file%%_*}.mpg" # compose the new file name
echo mv "$file" "$new_file" # remove echo after testing
done
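The difference between the two suffix-removal operators is easy to check on one of the sample names. A minimal sketch, using only standard parameter expansion:

```shell
file=123456_1_2.mpg

# %% removes the LONGEST suffix matching _* (from the first underscore on)
echo "${file%%_*}.mpg"   # prints 123456.mpg

# %  removes the SHORTEST suffix matching _* (from the last underscore on)
echo "${file%_*}.mpg"    # prints 123456_1.mpg
```

For this task the longest match is the one that is needed, since everything after the first underscore has to go.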
rename 's/_.*/.mpg/' *mpg
This replaces everything from the first underscore to the end of the name with .mpg, for all files ending in mpg.
We can use grep to strip out everything but the first sequence of numbers. The --interactive flag will ask you if you're sure for each move, so you can make sure it's not doing anything you don't expect.
for file in *.mpg; do
mv --interactive "$file" "$(grep -o '^[0-9]\+' <<< "$file")".mpg
done
The regex ^[0-9]\+ translates to "one or more digits at the very start of the name"; with -o, grep prints only that matched part.
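The extraction step can be tested on its own before it is wired into mv. In this sketch the helper name `leading_digits` is hypothetical, and it uses the extended-regex spelling `-E '^[0-9]+'` in place of `\+`, which is an assumption made for portability rather than what the answer wrote.

```shell
# Hypothetical helper: print the leading run of digits of a name, if any.
leading_digits() {
    printf '%s\n' "$1" | grep -oE '^[0-9]+'
}

leading_digits 123456_1_2.mpg   # prints 123456
```

For a name that does not start with a digit, grep prints nothing and returns a non-zero status, so a real loop should skip such files instead of renaming them to ".mpg".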

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove prefix and suffix from this filename in bash as:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use a bash regex for this and print capture group #1:
for file in ../healthy_data/*; do
[[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
If you can use Awk, it is pretty simple,
for i in ../healthy_data/*
do
stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
printf "%s\n" "$stringNeeded"
done
The -F/ option splits the input string on the / character, and $NF represents the last field, in this case F35_HC_532d.dat. The split() function is then called with the delimiter . to extract the part before the first dot.
The options/functions in the above Awk are POSIX compatible.
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like this:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single loop:
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
The two-step syntax works as follows: "${i#*/*/}" stores F35_HC_532d.dat in the variable tempString, and "${tempString%.dat}" then removes the .dat suffix from that variable.
If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script which needs to deal with any number of .dat files, use the second command!
Do you count this as one step?
for i in ../healthy_data/*; do
sed 's#\.[^.]*##'<<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
prefix_stripped="${file##*\/healthy_data\/}"
echo "${prefix_stripped%.dat}"
done
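The same two-step expansion can be wrapped in a small function so that callers see a single step. This is a sketch under the assumption that the wanted part is "file name without directory and without its last extension"; the function name `stem` is hypothetical.

```shell
# Hypothetical helper: strip the directory part and one trailing extension.
stem() {
    local base=${1##*/}          # drop everything up to the last slash
    printf '%s\n' "${base%.*}"   # drop the last dot and what follows it
}

stem ../healthy_data/F35_HC_532d.dat   # prints F35_HC_532d
```

Because it removes whatever follows the last dot, it is not tied to the .dat extension or to one particular directory.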
If you are on zsh, one way to achieve this without the need for defining another variable is
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example the prefix stems from the fact that the files are located in a different directory. You can get rid of the prefix by cding in this case.
(cd ../healthy_data ; for i in *; do echo ${i%.dat}; done)
The (parens) invoke a sub shell process and your current shell stays where it is. If you don't want a sub shell you can cd back easily:
cd ../healthy_data ; for i in *; do echo ${i%.dat}; done; cd -
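That a cd inside parentheses does not leak into the calling shell can be verified directly. A minimal sketch, using / as a directory that is assumed to exist:

```shell
start=$PWD

# The cd happens in a sub shell, so it is confined to the parentheses.
(cd / && echo "inside the sub shell: $PWD")

echo "after the sub shell: $PWD"
[ "$PWD" = "$start" ] && echo "working directory unchanged"
```

With the `cd -` variant, by contrast, a failure in the middle of the command list can leave the shell in the other directory, and `cd -` also prints the directory it returns to.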

automatically renaming files

I have a bunch of files (more than 1000) like the following:
$ ls
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
org.allenai.ari.solvers.termselector.ExpandedLearner.lex
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lc
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lex
....
I have to rename these files by adding learners right before the capitalized name. For example
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
would change to
org.allenai.ari.solvers.termselector.learners.BaselineLearnersurfaceForm.lex
and this one
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
would change to
org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
Any ideas how to do this automatically?
for f in org.*; do
    echo mv "$f" "$( sed 's/\.\([A-Z]\)/.learners.\1/' <<< "$f" )"
done
This short loop outputs an mv command that renames the files in the manner that you wanted. Run it as-is first, and when you are certain it's doing what you want, remove the echo and run again.
The sed bit in the middle takes a filename ($f, via a here-string, so this requires bash) and replaces the first occurrence of a capital letter after a dot with .learners. followed by that same capital letter.
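The substitution can be checked on individual names before the loop is run over a thousand files. In this sketch the function name `add_learners` is hypothetical, the inserted component is spelled `learners` as in the question, and the name is passed through a pipe instead of a here-string.

```shell
# Hypothetical helper: insert ".learners" in front of the first
# dot-plus-capital-letter sequence of a name.
add_learners() {
    printf '%s\n' "$1" | sed 's/\.\([A-Z]\)/.learners.\1/'
}

add_learners org.allenai.ari.solvers.termselector.ExpandedLearner.lc
# prints org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
```

Since the sed command has no g flag, only the first capitalized component is affected, so running it twice on the same name would insert the component twice.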
There is a tool called perl-rename, sometimes rename. Not to be confused with rename from util-linux.
It's very good for tasks like this as it takes a perl expression and renames accordingly:
perl-rename 's/(?=\.[A-Z])/.learners/' *
Alternatively, you can use a for loop and $BASH_REMATCH:
for file in *; do
[ -e "$file" ] || continue
[[ "$file" =~ ^([^A-Z]*)(.*)$ ]]
mv -- "$file" "${BASH_REMATCH[1]}learners.${BASH_REMATCH[2]}"
done
A very simple approach (useful if you only need to do this one time) is to ls >dummy them into a text file dummy, and then use find/replace in a text editor to make lines of the form mv xxx.yyy xxx.learners.yyy. Then you can simply execute the resulting file with sh ./dummy.
The exact find/replace commands depend on the text editor you use, but something like
replace org. with mv org.. That gets you the mv in the beginning.
replace mv org.allenai.ari.solvers.termselector.$1 with mv org.allenai.ari.solvers.termselector.$1 org.allenai.ari.solvers.termselector.learners.$1 to duplicate the filename and insert the learners.
There is also a syntax using for, which can probably do it in one (long) line, but I cannot explain it; try help for if you want to learn about it.

Keep only one version of each file (bash)

I want to remove redundant files in a folder. Something like
cat_1.jpg
cat_2.jpg
cat_3.jpg
dog_10.jpg
dog_100.jpg
reduced to
cat_3.jpg
dog_100.jpg
That is, take only the version of each file with the highest number suffix and delete the rest.
This is very much like
list the files with minimum sequence
but the bash answer there has a "for ... in ... ". I have thousands of file names.
EDIT:
Got the file name convention wrong. There may be other underscores (ex. cat_and_dog_100.jpg). I need it to only take the number after the last underscore.
Assuming your filenames are always in the form <name>_<numbers>.jpg, here's a quick hack:
while IFS= read -r filename; do
    prefix=${filename/%_*/}                   # Get text before underscore
    if [ "$prev_prefix" != "$prefix" ]; then  # we see a new prefix
        echo "Keeping $filename"
        prev_prefix=$prefix
    else                                      # same prefix
        echo "Deleting $filename"
        rm -- "$filename"
    fi
done < <(find . -maxdepth 1 -name "*.jpg" | sort -t'_' -k1,1 -k2,2nr)
How this works:
Sorts all *.jpg files first by <name> and then by <numbers>.
All files with the same prefix are grouped together, with the highest <number> appearing first.
Iterates through the list of filenames and deletes every file except the first one seen for each <name> (which should be the one with the highest <number>).
Note that find is used instead of ls *.jpg so we can better handle a large number of files.
Disclaimer: This is a rather fragile way of dealing with files and versioning, and should not be adopted as a long term solution. Do heed the comments posted on the question.
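For the case raised in the question's EDIT, where the name itself may contain underscores, one possible sketch tracks the highest number per prefix in associative arrays instead of relying on sort order. It assumes names of the form <name>_<numbers>.jpg without newlines, needs bash 4 or later, and only prints the outdated names; the function name `list_outdated` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read names on stdin and print every name that is NOT the
# highest-numbered version of its prefix.
list_outdated() {
    local -A best_num=() best_file=()
    local name stem prefix num
    while IFS= read -r name; do
        stem=${name%.jpg}
        prefix=${stem%_*}    # everything before the LAST underscore
        num=${stem##*_}      # digits after the last underscore
        if [[ -z ${best_file[$prefix]+set} ]]; then
            best_num[$prefix]=$num; best_file[$prefix]=$name
        elif (( 10#$num > 10#${best_num[$prefix]} )); then
            printf '%s\n' "${best_file[$prefix]}"   # previous best is outdated
            best_num[$prefix]=$num; best_file[$prefix]=$name
        else
            printf '%s\n' "$name"                   # lower than the current best
        fi
    done
}

printf '%s\n' cat_1.jpg cat_3.jpg cat_2.jpg cat_and_dog_100.jpg dog_10.jpg dog_100.jpg |
    list_outdated
# prints cat_1.jpg, cat_2.jpg and dog_10.jpg, one per line
```

The `10#` prefix forces base-10 arithmetic, so a suffix with leading zeros such as 008 is not misread as octal. The input order does not matter, and the list is read as a stream, so thousands of names are not a problem.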
