Substitute shortest match of pattern in filename - bash

I have files with the following filename pattern:
C14_1_S1_R1_001_copy1.fastq.gz
That I would like to be renamed this way:
C14_1_S1_R1.fastq.gz
I have tested unsuccessfully the following pattern replacement strategy:
for f in *.fastq.gz; do echo mv "$f" "${f/_*./_}"; done
Any suggestion is welcome.

Your original filename has several underscore characters but you only want to remove from the second to last underscore. In that case, try:
mv "$f" "${f%_*_*}.fastq.gz"
Consider a directory with these files:
$ ls -1
C14_1_S1_R1_001_copy1.fastq.gz
C15_1_S1_R1_001_copy1.fastq.gz
If we run our loop and then run a new ls, we see the changed filenames:
$ for f in ./*.fastq.gz; do mv "$f" "${f%_*_*}.fastq.gz"; done
$ ls -1
C14_1_S1_R1.fastq.gz
C15_1_S1_R1.fastq.gz
The key here is that ${var%word} is suffix removal and it matches the shortest possible suffix that matches the glob word. Thus, ${f%_*_*} removes the second-to-last underscore character and everything after it. ${f%_*_*}.fastq.gz removes the second-to-last underscore character and everything after and then restores your desired suffix of .fastq.gz.
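To see how the shortest-match and longest-match forms differ, here is a small sketch using the filename from the question. The helper name new_name is only an illustration, not something from the original answer.

```bash
f="C14_1_S1_R1_001_copy1.fastq.gz"

echo "${f%_*}"     # shortest suffix matching _*   -> C14_1_S1_R1_001
echo "${f%_*_*}"   # shortest suffix matching _*_* -> C14_1_S1_R1
echo "${f%%_*}"    # longest suffix matching _*    -> C14
echo "${f#*_}"     # shortest prefix matching *_   -> 1_S1_R1_001_copy1.fastq.gz
echo "${f##*_}"    # longest prefix matching *_    -> copy1.fastq.gz

# Hypothetical helper: strip the last two underscore-delimited parts,
# then put the .fastq.gz suffix back.
new_name() { printf '%s.fastq.gz\n' "${1%_*_*}"; }
```

Wrapping the expansion in a function makes it easy to check the result with echo before handing it to mv.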

str="C14_1_S1_R1_001_copy1.fastq.gz"
front=$(echo "${str}" | cut -d'_' -f1-4)
back=$(echo "${str}" | cut --complement -d'.' -f1)
echo "${front}.${back}"

With regex using the =~ test operator and BASH_REMATCH
#!/usr/bin/env bash
for file in *.fastq.gz; do
if [[ $file =~ ^(.+)(_[[:digit:]]+_copy.*[^\.])(\.fastq\.gz)$ ]]; then
echo mv -v "$file" "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
fi
done
Basically, it splits C14_1_S1_R1_001_copy1.fastq.gz into three parts:
BASH_REMATCH[1] has C14_1_S1_R1
BASH_REMATCH[2] has _001_copy1
BASH_REMATCH[3] has .fastq.gz
Remove the echo if you're ok with the output so the files can be renamed.
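If you prefer to keep the matching logic in one place, the same idea can be sketched as a function that prints the new name and fails when the name does not match. The function name clean_fastq_name and the slightly simplified regex (two capture groups instead of three) are assumptions of this sketch.

```bash
# Hypothetical helper: print the cleaned name, or return 1 if the name
# does not look like <prefix>_<digits>_copy<digits>.fastq.gz
clean_fastq_name() {
  local re='^(.+)_[0-9]+_copy[0-9]*(\.fastq\.gz)$'
  [[ $1 =~ $re ]] || return 1
  # Group 1 is the part to keep, group 2 is the extension.
  printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Usage sketch (echo shows the command instead of running it):
# for file in *.fastq.gz; do
#   new=$(clean_fastq_name "$file") && echo mv -v "$file" "$new"
# done
```

Storing the regex in a variable avoids quoting surprises inside [[ ... =~ ... ]].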

Related

Remove part of name of multiple files on mac os

I have a directory full of .png files with random characters in the middle of the filenames, like:
T1_021_É}ÉcÉjÉV_solid box.png
T1_091_ÉRÉjÉtÉ#Å[_City.png
T1_086_ÉnÉiÉ~ÉYÉL_holiday.png
I expect this after removing
T1_021_solid box.png
T1_091_City.png
T1_086_holiday.png
Thank you
Using for to collect the file lists and bash parameter expansion with substring removal, you can do the following in the directory containing the files:
for i in T1_*; do
beg="${i%_*_*}" ## trim from back to 2nd '_'
end="${i##*_}" ## trim from front through last '_'
mv "$i" "${beg}_$end" ## mv file to new name.
done
(Note: you don't have to use the variables beg and end. You can just combine both parameter expansions to form the new filename, e.g. mv "$i" "${i%_*_*}_${i##*_}". It's up to you, but beg and end make things a bit more readable.)
Result
New file names:
$ ls -al T1_*
T1_021_solid box.png
T1_086_holiday.png
T1_091_City.png
Just another way to approach it from bash only.
Using cut
You can use cut to remove the 3rd field, using '_' as the delimiter (the command substitution is quoted so names with spaces survive):
for i in T1_*; do
mv "$i" "$(cut -d'_' -f-2,4- <<< "$i")"
done
(same output)
The only drawback is the use of cut in the command substitution would require an additional subshell be spawned each iteration.
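If the extra process per file bothers you, the same field removal can be sketched with the read builtin. This assumes, as in the question, that the unwanted part is exactly the 3rd underscore-separated field and that the name has at least four fields. The name drop_third_field is made up for this example.

```bash
# Hypothetical helper: drop the 3rd underscore-separated field using
# only shell builtins (no cut, no command substitution inside).
drop_third_field() {
  local first second junk rest
  # With IFS=_ the last variable receives the remainder of the line,
  # including any further underscores and spaces.
  IFS=_ read -r first second junk rest <<< "$1"
  printf '%s_%s_%s\n' "$first" "$second" "$rest"
}

# Usage sketch:
# for i in T1_*; do echo mv "$i" "$(drop_third_field "$i")"; done
```

Calling it through $(...) as in the usage sketch still costs a subshell, so to avoid that entirely you would inline the read line into the loop.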
If the set of random characters have _ before and after
find . -type f -iname "T1_0*" 2>/dev/null | while IFS= read -r file; do
mv "${file}" "$(echo "${file}" | cut -d'_' -f1,2,4-)"
done
Explanation:
Find all files whose names start with T1_0
Read the list line by line using the while loop
Use _ as delimiter and cut the 3rd column
Use mv to rename
Filenames after renaming:
T1_021_solid box.png
T1_086_holiday.png
T1_091_City.png

Extract date from filename using bash script

I know that similar things have been asked before, but I haven't been able to really make hand and foot out of what's been posted.
I've got a whole bunch of files that contain the date in the format YYYYMMDD at some point in the filename. Luckily this is the only 8 digit substring in all the filenames!
I will need to write the dates into another file later, but that should be fine. I'm struggling to extract the date into a variable first...
I know I can get it with grep:
for d in $( ls *.csv | grep -Po "\d{8}" ); do
echo $d
done
However, as I want to get the full filename into a variable too while I iterate through them, that's not an option right now.
I've tried using sed, but I don't think I know how to use it:
for f in $( ls *.csv ); do
d=$( $f | sed -e 's/^.*\(\d{8}\).*$')
echo $d
done
Thanks for pointing me in the right direction!
Loop through your csv files like this (don't parse ls):
for f in *.csv; do
echo "$f"
d=$(echo "$f" | grep -oE '[0-9]{8}')
done
I've used grep in extended mode (-E) but perl mode is equally valid.
As you have tagged with bash, you can do d=$(grep -oE '[0-9]{8}' <<<"$f") instead if you prefer. You can also use built-in regular expression support, which is slightly more verbose but saves calling an external tool:
re='[0-9]{8}'
[[ $f =~ $re ]] && d="${BASH_REMATCH[0]}"
The array BASH_REMATCH contains the matches to the regular expression. If there is a match, we assign it to d.
#!/bin/bash
# ^-- important: bash, not /bin/sh
for f in *.csv; do # Don't use ls for iterating over filenames
[[ $f =~ [[:digit:]]{8} ]] && { # native built-in regex matching
number=${BASH_REMATCH[0]} # ...refer to the matched content...
echo "Found $number in filename $f" # ...and emit output.
}
done
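Since the dates are going to be written to another file later, here is a sketch that uses capture groups to split the match into year, month and day. The function name extract_date, the YYYY-MM-DD output format and the output file dates.txt are assumptions, not part of the question.

```bash
# Hypothetical helper: print the first 8-digit run in a name as YYYY-MM-DD,
# or return 1 if the name contains no such run.
extract_date() {
  local re='([0-9]{4})([0-9]{2})([0-9]{2})'
  [[ $1 =~ $re ]] || return 1
  printf '%s-%s-%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

# Usage sketch: one "filename,date" line per CSV file.
# for f in *.csv; do
#   d=$(extract_date "$f") && printf '%s,%s\n' "$f" "$d"
# done > dates.txt
```

This only checks for eight digits in a row. It does not verify that the digits form a valid calendar date.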

How to split path by last slash?

I have a file (say called list.txt) that contains relative paths to files, one path per line, i.e. something like this:
foo/bar/file1
foo/bar/baz/file2
goo/file3
I need to write a bash script that processes one path at a time, splits it at the last slash and then launches another process feeding it the two pieces of the path as arguments. So far I have only the looping part:
for p in `cat list.txt`
do
# split $p like "foo/bar/file1" into "foo/bar/" as part1 and "file1" as part2
inner_process.sh $part1 $part2
done
How do I split? Will this work in the degenerate case where the path has no slashes?
Use basename and dirname, that's all you need.
part1=$(dirname "$p")
part2=$(basename "$p")
A pure bash way that is safe for filenames containing spaces or funny symbols (provided inner_process.sh handles them correctly, but that's another story):
while read -r p; do
[[ "$p" == */* ]] || p="./$p"
inner_process.sh "${p%/*}" "${p##*/}"
done < list.txt
and it doesn't fork dirname and basename (in subshells) for each file.
The line [[ "$p" == */* ]] || p="./$p" is here just in case $p doesn't contain any slash, then it prepends ./ to it.
See the Shell Parameter Expansion section in the Bash Reference Manual for more info on the % and ## symbols.
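The same splitting logic can be sketched as a small function, which makes the no-slash case easy to test on its own. The function name split_path and the variable names dir and base are just illustrative choices.

```bash
# Hypothetical helper: split a path at the last slash without forking.
# Sets the global variables dir and base.
split_path() {
  local p=$1
  [[ $p == */* ]] || p="./$p"   # degenerate case: no slash at all
  dir=${p%/*}                   # everything before the last slash
  base=${p##*/}                 # everything after the last slash
}

# Usage sketch:
# while IFS= read -r p; do
#   split_path "$p"
#   inner_process.sh "$dir" "$base"
# done < list.txt
```

One difference from dirname to keep in mind: for a path like /file1 this leaves dir empty, whereas dirname would print /.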
I found a great solution from this source.
p=/foo/bar/file1
path=${p%/*}
file=${p##*/}
This also works with spaces in the path!
While basename and dirnames are really helpful, maybe you are in the same situation as me:
I need to get only the first N folders of a path, and I can be in any folder, like these: /home/me/folder/i/want/, /home/me/, or /home/me/folder/i/want/folder/i/dont/want/.
So I used cut.
Here's the command to get only /home/me/folder/i/want, no matter where I am:
echo "/home/me/folder/i/want/folder/i/dont/want" | cut -f 1,2,3,4,5,6 -d "/"
Here, cut is splitting the string by "/" chars, and is displaying 1st, 2nd [...] 6th words only.
Here are some examples:
$ echo $PWD
/home/me/folder/i/want/folder/i/dont/want
$ echo $PWD | cut -f 1,2,3,4,5,6 -d "/"
/home/me/folder/i/want
$ cd ../../../..
$ echo $PWD
/home/me/folder/i/want
$ echo $PWD | cut -f 1,2,3,4,5,6 -d "/"
/home/me/folder/i/want
$ cd ~
$ echo $PWD | cut -f 1,2,3,4,5,6 -d "/"
/home/me
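The field list can also be computed instead of typed out. A sketch, assuming an absolute path; the function name first_n_dirs and its parameter n are made up for this example.

```bash
# Hypothetical helper: keep only the first N directories of an absolute path.
first_n_dirs() {
  local path=$1 n=$2
  # Field 1 is the empty string before the leading "/", hence n + 1.
  printf '%s\n' "$path" | cut -d/ -f"1-$((n + 1))"
}

# Usage sketch:
# first_n_dirs "$PWD" 5
```

With n=5 this selects fields 1 through 6, the same as -f 1,2,3,4,5,6 above.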
Here is one example that changes the extension of every file in the current directory to .xml (everything from the first dot onward is replaced):
for file in *; do
mv -- "$file" "${file%%.*}.xml"
done

Remove hyphens from filename with Bash

I am trying to create a small Bash script to remove hyphens from a filename. For example, I want to rename:
CropDamageVO-041412.mpg
to
CropDamageVO041412.mpg
I'm new to Bash, so be gentle :] Thank you for any help
Try this:
for file in $(find dirWithDashedFiles -type f -iname '*-*'); do
mv $file ${file//-/}
done
That's assuming that your directories don't have dashes in the name. That would break this.
The ${varname//pattern/replacementText} syntax (the pattern is a glob, not a regular expression) is explained here. Just search for substring replacement.
Also, this would break if your directories or filenames have spaces in them. If you have spaces in your filenames, you should use this:
for file in *-*; do
mv -- "$file" "${file//-/}"
done
This has the disadvantage of having to be run in every directory that contains files you want to change, but, like I said, it's a little more robust.
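One way to get both recursion and robustness is to strip hyphens from the last path component only, so dashes in directory names are left alone. This is a sketch, not a tested production script; the function name dehyphenate_basename is an assumption.

```bash
# Hypothetical helper: remove hyphens from the filename part of a path,
# leaving any directory part unchanged.
dehyphenate_basename() {
  local path=$1 dir base
  if [[ $path == */* ]]; then
    dir=${path%/*}/          # keep the directory part untouched
  else
    dir=
  fi
  base=${path##*/}
  printf '%s%s\n' "$dir" "${base//-/}"
}

# Usage sketch (dirWithDashedFiles is the directory from the find example above):
# find dirWithDashedFiles -type f -name '*-*' -print0 |
#   while IFS= read -r -d '' f; do
#     mv -- "$f" "$(dehyphenate_basename "$f")"
#   done
```

The -print0 and read -d '' pair keeps names with spaces or newlines intact.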
FN=CropDamageVO-041412.mpg
mv $FN `echo $FN | sed -e 's/-//g'`
The backticks (``) tell bash to run the command inside them and use the output of that command in the expression. The sed part applies a regular expression to remove the hyphens from the filename.
Or to do this to all files in the current directory matching a certain pattern:
for i in *VO-*.mpg
do
mv "$i" "`echo "$i" | sed -e 's/-//g'`"
done
A general solution for removing hyphens from any string:
$ echo "remove-all-hyphens" | tr -d '-'
removeallhyphens
$
f=CropDamageVO-041412.mpg
echo "${f//-}"
or, of course,
mv "$f" "${f//-}"

Recursive BASH renaming

EDIT: Ok, I'm sorry, I should have specified that I was on Windows, and using win-bash, which is based on bash 1.14.2, along with the gnuwin32 tools. This means all of the solutions posted unfortunately didn't help out. It doesn't contain many of the advanced features. I have however figured it out finally. It's an ugly script, but it works.
#!/bin/bash
function readdir
{
cd "$1"
for infile in *
do
if [ -d "$infile" ]; then
readdir "$infile"
else
renamer "$infile"
fi
done
cd ..
}
function renamer
{
#replace " - " with a single underscore.
NEWFILE1=`echo "$1" | sed 's/\s-\s/_/g'`
#replace spaces with underscores
NEWFILE2=`echo "$NEWFILE1" | sed 's/\s/_/g'`
#replace "-" dashes with underscores.
NEWFILE3=`echo "$NEWFILE2" | sed 's/-/_/g'`
#remove exclamation points
NEWFILE4=`echo "$NEWFILE3" | sed 's/!//g'`
#remove commas
NEWFILE5=`echo "$NEWFILE4" | sed 's/,//g'`
#remove single quotes
NEWFILE6=`echo "$NEWFILE5" | sed "s/'//g"`
#replace & with _and_
NEWFILE7=`echo "$NEWFILE6" | sed "s/&/_and_/g"`
#remove typographic (curly) apostrophes
NEWFILE8=`echo "$NEWFILE7" | sed "s/’//g"`
mv "$1" "$NEWFILE8"
}
for infile in *
do
if [ -d "$infile" ]; then
readdir "$infile"
else
renamer "$infile"
fi
done
ls
I'm trying to create a bash script to recurse through a directory and rename files, to remove spaces, dashes and other characters. I've gotten the script working fine for what I need, except for the recursive part of it. I'm still new to this, so it's not as efficient as it should be, but it works. Anyone know how to make this recursive?
#!/bin/bash
for infile in *.*;
do
#replace " - " with a single underscore.
NEWFILE1=`echo $infile | sed 's/\s-\s/_/g'`;
#replace spaces with underscores
NEWFILE2=`echo $NEWFILE1 | sed 's/\s/_/g'`;
#replace "-" dashes with underscores.
NEWFILE3=`echo $NEWFILE2 | sed 's/-/_/g'`;
#remove exclamation points
NEWFILE4=`echo $NEWFILE3 | sed 's/!//g'`;
#remove commas
NEWFILE5=`echo $NEWFILE4 | sed 's/,//g'`;
mv "$infile" "$NEWFILE5";
done;
find is the command able to display all elements in a filesystem hierarchy. You can use it to execute a command on every found file or pipe the results to xargs which will handle the execution part.
Take care with filenames containing whitespace: when piping find into xargs, use the -print0 option of find coupled with the -0 option of xargs.
All those semicolons are superfluous and there's no reason to use all those variables. If you want to put the sed commands on separate lines and intersperse detailed comments you can still do that.
#!/bin/bash
find . | while read -r file
do
newfile=$(echo "$file" | sed '
#replace " - " with a single underscore.
s/\s-\s/_/g
#replace spaces with underscores
s/\s/_/g
#replace "-" dashes with underscores.
s/-/_/g
#remove exclamation points
s/!//g
#remove commas
s/,//g')
mv "$file" "$newfile"
done
This is much shorter:
#!/bin/bash
find . | while read -r file
do
# replace " - " or space or dash with underscores
# remove exclamation points and commas
newfile=$(echo "$file" | sed 's/\s-\s/_/g; s/\s/_/g; s/-/_/g; s/!//g; s/,//g')
mv "$file" "$newfile"
done
Shorter still:
#!/bin/bash
find . | while read -r file
do
# replace " - " or space or dash with underscores
# remove exclamation points and commas
newfile=$(echo "$file" | sed 's/\s-\s/_/g; s/[-[:space:]]/_/g; s/[!,]//g')
mv "$file" "$newfile"
done
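On a reasonably recent bash the sed call can be dropped altogether in favour of parameter expansion. A sketch, assuming only literal spaces need replacing (the \s in the sed versions also matches tabs); the function name sanitize_name is made up.

```bash
# Hypothetical helper: apply the same clean-up rules with builtins only.
sanitize_name() {
  local name=$1
  name=${name// - /_}     # " - " becomes a single underscore
  name=${name//[- ]/_}    # remaining dashes and spaces become underscores
  name=${name//[,!]/}     # drop commas and exclamation points
  printf '%s\n' "$name"
}

# Usage sketch:
# sanitize_name 'My Song - Live, 2010!.mp3'
```

Note that applying this to a full path would also rewrite the directory components, so it is safer to pass it only the filename part.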
In bash 4, setting the globstar option allows recursive globbing.
shopt -s globstar
for infile in **
...
Otherwise, use find.
while IFS= read -r infile
do
...
done < <(find ...)
or
find ... -exec ...
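To fill in the dots, here is one possible shape for the find-based loop. It is a sketch under a few assumptions: a find that supports -mindepth and -print0 (GNU and BSD find both do), a simplified rule that only turns spaces and dashes into underscores, and no handling of name collisions. The function name rename_tree is hypothetical.

```bash
# Hypothetical helper: rename every file and directory below a root so
# that spaces and dashes in the last path component become underscores.
rename_tree() {
  local root=$1 path dir base new
  # -depth lists a directory's contents before the directory itself, so
  # renaming a directory never invalidates paths still to be processed.
  find "$root" -mindepth 1 -depth -print0 |
    while IFS= read -r -d '' path; do
      dir=${path%/*}
      base=${path##*/}
      new=${base//[- ]/_}
      # Only call mv when the name actually changes.
      [[ $new == "$base" ]] || mv -- "$path" "$dir/$new"
    done
}

# Usage sketch:
# rename_tree .
```

Only the last component of each path is changed, which is why directories with spaces or dashes in their names do not break the traversal.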
I've used 'find' in the past to locate files then had it execute another application.
See '-exec'
rename 's/pattern/replacement/' glob_pattern
