Remove string between pattern from file name using bash or sed - bash

I have a large set of .mp3 files whose names all contain a similar pattern of extra text, like this:
'Yethamaiyaa Yetham HD Extreme Quality .-V7NEP5gnTTY.mp3'
where the actual track name is only
'Yethamaiyaa Yetham.mp3'
and this additional string
'HD Extreme Quality .-V7NEP5gnTTY' is attached to each file.
How do I remove this unnecessary string, which starts with HD and ends just before .mp3? The complication is that there is an additional dot (.) between the marker strings. The marker pattern is the same for all 400+ files. Any help is appreciated.

ls *.mp3 | sed -n "s/^\(.*\) HD .*/mv -- '&' '\1.mp3'/p" | bash
The above code uses sed to remove everything from " HD " to the end of the filename. The portion of the filename before " HD " is captured by the parens so it can be used later as \1. The entire line is replaced with the required mv command. I quoted it very carefully to account for the spaces in the filename. Note that a filename which itself contains a single quote would break the generated command.
If you want to see the commands it will perform without executing them, leave off the pipe to bash.
Preview commands:
ls *.mp3 | sed -n "s/^\(.*\) HD .*/mv -- '&' '\1.mp3'/p"
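Generating mv commands as text is fragile if a filename contains a single quote. As a minimal sketch of the same rename done with bash parameter expansion instead (the helper name strip_hd_suffix is made up for this example, and it assumes every file should keep the .mp3 extension):

```shell
#!/usr/bin/env bash

# strip_hd_suffix is a hypothetical helper, not part of the answer above.
# It cuts the name at the last " HD " and puts the .mp3 extension back,
# which mirrors what the greedy sed pattern does.
strip_hd_suffix() {
    local name=$1
    printf '%s\n' "${name% HD *}.mp3"
}

# Print the mv commands for matching files in the current directory.
# Drop the leading "echo" to actually rename.
for f in *" HD "*.mp3; do
    [ -e "$f" ] || continue          # the glob matched nothing
    echo mv -- "$f" "$(strip_hd_suffix "$f")"
done
```

Because the filenames are passed to mv as arguments rather than pasted into a command string, quotes and spaces in the names need no special handling.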

Related

UNIX change all the file extension for a list of files

I am a total beginner in this area so sorry if it is a dumb question.
In my shell script I have a variable named FILES, which holds the path to log files, like that:
FILES="./First.log ./Second.log logs/Third.log"
and I want to create a new variable with the same files but different extension, like that:
NEW_FILES="./First.txt ./Second.txt logs/Third.txt"
So I run this command:
NEW_FILES=$(echo "$FILES" | tr ".log" ".txt")
But I get this output:
NEW_FILES="./First.txt ./Secxnd.txt txts/Third.txt"
# ^^^
I understand the . character is a special character, but I don't know how I can escape it. I have already tried to add a \ before the period but to no avail.
tr replaces characters with other characters. When you write tr .log .txt it replaces . with ., l with t, o with x, and g with t.
To perform string replacement you can use sed 's/pattern/replacement/g', where s means substitute and g means globally (i.e., replace multiple times per line).
NEW_FILES=$(echo "$FILES" | sed 's/\.log/.txt/g')
You could also perform this replacement directly in the shell without any external tools.
NEW_FILES=${FILES//\.log/.txt}
The syntax is similar to sed, with a global replacement being indicated by two slashes. With a single slash only the first match would be replaced.
tr is not the tool you need. The goal of tr is to change characters on a 1-by-1 basis. You probably did not see it, but Second must have been changed to Secxnd.
I think sed is better.
NEW_FILES=$(sed 's/\.log/.txt/g' <<< "$FILES")
It searches the \.log regular expression and replaces it with the .txt string. Please note the \. in the regex which means that it matches the dot character . and nothing else.
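Both the sed and the ${FILES//...} versions replace .log anywhere in the string, not only at the end of a name. A small sketch of an end-anchored variant, assuming the paths can be kept in a bash array instead of one space-separated string (the array names and the "tricky" value are made up for illustration):

```shell
#!/usr/bin/env bash

# Assumption: the paths are held in an array rather than one string.
files=(./First.log ./Second.log logs/Third.log)

# /% anchors the match at the end of each element, so only a trailing
# ".log" is rewritten.
new_files=("${files[@]/%.log/.txt}")
printf '%s\n' "${new_files[@]}"

# Contrast on a single made-up name with ".log" in the middle:
tricky="my.logfile.log"
echo "${tricky//.log/.txt}"   # replaces both occurrences
echo "${tricky/%.log/.txt}"   # replaces only the trailing one
```

An array also avoids problems with paths that contain spaces, which a single space-separated string cannot represent.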

Pipe last 11 characters in filename to a text file

I have a directory full of files whose names end with .mp3 and contain a code that I would like to pipe into a text file.
I need to get the last 11 characters before the .mp3 part of each file name in a certain directory and pipe them into a text file (with bash on Mac OS X).
How do I accomplish this? With sed?
If I'm understanding correctly, you have a list of files with names like "abcdefghijklmnopqrstuvwxyz.mp3" and want to extract "pqrstuvwxyz". You can do this directly in bash without invoking any fancy sed business:
for F in *.mp3; do STRIP=${F%.mp3}; echo "${STRIP: -11}"; done > list.txt
The STRIP variable is the name of each file F with the trailing .mp3 extension removed. Then you echo the last 11 characters and save them to a file. The space before -11 is required.
There's a nice page on bash substitutions here. sed is great but I personally find it's overkill for these simple cases.
In addition to the answers above, this can also be done with awk:
for f in *.mp3; do
# strip the extension, then print the last 11 characters
echo "$f" | awk '{ sub(/\.mp3$/, ""); print substr($0, length($0) - 10) }'
done
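One detail the one-liners above leave open is what happens when a name is shorter than 11 characters: in bash, a negative offset that reaches past the start of the string expands to nothing. A hedged sketch that handles this case (last_chars is a hypothetical helper name, and printing the whole name for short inputs is an assumption about the desired behavior):

```shell
#!/usr/bin/env bash

# last_chars is a hypothetical helper: strip a trailing ".mp3", then
# print the last N characters (default 11) of what remains.
last_chars() {
    local base=${1%.mp3}
    local n=${2:-11}
    if (( ${#base} <= n )); then
        printf '%s\n' "$base"          # name shorter than N: print all of it
    else
        printf '%s\n' "${base: -$n}"   # the space before "-" is required
    fi
}

last_chars "abcdefghijklmnopqrstuvwxyz.mp3"
```

To build the list for a whole directory, the helper could be called in a loop such as: for f in *.mp3; do last_chars "$f"; done > list.txt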

How can I remove hidden characters after a file extension in a variable

When I do
echo $filename
I get
Pew Pew.mp4
However,
echo "${#filename}"
Returns 19
How do I delete all characters after the file extension? It needs to work no matter what the file extension is because the file name in the variable will not always match *.mp4
You should try to find out why you have such strange files before fixing it.
Once you know, you can rename files.
When you just want to rename 1 file, just use the command
mv "Pew Pew.mp4"* "Pew Pew.mp4"
Cutting off the complete extension (with filename=${filename%%.*}) won't help you if you want to use the stripped extension (mp4 or jpg or ...).
EDIT:
I think the OP wants a workaround, so I'll give it another try.
When you have a short list of extensions, you can try
for ext in mpeg mpg jpg avo mov; do
for filename in *.${ext}*; do
mv "${filename%%.*}.${ext}"* "${filename%%.*}.${ext}"
done
done
You can try strings to get the readable string.
echo "${filename}" | strings | wc
# Rename file
mv "${filename}" "$(echo "${filename}"| strings)"
EDIT:
strings gives more than 1 line as a result and unwanted spaces. Since Pew Pew has a space inside, I hope that all spaces, underscores and minus-signs are in front of the dot.
The newname can be constructed with something like
tmpname=$(echo "${filename}"| strings | head -1)
newname=${tmpname% *}
# or another way
newname=$(echo "${filename}"| sed 's/^\([[:alnum:]_ -]*\.[[:alnum:]]*\).*/\1/')
# or another (the best?) way (hoping that the first unwanted character is not a space)
newname="${filename%%[^[:alnum:] ._-]*}"
# resulting in
mv "${filename}" "${filename%%[^[:alnum:] ._-]*}"
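Before renaming anything, it helps to see what the hidden characters actually are. The sketch below simulates a name with trailing control bytes (an assumption, since the question never says what the extra characters are), dumps it with od -c, and strips every control character with a pattern substitution. If the extra bytes turn out to be printable, for example the tail of a terminal escape sequence, this will not remove all of them.

```shell
#!/usr/bin/env bash

# Simulated input: a name followed by a carriage return and other
# control bytes (an assumption about what the hidden characters are).
filename=$'Pew Pew.mp4\r\001\002'

# Show every byte, including the invisible ones.
printf '%s' "$filename" | od -c

# Delete all control characters with a pattern substitution.
clean=${filename//[[:cntrl:]]/}
echo "${#filename} -> ${#clean}: $clean"
```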

Remove leading and trailing whitespace directories

The problem is that Mac OS X lets folders get names like " Foo Bar /".
This is what I could fix on my own and it works for most of the problems.
for d in */ ; do
echo "$d" | xargs
done
Result: "Foo Bar /".
The only problem is that it leaves that last space between the directory name and the slash.
Can someone help get rid of that last space?
Try this for a bash-only approach:
dir="some dir /"
fixed=${dir/ \///}
echo $fixed
some dir/
Bash string manipulation:
SUBSTRING REMOVAL BEGINNING OF STRING:
${string#substring} - remove shortest substring match from beginning
${string##substring} - remove longest substring match from beginning
SUBSTRING REMOVAL END OF STRING:
${string%substring} - remove shortest matching substring from tail
${string%%substring} - remove longest matching substring from tail
SUBSTRING REPLACEMENT:
${string/substring/replacement} - replace first match of substring
${string//substring/replacement} - replace all matches of substring
${string/#substring/replacement} - replace substring at front of string
${string/%substring/replacement} - replace substring at tail of string
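The operators listed above can be tried on a single value. In this sketch the path is a made-up example, and the "substring" in each case is really a glob pattern, so * and ? act as wildcards; the comments show the result of each expansion.

```shell
#!/usr/bin/env bash

path="logs/archive/app.tar.gz"   # made-up example value

a=${path#*/}          # archive/app.tar.gz
b=${path##*/}         # app.tar.gz
c=${path%.*}          # logs/archive/app.tar
d=${path%%.*}         # logs/archive/app
e=${path/a/A}         # logs/Archive/app.tar.gz
f=${path//a/A}        # logs/Archive/App.tAr.gz
g=${path/#logs/var}   # var/archive/app.tar.gz
h=${path/%gz/xz}      # logs/archive/app.tar.xz

printf '%s\n' "$a" "$b" "$c" "$d" "$e" "$f" "$g" "$h"
```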
What about using sed like this:
name=" Foo Bar /"
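echo "$name" | sed 's/^[[:space:]]*//; s|[[:space:]]*/$|/|' | tr -s ' '
The sed expression removes the spaces before the name and between the name and the trailing slash, while tr squeezes any remaining runs of spaces down to one. [[:space:]] is used instead of \s because the sed that ships with Mac OS X does not support \s.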
The problem is that Mac OS X lets folders get names like " Foo Bar /"
This is a Unix/BASH issue too. All characters except NUL and forward slash are valid characters in file names. This includes control characters like backspace and UTF8 characters.
You can use ${WORD%%filter} and ${WORD##filter} to help remove the extra spaces on the front and end of files. You might also want to substitute white space with an underscore character. Unix shell scripts usually work better if file names don't contain white space:
for file in *
do
new_file=$(tr -s " "<<<"$file")
mv "$file" "$new_file"
done
The tr -s " " uses the tr command to squeeze runs of spaces in the file name down to a single space. It squeezes leading and trailing runs too, but it does not remove them entirely, and it doesn't touch tab characters or newlines (although they could be added to the string).
The for file in * does assign the file names correctly to $file. The <<< is a Bashism to allow you to redirect a string as STDIN. This is good because tr only works with STDIN. The $(...) tells Bash to take the output of that command and to interpolate it into the command line.
This is a fairly simple script and may choke on other file names. I suggest you try it like this:
for file in *
do
new_file=$(tr -s " "<<<"$file")
echo "mv '$file' '$new_file'" >> output.txt
done
Then, you can examine output.txt to verify that the mv commands will work. Once you've verified output.txt, you can use it as a bash script:
$ bash output.txt
That will then run the mv commands you've saved in output.txt.
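The ${WORD%%filter} and ${WORD##filter} forms mentioned above can be combined into a whitespace trim that needs no external tools. This is only a sketch: trim is a hypothetical helper name, and the loop prints the mv commands rather than running them.

```shell
#!/usr/bin/env bash

# trim is a hypothetical helper that removes leading and trailing
# whitespace using only parameter expansion.
trim() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}   # drop the leading run of whitespace
    s=${s%"${s##*[![:space:]]}"}   # drop the trailing run of whitespace
    printf '%s\n' "$s"
}

# Preview renames for directories whose names have stray whitespace.
for d in */; do
    [ -d "$d" ] || continue
    old=${d%/}                     # the glob appends a slash; remove it
    new=$(trim "$old")
    [ "$old" = "$new" ] || echo mv -- "$old" "$new"
done
```

Stripping the trailing slash before trimming is what gets rid of the space that sits between the directory name and the slash.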

How do I alter the n-th line in multiple files using SED?

I have a series of text files that I want to convert to markdown. I want to remove any leading spaces and add a hash sign to the first line in every file. If I run this:
sed -i.bak '1s/ *\(.*\)/\#\1/g' *.md
It alters the first line of the first file and processes them all, leaving the rest of the files unchanged.
What am I missing that will search and replace something on the n-th line of multiple files?
Using bash on OSX 10.7
The problem is that sed by default treats any number of files as a single stream, and thus line-number offsets are relative to the start of the first file.
For GNU sed, you can use the -s (--separate) flag to modify this behavior:
sed -s -i.bak '1s/^ */#/' *.md
...or, with non-GNU sed (including the one on Mac OS X), you can loop over the files and invoke once per each:
for f in *.md; do sed -i.bak '1s/^ */#/' "$f"; done
Note that the regex is a bit simplified here -- no need to match parts of the line that you aren't going to change.
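The per-file loop can be checked safely on throwaway files before touching real ones. The sketch below builds two sample files in a temporary directory (names and contents are made up) and confirms that line 1 of each file changes while later lines keep their indentation.

```shell
#!/usr/bin/env bash

# Build two throwaway files in a temporary directory (made-up contents).
tmpdir=$(mktemp -d)
printf '   Title one\nbody one\n' > "$tmpdir/a.md"
printf '  Title two\n  body two\n' > "$tmpdir/b.md"

# One sed invocation per file, so "1" means line 1 of each file.
for f in "$tmpdir"/*.md; do
    sed -i.bak '1s/^ */#/' "$f"
done

first_a=$(sed -n '1p' "$tmpdir/a.md")
first_b=$(sed -n '1p' "$tmpdir/b.md")
second_b=$(sed -n '2p' "$tmpdir/b.md")
printf '%s\n' "$first_a" "$first_b" "$second_b"

rm -rf "$tmpdir"
```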
xargs will do the trick for you:
http://en.wikipedia.org/wiki/Xargs
Remove the *.md from the end of your sed command, then use xargs to feed your files to sed one at a time, so that each file is processed by its own sed invocation. Sorry, I don't have time to work it out for you, but the Wikipedia article should show you what you need to know.
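One way the xargs idea might look, as a sketch rather than a worked-out answer: find lists the files, and xargs -n 1 starts a separate sed for each one. The -print0 and -0 options are not in POSIX, but both GNU and BSD (Mac OS X) versions of find and xargs provide them. The file names and contents here are made up.

```shell
#!/usr/bin/env bash

tmpdir=$(mktemp -d)
printf ' First\nrest\n' > "$tmpdir/one.md"
printf '    Second\nrest\n' > "$tmpdir/two.md"

# -print0 / -0 keep names with spaces intact; -n 1 runs sed once per file.
find "$tmpdir" -name '*.md' -print0 |
    xargs -0 -n 1 sed -i.bak '1s/^ */#/'

xargs_one=$(sed -n '1p' "$tmpdir/one.md")
xargs_two=$(sed -n '1p' "$tmpdir/two.md")
printf '%s\n' "$xargs_one" "$xargs_two"

rm -rf "$tmpdir"
```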
sed -rsi.bak '1s/^/#/;s/^[ \t]+//' *.md
You don't need g(lobally) at the end of the command(s), because you want to replace something at the beginning of the line, and not multiple times.
You use two commands, separated by a semicolon: one to modify line 1 (1s...) and a second to strip the leading blanks (and tabs, written \t). To also remove blanks in the first line, switch the order:
sed -rsi.bak 's/^[ \t]+//;1s/^/#/' *.md
Remove the \t if you don't need it. Then you don't need the bracket expression either:
sed -rsi.bak 's/^ +//;1s/^/#/' *.md
-r is the GNU sed flag that enables extended regular expressions. You don't need to escape the plus in that case.
