Sed & Mac OS Terminal: How to remove parentheses content from the first line of every file? - macos

I am on Mac Os 10.14.6 and have a directory that contains subdirectories that all contain text files. Altogether, there are many hundreds of text files.
I would like to go through the text files and check for any content in the first line that is in parentheses. If such content is found, then the parentheses (and content in the parentheses) should be removed.
Example:
Before removal:
The new world (82 edition)
After removal:
The new world
How would I do this?
Steps I have tried:
Google around, it seems SED would be best for this.
I have found this thread, which provides SED code for removing bracketed content.
sed -e 's/([^()]*)//g'
However, I am not sure how to adapt it to work on multiple files and also to limit it to the first line of those files. I found this thread which explains how to use SED on multiple files, but I am not sure how to adapt the example to work with parentheses content.
Please note: As long as the solution works on Mac OS terminal, then it does not need to use SED. However, from Googling, SED seems to be the most suited.

I managed to achieve what you're after by using a bash script and sed together, like so:
#!/bin/bash
for filename in "$PWD"/*.txt; do
sed -i '' '1 s/([^()]*)//g' "$filename"
done
The script iterates over all the .txt files in $PWD (the current working directory, so you can add this script to your bin and run it anywhere), and then runs the command
sed -i '' '1 s/([^()]*)//g' "$filename"
on each file. Starting the sed expression with the address 1 tells sed to work only on the first line of the file :)
Edit: Best Answer
The above only handles .txt files that sit directly in the current directory; it does not search subdirectories recursively.
Therefore, after some research, this command should perform exactly what the question asks:
find . -name "*.txt" -exec sed -i '' '1 s/([^()]*)//g' {} \;
I must stress that you should try this on a backup copy first to confirm it works. Alternatively, use the same command as above but replace the '' with a suffix so that sed creates backups. For example,
find . -name "*.txt" -exec sed -i '.bkp' '1 s/([^()]*)//g' {} \;
This command performs the sed replacement in the original file (keeping the filename) and creates a backup of each file with .bkp appended; for example, test1.txt is backed up as test1.txt.bkp. This is a safer option, but choose what works best for you :)
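Before running either find command on real files, it can help to preview what would change. The following is a minimal sketch and not part of the answer above: the helper name strip_first_line_parens and the sandbox files are made up for illustration, and the pattern additionally removes the blanks in front of the parenthesis so that no trailing space is left behind.

```shell
#!/bin/sh
# Hypothetical helper: remove parenthesised text (plus any blanks before it)
# from the first line of stdin; all other lines pass through unchanged.
strip_first_line_parens() {
    sed '1s/ *([^()]*)//g'
}

# Throwaway sandbox so no real file is touched (paths are illustrative).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'The new world (82 edition)\nbody (kept)\n' > "$demo_dir/sub/a.txt"

# Dry run: print the current and the proposed first line of every .txt file.
find "$demo_dir" -type f -name '*.txt' | while IFS= read -r f; do
    before=$(head -n 1 "$f")
    after=$(printf '%s\n' "$before" | strip_first_line_parens)
    if [ "$before" != "$after" ]; then
        printf '%s\n  before: %s\n  after:  %s\n' "$f" "$before" "$after"
    fi
done

# Keep the transformed content of the sample file for inspection, then clean up.
preview=$(strip_first_line_parens < "$demo_dir/sub/a.txt")
rm -rf "$demo_dir"
```

Because the sketch only reads files and prints to the terminal, it works the same way with BSD sed on macOS and with GNU sed.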

Good try,
The command you were looking for, for a single line:
sed -E '1s|\([^)]+\)||'
The command to replace the first line of each input file (BSD/macOS sed needs the explicit empty backup suffix after -i):
sed -E -i '' '1s|\([^)]+\)||' *.txt
Example:
echo "The new world (82 edition)" | sed -E '1s|\([^)]+\)||'
The new world
Explanation
sed -E -i '': the -E option enables the extended RegExp syntax; the -i option edits the files in place ('' means no backup file is kept)
sed -E '1s|match RegExp||': on the first line only, replace the first string matching the RegExp with an empty string
\([^)]+\) RegExp matching: start with (, then [^)] any char that is not ), + repeated one or more times, terminated by )
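The answers in this thread write the parentheses in two different ways, which can be confusing. As a small sketch (the variable names are only for illustration), the same removal in basic and in extended syntax looks like this:

```shell
line='The new world (82 edition)'

# Basic syntax (no -E): a bare ( is a literal parenthesis.
bre=$(printf '%s\n' "$line" | sed 's/([^)]*)//')

# Extended syntax (-E): ( starts a group, so literal parentheses need \( and \).
ere=$(printf '%s\n' "$line" | sed -E 's/\([^)]*\)//')

# Brackets make the remaining trailing space visible.
printf '[%s]\n[%s]\n' "$bre" "$ere"
```

Both variants remove the parenthesised part and leave the space that preceded it.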

Try:
# create a temporary file
tmp=$(mktemp)
# for each something in _the current directory_
for i in *; do
# if it is not a file, don't parse it
if [ ! -f "$i" ]; then continue; fi
# remove parenthesis on first line, save the output in temporary file
sed '1s/([^)]*)//g' "$i" > "$tmp"
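# copy the temporary file's content over the original (keeps the original's
# permissions and leaves the temporary file in place for the next iteration)
cat "$tmp" > "$i"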
done
# remove temporary file
rm "$tmp"

Related

bash change absolute path in file line by line for script creation

I'm trying to create a bash script based on an input file (list.txt). The input file contains a list of files with absolute paths. The output should be a bash script (move.sh) that moves the files to another location, preserving the folder structure but slightly changing the target folder name.
The input file list.txt looks like this:
/In/Folder_1/SomeFoldername1/somefilename_x.mp3
/In/Folder_2/SomeFoldername2/somefilename_y.mp3
/In/Folder_3/SomeFoldername3/somefilename_z.mp3
The output file (move.sh) should look like this after creation:
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_2/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_3/
The folder structure should be preserved, more or less.
After executing the created bash script (move.sh), the result should look like this:
/gain/Folder_1/somefilename_x.mp3
/gain/Folder_2/somefilename_y.mp3
/gain/Folder_3/somefilename_z.mp3
What I've done so far.
1. create a list of files with absolute path
find /In/ -iname "*.mp3" -type f > /home/maars/mp3/list.txt
2. create the move.sh script
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
# read the list and split the absolute path into fields
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'/' --output-delimiter=' ' -f1-))
done < /home/maars/mp3/move.sh
# add the target path based on variables at the end of the line
sed -i -E "s|\.mp3|\.mp3"\"" /gain/"${fields[1]}"/|g" /home/maars/mp3/move.sh
sed -i "s|/In/|mv "\""/In/|g" /home/maars/mp3/move.sh
The script only uses the value of ${fields[1]}, which is Folder_1, and puts it at the end of every line, instead of using Folder_2 and Folder_3 for the other lines.
The current result looks like
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_1/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_1/
rsync is not an option since I need full control over the files to be moved.
What could I do better to solve this issue?
EDIT: @Socowi helped me a lot by pointing me in the right direction. After a deep dive into the world of regex, I could solve my issues. Thank you very much.
The script only uses the value of ${fields[1]}, which is Folder_1, and puts it at the end of every line, instead of using Folder_2 and Folder_3 for the other lines.
You iterate over all lines and update fields for every line. After you finished the loop, fields retains its value (from the last line). You would have to move the sed commands into your loop and make sure that only the current line is replaced by sed. However, there's a better way – see down below.
What could I do better
There are a lot of things you could improve, for instance
Creating the array fields with mapfile -d/ fields instead of printf+cut+($()). That way, you also wouldn't have problems with spaces in paths.
Use sed only once instead of creating the array fields and using multiple sed commands. You can replace step 2 with this small script:
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
sed -i -E 's|^/[^/]*/([^/]*).*$|mv "&" "/gain/\1"|' /home/maars/mp3/move.sh
However, the best optimization would be to drop that three step approach and use only one script to find and move the files:
find /In/ -iname "*.mp3" -type f -exec rename -n 's|^/.*?/(.*?)/.*/(.*)$|/gain/$1/$2|' {} +
The -n option prints what would be renamed without actually renaming anything. Remove the -n when you are happy with the result. Here is the output:
rename(/In/Folder_1/SomeFoldername1/somefilename_x.mp3, /gain/Folder_1/somefilename_x.mp3)
rename(/In/Folder_2/SomeFoldername2/somefilename_y.mp3, /gain/Folder_2/somefilename_y.mp3)
rename(/In/Folder_3/SomeFoldername3/somefilename_z.mp3, /gain/Folder_3/somefilename_z.mp3)
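The single sed substitution suggested for step 2 can also be tried without touching any file. This is only a sketch: it feeds two sample paths from the question through a pipe, and the variable names list and generated are made up for the example.

```shell
# Sample input lines, copied from the question's list.txt.
list='/In/Folder_1/SomeFoldername1/somefilename_x.mp3
/In/Folder_2/SomeFoldername2/somefilename_y.mp3'

# Same substitution as in the answer, but reading from a pipe instead of
# editing move.sh in place, so nothing on disk changes.
generated=$(printf '%s\n' "$list" |
    sed -E 's|^/[^/]*/([^/]*).*$|mv "&" "/gain/\1"|')

printf '%s\n' "$generated"
```

Each line gets its own target folder because the capture group \1 is evaluated per line, which is what the array-based attempt in the question could not do.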
It's not builtin to bash, but the mmv command is nice for this kind of mv where you need to use wildcards in paths. Something like the following should work:
mmv "in/*/*/*" "#1/#3"
Note that this won't create the directories for you - but in your example above it looks like these already exist?

Remove characters in all text files in a directory using sed

I have a lot of text files that are email templates. Many of them, for some reason, have the following line:
Best Regards,œ
That strange character at the end is what I am interested in removing from all of these files with a single command.
I tried:
for f in *
do
sed 's/"Best Regards,œ"/"Best Regards,"/g' $f | tee $f.t && mv $f.t $f
done
This ran through the process but did not actually remove the 'œ' character.
Please let me know what I am doing incorrectly so I can remove this character and maybe other non-alphanumeric characters using regex [:alnum:], perhaps.
I fixed the issue with removing the unwanted character with:
for f in *
do
sed 's/Best\ Regards\,\œ/Best\ Regards\,/g' $f | tee $f.t && mv $f.t $f
done
However, this still does not remove all of the non-alphanumeric characters from each line of each file. The other things I have tried either do not execute or remove the entire line.
I appreciate your help.
If ① you don't want to have to worry about Unicode, UTF-anything, LANG, etc, and ② you are confident that lines that start with the words "Best Regards," and ONLY those lines are the ones you want to affect, you can simply do this:
sed -i .bak '/^Best Regards,.*/s//Best Regards,/' *
Note that this processes all files in the current directory. If you want to do this in subdirectories, you could use find, with all its goodness. For example:
find /path/to/start/ -type f -exec \
sed -i .bak '/^Best Regards,.*/s//Best Regards,/' {} \;
or if your shell is bash, you could use globstar:
shopt -s globstar
for f in **/*; do
# skip directories, which **/* also matches
[ -f "$f" ] || continue
sed -i .bak '/^Best Regards,.*/s//Best Regards,/' "$f"
done
Rather than using tee and mv, these solutions use sed's built-in "in-place" option and create a .bak file as a result. Consult the documentation for your implementation of sed to learn more about how to use the -i option; it works a little differently with different seds.
This approach eliminates the need to search for that character in particular, so you won't need to worry about how it's being represented. Beware though, it will also eliminate any other text that follows the search string on the same line.
You don't need the loop. You can pass the results of the glob expression directly to sed and use the -i option for in place editing of files:
sed -i.bak 's/Best Regards,œ/Best Regards,/' *
-i.bak changes the input file in place and creates a backup file with the extension .bak.
Some implementations of sed, for example GNU sed, support -i without an argument; others accept an empty string as the argument for -i. In both cases sed keeps no backup files and simply changes the original file.
With GNU sed:
sed -i 's/Best Regards,œ/Best Regards,/' *
# OR (BSD, MacOS)
sed -i '' 's/Best Regards,œ/Best Regards,/' *
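The question also asks about stripping other unwanted characters, which none of the commands above cover. One possible approach, sketched here under the assumption that the templates should contain plain ASCII only, is to delete every byte outside the printable ASCII range with tr. Note that this would also remove legitimate accented letters, so it is not a drop-in fix for every file.

```shell
# Sample line ending in the stray character; \305\223 are the two UTF-8
# bytes of that character written as octal escapes.
sample=$(printf 'Best Regards,\305\223')

# Delete every byte except tab, newline, carriage return and printable ASCII.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C tr -cd '\11\12\15\40-\176')

printf '%s\n' "$cleaned"
```

Since tr works on streams rather than files, applying it to many files would still need a loop that writes to a temporary file and then replaces the original.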

Removing last n characters from Unix Filename before the extension

I have a bunch of files in Unix Directory :
test_XXXXX.txt
best_YYY.txt
nest_ZZZZZZZZZ.txt
I need to rename these files as
test.txt
best.txt
nest.txt
I am using ksh on AIX. Please let me know how I can accomplish the above using a single command.
Thanks,
In this case, it seems you have an _ to start every section you want to remove. If that's the case, then this ought to work:
for f in *.txt
do
g="${f%%_*}.txt"
echo mv "${f}" "${g}"
done
Remove the echo if the output seems correct, or replace the last line with done | ksh.
If the files aren't all .txt files, this is a little more general:
for f in *
do
ext="${f##*.}"
g="${f%%_*}.${ext}"
echo mv "${f}" "${g}"
done
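One thing to watch with the loops above: ${f%%_*} leaves a name without an underscore unchanged, so a file such as plain.txt would be turned into plain.txt.txt. A minimal sketch of a guard, using a hypothetical helper called new_name, could look like this:

```shell
# Hypothetical helper: print the new name for one file, or nothing if the
# name has no underscore followed by an extension.
new_name() {
    case $1 in
        *_*.*) printf '%s.%s\n' "${1%%_*}" "${1##*.}" ;;
    esac
}

new_name 'nest_ZZZZZZZZZ.txt'   # prints nest.txt
new_name 'plain.txt'            # prints nothing
```

Inside the loop, the mv would then only be issued when new_name prints something.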
If this is a one time (or not very often) occasion, I would create a script with
$ ls > rename.sh
$ vi rename.sh
:%s/\(.*\)/mv \1 \1/
(edit manually to remove all the XXXXX from the second file names)
:x
$ source rename.sh
If this need occurs frequently, I would need more insight into what XXXXX, YYY, and ZZZZZZZZZZZ are.
Addendum
Modify this to your liking:
ls | sed "{s/\(.*\)\(............\)\.txt$/mv \1\2.txt \1.txt/}" | sh
It transforms filenames by omitting 12 characters before .txt and passing the resulting mv command to a shell.
Beware: If there are non-matching filenames, it executes the filename—and not a mv command. I omitted a way to select only matching filenames.
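One way to select only matching filenames, sketched here with made-up sample names, is to combine sed's -n option with the p flag of the substitution. The result is printed rather than piped to sh so it can be checked first.

```shell
# Illustrative file names: only the first one ends in 12 characters plus .txt.
names='log202001011234.txt
README'

# -n suppresses the default output; the p flag prints only lines where the
# substitution succeeded, so non-matching names never reach the shell.
cmds=$(printf '%s\n' "$names" |
    sed -n 's/\(.*\)\(............\)\.txt$/mv \1\2.txt \1.txt/p')

printf '%s\n' "$cmds"
```

Only one mv line is produced here; README is silently skipped instead of being executed as a command.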

Bash sed in loop

I'm trying to use the following to replace tabs with semicolons in several files:
#!/bin/sh
for i in *output_*.txt
do
sed 's/ /;/g' $i > $i
done
But it is not working, because the output file still contains the tab delimiter. It only works when I use it on a single file without the for loop.
Any help?
Thanks.
Several things are wrong: the variables are unquoted, and the output is redirected into the same file that is being read (which truncates it before sed reads it). The loop is also not needed.
Try:
sed -i 's/\t/;/g' *output_*.txt
The correct script you need is as follows:
find . -name '*output_*.txt' | while IFS= read -r FILENAME; do
(sed -e "s/\\t/;/g" <"${FILENAME}" >"${FILENAME%.txt}.tmp") && (mv "${FILENAME%.txt}.tmp" "${FILENAME}");
done
This script has several important features:
It finds all files called *output_*.txt in the current directory and all subdirectories. If you do not want to recurse into subdirectories, then use:
find . -maxdepth 1 -name '*output_*.txt' | while IFS= read -r FILENAME; do
as the first line.
It does not overwrite your original input file if sed encounters an error. sed generates its output to a temporary file (<filename>.tmp) and it only replaces the original file if it is successful.
As pointed out by other posters, the tab character can be written as \t in GNU sed scripts (BSD/macOS sed does not understand \t in a pattern).
An example transformation performed by this script is as follows (the sequence <tab> represents a tab character):
Input:
<tab><tab><tab><tab><tab>line 1<tab><tab>
<tab><tab><tab>line 2<tab><tab>
<tab><tab>line 3<tab><tab>
<tab><tab><tab>line 4<tab><tab>
<tab><tab><tab><tab><tab>line<tab><tab> 5
Output:
;;;;;line 1;;
;;;line 2;;
;;line 3;;
;;;line 4;;
;;;;;line;; 5
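Where \t is not understood by the local sed, a literal tab can be passed in through a shell variable instead. A minimal sketch (the variable names tab and converted are only for illustration):

```shell
# Put a literal tab character into a variable; printf understands \t everywhere.
tab=$(printf '\t')

# Double quotes let the shell insert the tab into the sed expression.
converted=$(printf '\t\tline 3\t\t\n' | sed "s/$tab/;/g")

printf '%s\n' "$converted"
```

The same "s/$tab/;/g" expression can replace "s/\\t/;/g" in the loop above.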

Using sed to mass rename files

Objective
Change these filenames:
F00001-0708-RG-biasliuyda
F00001-0708-CS-akgdlaul
F00001-0708-VF-hioulgigl
to these filenames:
F0001-0708-RG-biasliuyda
F0001-0708-CS-akgdlaul
F0001-0708-VF-hioulgigl
Shell Code
To test:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/'
To perform:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/' | sh
My Question
I don't understand the sed code. I understand what the substitution
command
$ sed 's/something/mv'
means. And I understand regular expressions somewhat. But I don't
understand what's happening here:
\(.\).\(.*\)
or here:
& \1\2/
The former, to me, just looks like it means: "a single character,
followed by a single character, followed by any length sequence of a
single character"--but surely there's more to it than that. As far as
the latter part:
& \1\2/
I have no idea.
First, I should say that the easiest way to do this is to use the
prename or rename commands.
On Ubuntu, OSX (Homebrew package rename, MacPorts package p5-file-rename), or other systems with perl rename (prename):
rename s/0000/000/ F0000*
or on systems with rename from util-linux-ng, such as RHEL:
rename 0000 000 F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success‐
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.
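The difference on a second run can be checked without renaming anything. In this sketch the sample name stands for a file that has already been renamed once; the variable names are made up for the example.

```shell
name='F0001-0708-RG-biasliuyda'   # a file that was already renamed once

# Positional version: drops the second character no matter what it is.
positional=$(printf '%s\n' "$name" | sed 's/\(.\).\(.*\)/mv & \1\2/')

# Explicit version: F0000 does not occur, so the line is left unchanged.
explicit=$(printf '%s\n' "$name" | sed 's/F0000\(.*\)/mv & F000\1/')

printf '%s\n%s\n' "$positional" "$explicit"
```

The positional expression builds another mv that shortens the name to F001-..., while the explicit one produces no mv command at all.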
You've had your sed explanation; now here is how to do it with just the shell, no external commands needed:
for file in F0000*
do
echo mv "$file" "${file/#F0000/F000}"
# ${file/#F0000/F000} means replace the pattern that starts at beginning of string
done
I wrote a small post with examples on batch renaming using sed couple of years ago:
http://www.guyrutenberg.com/2009/01/12/batch-renaming-using-sed/
For example:
for i in *; do
mv "$i" "$(echo "$i" | sed "s/regex/replace_text/")";
done
If the regex contains groups (e.g. \(subregex\)), then you can use them in the replacement text as \1, \2, etc.
The easiest way would be:
for i in F00001*; do mv "$i" "${i/F00001/F0001}"; done
or, portably,
for i in F00001*; do mv "$i" "F0001${i#F00001}"; done
This replaces the F00001 prefix in the filenames with F0001.
credits to mahesh here: http://www.debian-administration.org/articles/150
The sed command
s/\(.\).\(.*\)/mv & \1\2/
means to replace:
\(.\).\(.*\)
with:
mv & \1\2
just like a regular sed command. However, the parentheses, & and \n markers change it a little.
The search string matches (and remembers as pattern 1) the single character at the start, followed by a single character, followed by the rest of the string (remembered as pattern 2).
In the replacement string, you can refer to these matched patterns to use them as part of the replacement. You can also refer to the whole matched portion as &.
So what that sed command is doing is creating a mv command based on the original file (for the source) and character 1 and 3 onwards, effectively removing character 2 (for the destination). It will give you a series of lines along the following format:
mv F00001-0708-RG-biasliuyda F0001-0708-RG-biasliuyda
mv abcdef acdef
and so on.
Using perl rename (a must have in the toolbox):
rename -n 's/0000/000/' F0000*
Remove -n switch when the output looks good to rename for real.
There are other tools with the same name which may or may not be able to do this, so be careful.
The rename command that is part of the util-linux package, won't.
If you run the following command (GNU)
$ rename
and you see perlexpr, then this seems to be the right tool.
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo apt install rename
$ sudo update-alternatives --set rename /usr/bin/file-rename
For archlinux:
pacman -S perl-rename
For RedHat-family distros:
yum install prename
The 'prename' package is in the EPEL repository.
For Gentoo:
emerge dev-perl/rename
For *BSD:
pkg install gprename
or p5-File-Rename
For Mac users:
brew install rename
If you don't have this command with another distro, search your package manager to install it or do it manually:
cpan -i File::Rename
Old standalone version can be found here
man rename
This tool was originally written by Larry Wall, the creator of Perl.
The backslash-paren stuff means, "while matching the pattern, hold on to the stuff that matches in here." Later, on the replacement text side, you can get those remembered fragments back with "\1" (first parenthesized block), "\2" (second block), and so on.
If all you're really doing is removing the second character, regardless of what it is, you can do this:
s/.//2
but your command is building a mv command and piping it to the shell for execution.
This is no more readable than your version:
find . -type f | sed -n 'h;s/.//4;x;s/^/mv /;G;s/\n/ /g;p' | sh
The fourth character is removed because find is prepending each filename with "./".
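Since that one-liner is hard to read, here is the same sequence of sed commands spread over several lines with comments, applied to a single sample path and printed instead of piped to sh. This is only an illustrative walk-through; the variable name result is made up.

```shell
result=$(printf '%s\n' './F00001-0708-RG-biasliuyda' | sed -n '
# copy the original path into the hold space
h
# delete the 4th character of the pattern space (the first 0 after ./F)
s/.//4
# swap: pattern space = original path, hold space = shortened path
x
# turn the original path into the start of a mv command
s/^/mv /
# append a newline plus the shortened path from the hold space
G
# join the two parts with a blank
s/\n/ /
# print the finished command
p')

printf '%s\n' "$result"
```

The hold space is what lets a single sed invocation keep both the old and the new name at the same time.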
Here's what I would do:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done
Then if that looks ok, add | sh to the end. So:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done | sh
for i in *; do mv "$i" "$(echo "$i" | sed 's/AAA/BBB/')"; done
The parentheses capture particular strings for use by the backslashed numbers.
ls F00001-0708-*|sed 's|^F0000\(.*\)|mv & F000\1|' | bash
Some examples that work for me:
$ tree -L 1 -F .
.
├── A.Show.2020.1400MB.txt
└── Some Show S01E01 the Loreming.txt
0 directories, 2 files
## remove "1400MB" (I: ignore case) ...
$ for f in *; do mv 2>/dev/null -v "$f" "`echo $f | sed -r 's/.[0-9]{1,}mb//I'`"; done;
renamed 'A.Show.2020.1400MB.txt' -> 'A.Show.2020.txt'
## change "S01E01 the" to "S01E01 The"
## \U& : change (here: regex-selected) text to uppercase;
## note also: no need here for `\1` in that regex expression
$ for f in *; do mv 2>/dev/null "$f" "`echo $f | sed -r "s/([0-9] [a-z])/\U&/"`"; done
$ tree -L 1 -F .
.
├── A.Show.2020.txt
└── Some Show S01E01 The Loreming.txt
0 directories, 2 files
$
2>/dev/null suppresses extraneous output (warnings ...)
reference [this thread]: https://stackoverflow.com/a/2372808/1904943
change case: https://www.networkworld.com/article/3529409/converting-between-uppercase-and-lowercase-on-the-linux-command-line.html

Resources